How to set Site Audit to crawl only pages within a sitemap

This article explains how to set Site Audit to crawl only the pages listed in a sitemap, for both new and existing projects.

Written by Matt
Updated over 3 years ago

For new projects

Step 1

Create a new project and fill in the data in the Scope and Ownership sections (if you want to learn more about creating a new project, please read this article).

Once in the Site Audit section, go to the URL Sources tab and check only the Specific sitemaps option.

In the text field below, enter the URL of your sitemap (you can enter several URLs if you have multiple sitemaps).

This will tell our crawler to start the crawl from your sitemap.

Before clicking Continue, please make sure that every option other than Specific sitemaps remains unchecked.

Step 2

Go to the next tab in Site Audit, Crawl settings.

Find the Max depth level from seed option and set it to 0.

This will tell our crawler not to go beyond the URLs found in the sitemap.
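
To illustrate why a depth of 0 restricts the crawl to the sitemap, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not how the Site Audit crawler is implemented; the `crawl` function, the `LINKS` mapping, and the example.com URLs are all hypothetical and exist only for illustration. The sitemap URLs act as the seeds at depth 0.

```python
from collections import deque


def crawl(seeds, links, max_depth):
    """Breadth-first crawl that does not follow links beyond max_depth.

    seeds: URLs taken from the sitemap (treated as depth 0).
    links: hypothetical mapping of URL -> URLs linked from that page.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    crawled = []
    while queue:
        url, depth = queue.popleft()
        crawled.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return crawled


# Hypothetical site: two URLs in the sitemap, plus pages reachable by links.
SITEMAP_URLS = ["https://example.com/", "https://example.com/pricing"]
LINKS = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/pricing"],
    "https://example.com/pricing": ["https://example.com/checkout"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
}

only_sitemap = crawl(SITEMAP_URLS, LINKS, max_depth=0)
one_level = crawl(SITEMAP_URLS, LINKS, max_depth=1)
print(only_sitemap)
print(one_level)
```

With `max_depth=0` only the two seed URLs are visited. With `max_depth=1` the sketch also visits pages linked directly from them, which is the behavior this step is meant to prevent.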

Step 3

Now you can click Continue to finish the Site Audit section of setting up a new project.

Fill in the rest of the required information in the next sections of the project creation wizard and wait for your crawl to complete.

When the crawl completes, the Crawl log may show more known URLs than crawled URLs. This is normal. Crawled URLs are the ones that were inside the scope of your project, while known URLs also include discarded URLs, so the known count will be higher in most cases.
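
The relationship between the two counts can be sketched with plain sets. This assumes, for illustration only, that links found on sitemap pages are recorded as known but discarded because they fall outside a depth-0 scope; the article itself only states that known URLs include discarded ones. The `summarize` function and the sample data are hypothetical.

```python
def summarize(sitemap_urls, links):
    """Return (crawled, known, discarded) URL sets for a sitemap-only crawl.

    Assumption: every link found on a crawled page counts as "known",
    even when it is outside the crawl scope and is never fetched.
    """
    crawled = set(sitemap_urls)
    discovered = {target for url in crawled for target in links.get(url, [])}
    known = crawled | discovered
    discarded = known - crawled
    return crawled, known, discarded


# Hypothetical data: two sitemap URLs and the links found on those pages.
sitemap = ["https://example.com/", "https://example.com/pricing"]
page_links = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/pricing"],
    "https://example.com/pricing": ["https://example.com/checkout"],
}

crawled, known, discarded = summarize(sitemap, page_links)
print(len(crawled), len(known), len(discarded))
```

In this sample the known set is larger than the crawled set by exactly the number of discarded URLs.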

To make sure only pages included in your sitemap were crawled, you can go to the Page explorer section of Site Audit and set the filter to Is in sitemap = Yes. The resulting page count should be equal to the number of crawled pages.
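
If you want an independent figure to compare against the filtered page count, you could count the unique URLs in the sitemap yourself. The sketch below uses Python's standard `xml.etree.ElementTree` module and the standard sitemap namespace; the `sitemap_urls` helper and the inline `SAMPLE` document are hypothetical. It handles a plain `urlset` sitemap only, not a sitemap index that points to other sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of unique <loc> URLs in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


# Hypothetical sitemap content; in practice you would load your own file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

urls = sitemap_urls(SAMPLE)
print(len(urls))
```

Duplicate entries are counted once. The total may still differ from the crawled page count, for example if some sitemap URLs redirect or fall outside the project scope.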

For existing projects

Please note that changing the scope of the crawl inside an existing project can greatly influence the Site Audit metrics. If you want only fresh data, you can delete the project and create a new one with the new settings. Deleting a project will also delete its tracked keywords in Rank Tracker and any existing alerts.

To change the Site Audit settings in an existing project:

Step 1

From the Site Audit dashboard, click the vertical ellipsis next to your project and then click Settings. This will take you to the project settings.

Once in Settings, click Site Audit in the right panel. From here, the steps to limit the next crawl to your sitemaps are the same as for new projects, described above.

Step 2

For your changes to take effect, Ahrefs needs to run a new crawl with the new settings. Click your project in Site Audit and then click the New crawl button. Once our bot has finished crawling your site, you will be able to see the results in the Crawl log.
