When crawling in spider mode, all URLs within the crawl scope will be included in the crawl by default.

To keep certain areas of the site out of the crawl (such as specific subdomains, subfolders, or URL parameters), use URL Exclusions.

Some reasons why URL Exclusions may be helpful:

  • Focus on only the most important areas of the site: Especially for very large sites, it may not be beneficial to crawl every URL on the site. Excluding certain parts of the site can help focus the crawler on the URLs that provide the most value.

  • Prevent crawling of duplicate or machine-generated content: If there are many pages with similar content that can be reached by multiple URLs, it may be a good idea to exclude these additional variations.

  • Conserve crawl limits: Dragon Metrics limits the number of URLs that can be crawled in a site. Using exclusions can help ensure that a site does not use more crawl credits than desired.

  • Faster crawls: With fewer URLs to crawl, the crawl finishes sooner than it would without exclusions.

How to exclude URLs

There are 3 places where URL Exclusions may be updated:

Initial campaign setup

When creating a new campaign, click Advanced setup to access crawler options.

Crawler Campaign Settings

Go to Crawler Settings under Campaign Settings in the left navigation.

Site Auditor

In Site Auditor, click Crawl options in the upper-right corner of the page.

Wherever you access it from, click the Manage button under Excluded URLs.

This will bring up the Excluded URLs modal. Click the + button to add a new exclusion.

By default, 3 options are available for excluding parts of the site:

  • Subdirectory: Exclude all URLs in a subfolder on any subdomain

  • Subdomain: Exclude all URLs on the specified subdomain

  • URL: Exclude only one single URL
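To make the difference between these three options concrete, here is a minimal Python sketch of how each rule type could be evaluated against a URL. This is an illustration only, not Dragon Metrics' actual matching code: the function names are hypothetical, and it assumes that a subdomain rule matches the exact host and that a URL rule requires an exact string match.

```python
from urllib.parse import urlsplit


def excluded_by_subdirectory(url: str, folder: str) -> bool:
    """True if the URL's path is the folder itself or sits inside it, on any host."""
    path = urlsplit(url).path
    folder = "/" + folder.strip("/")
    return path == folder or path.startswith(folder + "/")


def excluded_by_subdomain(url: str, host: str) -> bool:
    """True if the URL's host equals the given subdomain (assumed: exact host match)."""
    hostname = urlsplit(url).hostname or ""
    return hostname == host.lower()


def excluded_by_url(url: str, target: str) -> bool:
    """True only for one single URL (assumed: exact string match)."""
    return url == target
```

Under these assumptions, a Subdirectory rule for /blog/ would cover https://www.example.com/blog/post-1 and https://shop.example.com/blog, but not https://www.example.com/blogging/tips.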

To reveal more methods for excluding URLs, click Show Advanced Options.

4 additional options will be revealed:

  • URLs containing a specific parameter: Any URL with the specified parameter will be excluded, whatever its value

  • URLs containing a specific parameter equal to a specific value: Any URL in which the specified parameter is set to the specified value will be excluded

  • URLs containing any parameters: Any URLs that contain one or more parameters of any kind or value will be excluded

  • URLs that match a regular expression: For complicated logic rules, use a regular expression to test whether a URL should be excluded or not
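The four advanced options can be sketched in Python as well. As before, this is a hedged illustration rather than the tool's real implementation: the function names are hypothetical, parameters are assumed to mean query-string parameters, and the regular expression is assumed to be searched anywhere in the full URL.

```python
import re
from urllib.parse import parse_qs, urlsplit


def _params(url: str) -> dict:
    """Parse the query string into {name: [values]}, keeping empty values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def has_parameter(url: str, name: str) -> bool:
    """URLs containing a specific parameter, regardless of its value."""
    return name in _params(url)


def has_parameter_value(url: str, name: str, value: str) -> bool:
    """URLs containing a specific parameter equal to a specific value."""
    return value in _params(url).get(name, [])


def has_any_parameter(url: str) -> bool:
    """URLs containing one or more parameters of any kind or value."""
    return bool(_params(url))


def matches_regex(url: str, pattern: str) -> bool:
    """URLs that match a regular expression (assumed: searched anywhere in the URL)."""
    return re.search(pattern, url) is not None
```

For example, with https://example.com/shoes?sort=price&page=2, a rule on the parameter sort would match, a rule on page equal to 3 would not, and a pattern such as `/page/\d+$` would instead catch path-based pagination like https://example.com/tag/red/page/3.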

Click Save when you are finished.

Your exclusion has now been added.

The changes will take effect on the next crawl. If you don't want to wait, you can start a manual crawl right away.

Only one rule may be added at a time, but there is no limit to the number of exclusions you can add to a campaign. You can continue adding more rules in the same way.

Testing exclusions

To check that your exclusions work as expected, enter a URL into the box.

If the URL will be included in the crawl, a green check will appear.

If the URL will be excluded, a red X will appear.
