The Dragon Metrics crawler offers a high degree of customization so you can crawl your site exactly the way you need. These settings can be changed in Crawl Settings.

To get started, navigate to Campaign Settings > Crawler in the bottom left navigation.

Crawl Schedule

The date of the most recent crawl and the next scheduled crawl are shown at the top of the page. By default, crawls are scheduled every 15 days, but you can recrawl the site at any time by clicking Recrawl site now.

To request a different crawling schedule, please contact us.

Enable or disable crawls

Crawls can be enabled or disabled at any time. If crawling has been disabled for a campaign, past crawl data will still be available, but no new automatic crawls will be scheduled.

Crawl Limits (Spider mode)

Crawl limits determine the maximum number of URLs crawled for a campaign.

The maximum value allowed for crawl limits is the lower of the following two numbers:

  • The number of remaining crawl credits in your account

  • The maximum URLs crawled per campaign.

Both of these numbers are determined by your subscription plan.
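
As a quick illustration of that calculation, here is a minimal sketch; the numbers below are made-up examples, not values from any real account.

```python
# Made-up example values -- check your own plan for the real numbers.
remaining_crawl_credits = 40_000   # crawl credits left in the account
max_urls_per_campaign = 25_000     # per-campaign cap from the subscription plan

# The highest crawl limit you can set is the lower of the two.
max_crawl_limit = min(remaining_crawl_credits, max_urls_per_campaign)
print(max_crawl_limit)  # 25000
```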

Learn more about crawl limits

Crawl Images

If enabled, we’ll crawl images on this site to check for issues such as broken images or missing alt tags. Each image crawled will use 0.25 crawl credits.

Render JavaScript

By default, all JavaScript on the page is ignored. If enabled, our crawler will render JavaScript, similar to how a browser would. This is useful for crawling content that is only available after JavaScript is executed. Each URL crawled with JavaScript will use 5 crawl credits.
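
To get a rough feel for how these two options affect credit usage, here is a back-of-the-envelope sketch. The page and image counts are assumptions, as is the baseline of one credit per standard (non-JavaScript) URL.

```python
# Rough credit estimate -- example numbers only.
pages = 1_000            # URLs to crawl (assumed)
images = 4_000           # images on those pages (assumed; only counted if Crawl Images is on)
render_javascript = True

credits_per_page = 5 if render_javascript else 1   # 5 credits per JS-rendered URL; 1 assumed otherwise
credits_per_image = 0.25                           # 0.25 credits per image crawled

total_credits = pages * credits_per_page + images * credits_per_image
print(total_credits)  # 6000.0
```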

Learn more about JavaScript crawling

Crawl mode

There are two modes available in Crawler Settings: Spider and List.

Spider mode is the default method. The crawler starts at the site's home page and recursively crawls all links found on each page. Learn more about spider mode

List mode allows you to upload a list of URLs to be crawled. Only pages included in this list will be crawled. Learn more about list mode
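
As a minimal sketch of the difference, list mode boils down to fetching only what you supply; the file name and fetch function below are hypothetical placeholders.

```python
# List-mode sketch: crawl only the URLs in the uploaded list -- no link discovery.
def crawl_list(path, fetch):
    with open(path) as f:
        url_list = [line.strip() for line in f if line.strip()]
    for url in url_list:
        fetch(url)   # pages not in the list are never requested

# crawl_list("urls_to_crawl.txt", fetch=my_fetcher)   # hypothetical file and fetcher
```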

Learn more about crawl mode

Crawl Scope (Spider mode only)

The scope of a crawl determines which URLs are considered part of the site. For example, the scope can be set to:

  • Track all URLs on a root domain: Include only the root domain (example.com)

  • Track all URLs on a single subdomain: Include the subdomain + root domain (www.example.com)

  • Track all URLs in a subfolder on any subdomain: Include the root domain + path (example.com/en/uk/)

  • Track only URLs in a single subfolder on a single subdomain: Include the subdomain + root domain + path (www.example.com/products)

  • Track a single URL: Include only the URL you want to track (www.example.com/products/19312981.html)
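
To make the options above concrete, here is a simplified sketch of how a URL could be tested against a scope. It is an illustration only, not Dragon Metrics' actual matching logic.

```python
from urllib.parse import urlsplit

def in_scope(url, scope_host, scope_path="/", include_subdomains=False):
    """Illustrative check: does `url` fall inside the configured crawl scope?"""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if include_subdomains:
        # Root-domain scope: example.com also matches www.example.com, blog.example.com, ...
        host_ok = host == scope_host or host.endswith("." + scope_host)
    else:
        # Single-subdomain scope: the host must match exactly.
        host_ok = host == scope_host
    return host_ok and parts.path.startswith(scope_path)   # subfolder scope, e.g. /en/uk/

# Subfolder on a single subdomain (www.example.com/products):
print(in_scope("https://www.example.com/products/19312981.html", "www.example.com", "/products"))  # True
print(in_scope("https://blog.example.com/products/", "www.example.com", "/products"))              # False
```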

The scope is determined by the site field in General Campaign Settings the first time a campaign is created. By default, rank tracking, site audits, and backlink reports all share this same scope.

However, there may be times when you want to use a different scope for site audits than for rank tracking or backlink reports. For example, you may want to include URLs from all subdomains in rank tracking (root domain scope) but only crawl the www subdomain (subdomain scope) for site audits.

To do this, you can set a separate scope for crawls.

Learn more about crawl scope

Manage Exclusions (Spider mode only)

When crawling in spider mode, all URLs within the crawl scope will be included in the crawl by default.

If there are areas of the site you prefer not to include in the crawl (such as certain subdomains, subfolders, or parameters), you can exclude them using URL Exclusions.
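
As a rough illustration of the idea, the sketch below matches URLs against a few exclusion rules. The regular-expression syntax is just for the sketch; it is not necessarily how exclusion rules are written in Dragon Metrics.

```python
import re

# Hypothetical exclusion rules -- the real rules are whatever you configure in URL Exclusions.
exclusions = [
    re.compile(r"^https?://static\.example\.com/"),   # an entire subdomain
    re.compile(r"/wp-admin/"),                        # a subfolder
    re.compile(r"[?&]sessionid="),                    # a URL parameter
]

def is_excluded(url):
    """True if the URL matches any exclusion rule and should be skipped."""
    return any(rule.search(url) for rule in exclusions)

print(is_excluded("https://www.example.com/blog/post?sessionid=abc"))  # True
print(is_excluded("https://www.example.com/blog/post"))                # False
```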

Learn more about URL Exclusions

Alternate Seed URLs (Spider mode only)

When crawling in Spider mode, our crawler will begin by first crawling the website URL set for this campaign. It will follow any redirects and recursively crawl all links found on each page.

There may be times when you would like to change which URL the crawler begins from. You can do this by entering one or more Alternate Seed URLs. These will be treated as additional starting points for the crawler. All Alternate Seed URLs will be treated just like the website URL for this campaign, each with a crawl depth of 0.
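
A minimal spider-mode sketch may help show what "additional starting points at depth 0" means in practice; the `fetch_links` callback is a stand-in for fetching a page and extracting its links, and the depth limit is an assumption.

```python
from collections import deque

def spider_crawl(seed_urls, fetch_links, max_depth=5):
    """Breadth-first crawl: every seed (campaign URL or alternate seed) starts at depth 0."""
    queue = deque((url, 0) for url in seed_urls)
    seen = set(seed_urls)
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for link in fetch_links(url):          # fetch the page and return the links found on it
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

# spider_crawl(["https://www.example.com/", "https://www.example.com/en/uk/"], my_fetcher)
```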

Learn more about alternate seed URLs

Core Web Vitals URLs

By default, Dragon Metrics automatically chooses the top URLs based on organic sessions, clicks, page authority, and crawl depth to audit for Core Web Vitals performance. If you prefer to choose the URLs yourself, you may add them here.

Crawl Speed

By default, our crawler (Dragonbot) makes requests at a moderate and respectful rate of 1.3 URLs / second.

If you have a large site and a powerful server, you may want to crawl at a faster rate to finish the crawl in a shorter amount of time. If you have a more sensitive server that may return HTTP 5xx errors for crawlers that make requests faster than a certain rate, you may want to lower the crawl rate to avoid these blocked requests.
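
For intuition, a crawl rate is simply a cap on how often requests go out. Here is a minimal throttling sketch; the `fetch` function is a placeholder for making the actual request.

```python
import time

def throttled_fetch(urls, fetch, urls_per_second=1.3):
    """Fetch each URL, sleeping between requests so the rate never exceeds the target."""
    interval = 1.0 / urls_per_second        # ~0.77 s between requests at 1.3 URLs / second
    for url in urls:
        started = time.monotonic()
        fetch(url)                          # placeholder for the actual request
        elapsed = time.monotonic() - started
        if elapsed < interval:
            time.sleep(interval - elapsed)
```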

Choosing a speed higher than 5 URLs / second will require verification that you own the site. Please use caution when choosing high speeds, so as not to overload your server.

Learn more about crawl speed

User Agent String

Some websites or servers may respond differently depending on the client's user agent string. By default, our crawler identifies itself as Dragonbot and mimics a desktop device, but it's also possible to choose mobile instead, or use a completely different user agent string as well.

Choose Custom to enter any user agent string of your choosing. Note: Use caution with this option, as it may cause some websites to not work properly.
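
For context, the user agent is just an HTTP header sent with every request. The sketch below shows the idea using the `requests` library; the user agent string is an example, not Dragonbot's official string.

```python
import requests

# Example only -- this is not Dragonbot's official user agent string.
headers = {"User-Agent": "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://www.example.com/bot)"}
response = requests.get("https://www.example.com/", headers=headers, timeout=10)
print(response.status_code)
```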

Site verification is required for any value other than Dragonbot desktop or Dragonbot mobile.

Override robots.txt

The Dragon Metrics crawler (Dragonbot) respects robots.txt directives by default. However, there may be situations where you want to ignore robots.txt directives or use a custom robots.txt instead. This can be particularly helpful when crawling a site in a testing or staging environment.

Dragon Metrics allows you to override the site's robots.txt in this way.

To completely ignore robots.txt directives, choose Ignore robots.txt.

To use a custom robots.txt instead, choose this option and add the text for your custom file in the text box below it.
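
To see how a custom robots.txt changes what gets crawled, here is a small sketch using Python's standard robots.txt parser; the directives and staging URLs are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Example custom robots.txt for a hypothetical staging site.
custom_robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(custom_robots_txt.splitlines())
print(parser.can_fetch("Dragonbot", "https://staging.example.com/private/page"))  # False
print(parser.can_fetch("Dragonbot", "https://staging.example.com/products/"))     # True
```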

To enable either setting, site verification is required.

Basic Authentication

There may be times when you'd like to crawl a site in a test or staging environment before it goes live. There could also be times when you need to crawl a site that is not publicly available for other reasons.

If the site uses HTTP Basic Authentication, Dragon Metrics can crawl the site even if it's password-protected.

To enable Basic Authentication, change the value to Yes, and enter the username and password for this site.
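
For reference, HTTP Basic Authentication simply sends a username and password with each request, as in this sketch (the staging URL and credentials are placeholders):

```python
import requests

response = requests.get(
    "https://staging.example.com/",
    auth=("username", "password"),   # the same credentials you enter in Crawl Settings
    timeout=10,
)
print(response.status_code)          # typically 200 if accepted, 401 if not
```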

This setting requires verification that you own or manage this site.

Crawl multiple campaigns with same domain in parallel

There may be times when multiple campaigns that share the same root domain are scheduled to be crawled at the same time.

If these crawls happened simultaneously, each campaign's crawl speed would stay within its maximum crawl rate setting, but since they all target the same domain, the site could receive requests at a much higher combined rate than expected.
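
A quick example of the arithmetic (illustrative numbers only):

```python
campaigns_on_same_domain = 3
crawl_rate_per_campaign = 1.3   # URLs / second, the default rate

aggregate_rate = campaigns_on_same_domain * crawl_rate_per_campaign
print(round(aggregate_rate, 1))  # 3.9 URLs / second hitting one domain if crawled in parallel
```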

For this reason, Dragonbot will crawl at most one campaign with the same root domain at a time. This ensures that sites will not be overloaded with a faster crawl rate than expected, but it can sometimes cause crawls to finish more slowly.

To speed up crawls, you can enable parallel crawling, which removes this limit and allows multiple campaigns with the same root domain to be crawled simultaneously.

Site verification is required to enable this setting. Please use caution when enabling it, so as not to overload your server.
