Dragon Metrics will automatically crawl the website of each campaign every 14 days without requiring any additional configuration. However, there are a number of settings that can be configured to ensure your site is crawled optimally.

Crawl settings can be accessed in three areas:

Initial campaign setup

When creating a new campaign, click Show Advanced Options under Crawl Settings to access crawler options.

Crawler Campaign Settings

Open Crawler Settings under Campaign Settings in the left navigation.

Site Auditor

In Site Auditor, click Crawl options in the upper-right corner of the page.

The following settings can be configured for a site crawl:

Enable or disable crawls

Crawls can be enabled or disabled at any time. If crawling has been disabled for a campaign, past crawl data will still be available, but no new automatic crawls will be scheduled.

Crawl mode

There are two modes available in Crawler Settings: Spider and List.

Spider mode is the default method. It starts by crawling the home page of the site, then recursively crawls all links found on each page. Learn more about spider mode

List mode allows you to upload a list of URLs to be crawled. Only pages included in this list will be crawled. Learn more about list mode

Learn more about crawl mode

Crawl Limits (Spider mode)

Crawl limits determine the maximum number of URLs crawled for a campaign.

The maximum value allowed for the crawl limit is the lower of:

  • The number of remaining crawl credits in your account

  • The maximum number of URLs crawled per campaign

Both of these numbers are determined by your subscription plan.
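
As a rough illustration of the rule above, the ceiling for a campaign's crawl limit can be sketched as a simple minimum. The function and parameter names here are hypothetical and are not part of any Dragon Metrics API; the real values come from your subscription plan.

```python
def max_crawl_limit(remaining_credits: int, plan_max_urls: int) -> int:
    """Highest crawl limit that can be set: the lower of the two plan values."""
    # Never return a negative ceiling if the account has run out of credits.
    return max(0, min(remaining_credits, plan_max_urls))


def clamp_crawl_limit(requested: int, remaining_credits: int, plan_max_urls: int) -> int:
    """Reduce a requested crawl limit to the allowed ceiling if it is too high."""
    return min(requested, max_crawl_limit(remaining_credits, plan_max_urls))
```

For example, with 8,000 credits remaining and a plan maximum of 10,000 URLs per campaign, the highest crawl limit that could be set under this sketch is 8,000.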

Learn more about crawl limits

Crawl Speed

By default, our crawler (Dragonbot) makes requests at a moderate and respectful rate.

If you have a large site and a powerful server, you may want to crawl at a faster rate to finish the crawl in a shorter amount of time. If your server is more sensitive and returns HTTP 5xx errors when a crawler makes requests faster than a certain rate, you may want to lower the crawl rate to avoid these failed requests.

Learn more about crawl speed

Crawl Images

If enabled, we’ll crawl images on this site to check for issues such as broken images or missing alt text. Each image crawled will use 0.25 crawl credits.

Render JavaScript

By default, all JavaScript on the page is ignored. If enabled, our crawler will render JavaScript, similar to how a browser would. This is useful for crawling content that is only available after JavaScript is executed. Each URL crawled with JavaScript will use 5 crawl credits.
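
To get a feel for how these options affect credit usage, here is a minimal estimate in Python. The per-image (0.25) and per-JavaScript-URL (5) costs come from this article. The cost of a URL crawled without JavaScript is not stated here, so plain_url_credits is an assumption with a placeholder default of 1, and the function name itself is hypothetical.

```python
IMAGE_CREDITS = 0.25   # credits per image crawled (stated in this article)
JS_URL_CREDITS = 5     # credits per URL crawled with JavaScript rendering (stated in this article)


def estimate_credits(pages, images=0, render_js=False, plain_url_credits=1.0):
    """Rough credit estimate for one crawl.

    plain_url_credits is an ASSUMPTION: this article does not state the cost
    of a URL crawled without JavaScript, so pass the value from your plan.
    """
    per_page = JS_URL_CREDITS if render_js else plain_url_credits
    return pages * per_page + images * IMAGE_CREDITS
```

Under these assumptions, a crawl of 1,000 pages and 400 images would use about 1,100 credits without JavaScript rendering and about 5,100 credits with it.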

Learn more about JavaScript crawling

Crawl Scope (Spider mode only)

The scope of a crawl determines which URLs are considered part of the site. For example, the scope can be set to:

  • Track all URLs on a root domain: Include only the root domain (example.com)

  • Track all URLs on a single subdomain: Include the subdomain + root domain (www.example.com)

  • Track all URLs in a subfolder on any subdomain: Include the root domain + path (example.com/en/uk/)

  • Track only URLs in a single subfolder on a single subdomain: Include the subdomain + root domain + path (www.example.com/products)

  • Track a single URL: Include only the URL you want to track (www.example.com/products/19312981.html)
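
The scope options above can be modeled as a host rule plus a path prefix. The sketch below is only an illustration of that idea: the Scope class, its fields, and the prefix-matching behavior are assumptions, not a description of how Dragonbot actually evaluates scope.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Scope:
    host: str                # "example.com" or "www.example.com"
    any_subdomain: bool      # True for a root-domain scope
    path_prefix: str = "/"   # "/" covers the whole host


def in_scope(url: str, scope: Scope) -> bool:
    """Return True if the URL falls inside the given (hypothetical) scope."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if scope.any_subdomain:
        # Root-domain scope: the bare domain or any subdomain of it.
        host_ok = host == scope.host or host.endswith("." + scope.host)
    else:
        # Subdomain scope: the host must match exactly.
        host_ok = host == scope.host
    path = parts.path or "/"
    return host_ok and path.startswith(scope.path_prefix)


root = Scope("example.com", any_subdomain=True)
www_only = Scope("www.example.com", any_subdomain=False)
uk_folder = Scope("example.com", any_subdomain=True, path_prefix="/en/uk/")
```

With these definitions, https://blog.example.com/post is inside root but outside www_only, and https://shop.example.com/en/uk/item is inside uk_folder. A single-URL scope would need an exact path comparison rather than the prefix match used here.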

The scope is determined by the site field in General Campaign Settings when the campaign is first created. By default, rank tracking, site audits, and backlink reports all share this same scope.

However, there may be times when you want to use a different scope for site audits than for rank tracking or backlink reports. For example, you may want to include URLs from all subdomains in rank tracking (root domain scope) but only crawl the www subdomain (subdomain scope) for site audits.

To do this, you can set a separate scope for crawls.

Learn more about crawl scope

Manage Exclusions (Spider mode only)

When crawling in Spider mode, all URLs within the crawl scope will be included in the crawl by default.

If there are areas of the site that you prefer not to include in the crawl (such as subdomains, subfolders, or parameters), you can leave them out by using URL Exclusions.
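
Conceptually, an exclusion is a filter applied to URLs that are otherwise in scope. The following sketch shows one way such a filter could work for the three kinds of areas mentioned above. The function name, argument names, and matching rules are all hypothetical; the actual URL Exclusions syntax in Dragon Metrics may differ.

```python
from urllib.parse import parse_qs, urlsplit


def is_excluded(url, hosts=(), path_prefixes=(), params=()):
    """Return True if the URL matches any (hypothetical) exclusion rule."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in hosts:                                   # excluded subdomain
        return True
    path = parts.path or "/"
    if any(path.startswith(p) for p in path_prefixes):  # excluded subfolder
        return True
    # Excluded query parameter, matched by name regardless of its value.
    query_keys = parse_qs(parts.query, keep_blank_values=True).keys()
    return any(name in query_keys for name in params)
```

For instance, calling is_excluded("https://www.example.com/shop?sort=price", params=("sort",)) returns True under this sketch, while the same URL without the query string would be kept.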

Learn more about URL Exclusions

Alternate Seed URLs (Spider mode only)

When crawling in Spider mode, our crawler begins by crawling the website URL set for this campaign. It will follow any redirects and recursively crawl all links found on each page.

There may be times when you would like to change which URL the crawler begins from. You can do this by entering one or more Alternate Seed URLs. These will be treated as additional starting points for the crawler. All Alternate Seed URLs will be treated just like the website URL for this campaign, each with a crawl depth of 0.
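
The effect of seed URLs on crawl depth can be illustrated with a small breadth-first traversal over an in-memory link graph. This is only a sketch under stated assumptions: the links dictionary stands in for real pages, the function name is hypothetical, and breadth-first ordering is an assumption rather than a documented detail of Dragonbot.

```python
from collections import deque


def crawl_depths(links, seeds, limit):
    """Map each discovered URL to its crawl depth, stopping at `limit` URLs.

    links: dict of URL -> list of URLs linked from that page (stand-in data).
    seeds: the campaign URL plus any alternate seed URLs; all start at depth 0.
    """
    depths = {}
    queue = deque()
    for seed in seeds:
        if seed not in depths and len(depths) < limit:
            depths[seed] = 0
            queue.append(seed)
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target in depths:
                continue  # already discovered via a shorter or equal path
            if len(depths) >= limit:
                return depths  # crawl limit reached
            depths[target] = depths[url] + 1
            queue.append(target)
    return depths


links = {"/": ["/a", "/b"], "/a": ["/c"], "/landing": ["/d"]}
```

Here /landing is not linked from the home page, so crawl_depths(links, ["/"], 10) never reaches it or /d. Adding it as a second seed, crawl_depths(links, ["/", "/landing"], 10), discovers both, with /landing at depth 0 and /d at depth 1.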

Learn more about alternate seed URLs
