Dragon Metrics will automatically crawl the website of each campaign every 14 days without requiring any additional configuration. However, there are a number of settings you can adjust to ensure your site is crawled optimally.
Crawl settings can be accessed in three areas:
Initial campaign setup
When creating a new campaign, click Show Advanced Options under Crawl Settings to access crawler options.
In Crawler Settings under Campaign Settings in the left navigation
In Site Auditor, click Crawl options in the upper-right of the page
The following settings can be configured for a site crawl:
Enable or disable crawls
Crawls can be enabled or disabled at any time. If crawling has been disabled for a campaign, past crawl data will still be available, but no new automatic crawls will be scheduled.
Crawl Mode
There are two modes available in Crawler Settings: Spider and List.
Spider mode is the default method: the crawler starts from the home page of the site and recursively crawls all links found on each page. Learn more about spider mode
List mode allows you to upload a list of URLs to be crawled. Only pages included in this list will be crawled. Learn more about list mode
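The difference between the two modes can be sketched in a few lines of Python. This is a simplified illustration, not Dragonbot's actual implementation; `fetch_links` is a hypothetical helper that returns the links found on a page:

```python
from collections import deque

def spider_crawl(start_url, fetch_links, max_urls=100):
    """Spider mode: start at the seed URL and recursively follow links found on each page."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_urls:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):  # links discovered on this page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def list_crawl(url_list, max_urls=100):
    """List mode: crawl only the URLs supplied, discovering nothing else."""
    return list(dict.fromkeys(url_list))[:max_urls]  # dedupe, preserve order
```

Spider mode discovers pages as it goes, while list mode's frontier is fixed up front, which is why only pages in the list are ever crawled.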
Crawl Limits (Spider mode)
Crawl limits determine the maximum number of URLs crawled for a campaign.
The maximum value allowed for crawl limits is defined by the lower of either:
The number of remaining crawl credits in your account
The maximum URLs crawled per campaign.
Both of these numbers are determined by your subscription plan.
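In other words, the effective ceiling is simply the lower of the two plan-derived numbers (an illustrative sketch; the parameter names are assumptions, not part of the Dragon Metrics API):

```python
def max_crawl_limit(remaining_credits, plan_max_per_campaign):
    """The effective crawl limit is the lower of the remaining account
    credits and the plan's per-campaign maximum."""
    return min(remaining_credits, plan_max_per_campaign)
```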
Crawl Rate
By default, our crawler (Dragonbot) makes requests at a moderate and respectful rate.
If you have a large site and a powerful server, you may want to crawl at a faster rate to finish the crawl in a shorter amount of time. If you have a more sensitive server that returns HTTP 5xx errors when crawlers make requests faster than a certain rate, you may want to lower the crawl rate to avoid these failed requests.
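The trade-off can be pictured as a fixed delay between requests (a simplified sketch; Dragonbot's actual throttling logic may differ, and `fetch` here is a hypothetical stand-in for an HTTP request):

```python
import time

def polite_fetch(urls, fetch, requests_per_second=2.0):
    """Throttle requests to a fixed rate. A lower rate takes longer overall,
    but reduces the chance that a sensitive server responds with 5xx errors."""
    delay = 1.0 / requests_per_second
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(delay)  # pause between requests to stay under the rate
    return results
```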
Crawl Images
If enabled, we’ll crawl images on this site to check for issues such as broken images or missing alt tags. Each image crawled will use 0.25 crawl credits.
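Assuming each page crawled uses one credit (the per-page cost is an assumption; only the 0.25-credit image cost is stated above), credit usage can be estimated like this:

```python
def crawl_credits_used(pages_crawled, images_crawled=0):
    """Estimate crawl credits: 1 per page (assumed) plus 0.25 per image crawled."""
    return pages_crawled + 0.25 * images_crawled
```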
Crawl Scope (Spider mode only)
The scope of a crawl determines which URLs are considered part of the site. For example, the scope can be set to:
Track all URLs on a root domain: Include only the root domain (example.com)
Track all URLs on a single subdomain: Include the subdomain + root domain (www.example.com)
Track all URLs in a subfolder on any subdomain: Include the root domain + path (example.com/en/uk/)
Track only URLs in a single subfolder on a single subdomain: Include the subdomain + root domain + path (www.example.com/products)
Track a single URL: Include only the URL you want to track (www.example.com/products/19312981.html)
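A scope check along these lines can be expressed with Python's standard urllib (a simplified illustration of the idea; the function and parameter names are assumptions, not Dragon Metrics' implementation):

```python
from urllib.parse import urlparse

def in_scope(url, scope_host, scope_path="/", match_subdomains=False):
    """Check whether a URL falls inside the configured crawl scope:
    a host (optionally including all subdomains) plus a path prefix."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if match_subdomains:
        # Root-domain scope: example.com itself or any *.example.com subdomain
        host_ok = host == scope_host or host.endswith("." + scope_host)
    else:
        # Subdomain scope: the exact host only
        host_ok = host == scope_host
    return host_ok and parts.path.startswith(scope_path)
```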
The scope is determined by the site field in General Campaign Settings the first time a campaign is created. By default, rank tracking, site audits, and backlink reports all share this same scope.
However, there may be times when you want to use a different scope for site audits than for rank tracking or backlink reports. For example, you may want to include URLs from all subdomains in rank tracking (root domain scope) but crawl only the www subdomain (subdomain scope) for site audits.
To do this, you can set a separate scope for crawls.
Manage Exclusions (Spider mode only)
If there are areas of the site you prefer not to include in the crawl (such as certain subdomains, subfolders, or URL parameters), you can exclude them using URL Exclusions.
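Conceptually, exclusions act as a filter applied to discovered URLs before they are crawled. The sketch below uses shell-style wildcards for illustration; Dragon Metrics' actual exclusion syntax may differ:

```python
import fnmatch

def apply_exclusions(urls, exclusion_patterns):
    """Drop any URL that matches one of the exclusion patterns
    (e.g. patterns covering subdomains, subfolders, or parameters)."""
    return [u for u in urls
            if not any(fnmatch.fnmatch(u, p) for p in exclusion_patterns)]
```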
Alternate Seed URLs (Spider mode only)
When crawling in Spider mode, our crawler will begin by first crawling the website URL set for this campaign. It will follow any redirects and recursively crawl all links found on each page.
There may be times when you would like to change which URL the crawler begins from. You can do this by entering one or more Alternate Seed URLs. These will be treated as additional starting points for the crawler. All Alternate Seed URLs will be treated just like the website URL for this campaign, each with a crawl depth of 0.
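The depth-0 behavior means every seed acts as an independent starting point, with each link found on a page sitting one level deeper than the page itself (a simplified sketch; `fetch_links` is a hypothetical helper, not Dragonbot's implementation):

```python
from collections import deque

def crawl_with_seeds(seed_urls, fetch_links, max_depth=2):
    """Breadth-first crawl from multiple seeds. Every seed (the campaign URL
    and any Alternate Seed URLs) starts at depth 0; links found on a page
    are assigned a depth one greater than that page's depth."""
    depths = {url: 0 for url in seed_urls}
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue  # don't expand pages at the depth limit
        for link in fetch_links(url):
            if link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths
```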