Our crawler, Dragonbot, crawls your site just like a search engine. In most cases, once you turn on crawling for a campaign, Dragonbot will be able to crawl your site without issue. However, some sites have settings or policies in place that prevent Dragonbot from crawling them. These must be resolved before Dragonbot can crawl your site.
Below is a list of the most common issues and how to solve them.
Blocked by robots.txt
Dragonbot respects robots.txt directives. If your site is not being crawled because it is blocked by the robots.txt file, you'll want to update the directives for the user agent "dragonbot".
To allow Dragonbot to crawl any page on your site, add the following directives to the robots.txt file at the root of your domain (and of any other subdomain Dragon Metrics is set to crawl):
User-agent: dragonbot
Allow: /
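Before deploying a robots.txt change, it can help to check that the rules behave as intended. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the catch-all group, the "otherbot" agent, and the example.com URL are illustrative assumptions, not your actual configuration, and Python's parser may differ in edge cases from how Dragonbot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: a group that allows "dragonbot" everywhere,
# next to a stricter catch-all group for every other crawler.
ROBOTS_TXT = """\
User-agent: dragonbot
Allow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/private/page"
    for agent in ("dragonbot", "otherbot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, url))
```

Pass the short token "dragonbot" as the user agent here rather than the full browser-style string, since the parser matches on the robots.txt user agent name.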
Blocked by the server
Some servers have strict policies against bots crawling the site. These servers may try to identify requests that come from bots and fail to deliver an appropriate response (e.g. return an HTTP 4xx or 5xx status, or a soft 404). Servers may use a number of techniques to identify bots, such as:
- User agent
- IP address
- Cookie handling
Whichever method is used, you'll need to find a way to add Dragonbot to the site's "allow list".
A common approach is to add an IP address or range to the allow list. However, because the IP address of Dragonbot is dynamic and may change often, this will not be effective.
Instead, the most reliable way to identify Dragonbot is by its user agent string:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36;Dragonbot; http://www.dragonmetrics.com
Please note that several variations of this user agent string are in use, and they may change from time to time. To identify the user agent as Dragonbot, please do not match the entire string. Instead, match only the final part of the string shown above (the portion beginning with "Dragonbot"), which will remain constant.
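As a rough illustration of matching on the stable part of the string rather than the whole thing, here is a minimal Python sketch. It assumes the literal token "Dragonbot" is the constant portion, based on the example user agent above. The function name is_dragonbot is hypothetical, and how you wire it into your firewall or application allow list depends on your setup.

```python
import re
from typing import Optional

# Assumption: the literal token "Dragonbot" stays constant across all
# variations of the user agent string, while the browser part may change.
DRAGONBOT_PATTERN = re.compile(r"\bDragonbot\b", re.IGNORECASE)


def is_dragonbot(user_agent: Optional[str]) -> bool:
    """Return True if the user agent string contains the Dragonbot token."""
    return bool(DRAGONBOT_PATTERN.search(user_agent or ""))


if __name__ == "__main__":
    sample = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36;"
        "Dragonbot; http://www.dragonmetrics.com"
    )
    print(is_dragonbot(sample))
```

Keep in mind that any client can send any user agent string, so this check identifies requests that claim to be Dragonbot; it does not verify them.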
Redirecting based on IP address
Sometimes a server may use the IP address of a request to try to identify the location of the visitor (or bot) and redirect the request to a different URL or site.
This is problematic for reasons that go beyond Dragonbot crawling your site. Mapping an IP address to a location is error-prone, and redirecting in this way causes issues with search engine crawlers (even Google has recommended against this technique). It may also inconvenience or frustrate users.
Unfortunately, with a location-based redirect in place, Dragonbot may not be able to crawl your site effectively. The best solution is to disable this redirect, which will fix the above issues as well.
Recommended techniques to use in place of location redirects include:
- Google Search Console location targeting
- Detect location by IP address as normal, but offer a header or interstitial suggesting a link to the most appropriate URL instead of forcing a redirect
If it is not feasible to disable the location redirect for all visitors, we recommend adding Dragonbot to an "allow list" which disables the redirect for these visitors.
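The two options above (suggesting a URL instead of redirecting, and bypassing the redirect for allow-listed visitors) can be sketched as follows. This is a simplified illustration, not a drop-in implementation. The Response class, the REGIONAL_SITES mapping, the example.com URLs, and the keep_redirect flag are all hypothetical, and the country code is assumed to come from whatever IP lookup your server already performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    location: Optional[str] = None    # redirect target, set only for 302
    suggestion: Optional[str] = None  # URL to offer in a banner or interstitial


# Hypothetical mapping from detected country to a regional URL.
REGIONAL_SITES = {
    "FR": "https://www.example.com/fr/",
    "DE": "https://www.example.com/de/",
}


def is_allow_listed(user_agent: str) -> bool:
    """Assumption: allow-listed crawlers are recognized by the Dragonbot token."""
    return "dragonbot" in user_agent.lower()


def respond(user_agent: str, country_code: str, keep_redirect: bool = False) -> Response:
    """Decide how to answer a request for a page that has regional variants."""
    target = REGIONAL_SITES.get(country_code)
    if target is None or is_allow_listed(user_agent):
        # No regional variant, or an allow-listed crawler: serve the page as requested.
        return Response(status=200)
    if keep_redirect:
        # The redirect stays in place for everyone who is not allow-listed.
        return Response(status=302, location=target)
    # Preferred: serve the requested page and only suggest the regional URL.
    return Response(status=200, suggestion=target)
```

With keep_redirect left at False, no visitor is ever redirected, which is the behavior recommended above. Setting it to True models the fallback in which only allow-listed crawlers skip the redirect.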
Throttled or blocked for making too many requests
While Dragonbot respects webmasters' limited resources by adhering to politeness rules (such as crawling at a reasonable speed), it's still possible the server is throttling or blocking Dragonbot for making too many requests in a period of time.
In that case, you can adjust Dragonbot's crawl speed to a slower option.
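One way to check whether throttling is actually happening is to look at the status codes your server returned to Dragonbot. The sketch below assumes your access log has already been parsed into (user agent, status code) pairs, and that your server signals throttling with HTTP 429 or 503. Both are assumptions about your setup, and the function name throttled_share is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumption: the server answers rate-limited requests with one of these codes.
THROTTLE_STATUSES = {429, 503}


def throttled_share(entries: Iterable[Tuple[str, int]]) -> float:
    """Return the fraction of Dragonbot requests that received a throttling status."""
    counts = Counter()
    for user_agent, status in entries:
        if "dragonbot" not in user_agent.lower():
            continue  # ignore requests from other clients
        counts["total"] += 1
        if status in THROTTLE_STATUSES:
            counts["throttled"] += 1
    if counts["total"] == 0:
        return 0.0
    return counts["throttled"] / counts["total"]


if __name__ == "__main__":
    sample_log = [
        ("Mozilla/5.0 ...;Dragonbot; http://www.dragonmetrics.com", 200),
        ("Mozilla/5.0 ...;Dragonbot; http://www.dragonmetrics.com", 429),
        ("Mozilla/5.0 (some other browser)", 200),
    ]
    print(throttled_share(sample_log))
```

A share well above zero suggests the server is rate limiting the crawl, in which case a slower crawl speed is worth trying.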
There may be other issues that could be preventing Dragonbot from crawling properly. Read more about other common crawling issues.