What Does it Mean?
The robots.txt file is a plain text document placed in the root of your site. It contains instructions that tell robots (search engines or any other type of crawler) which pages they may or may not access. Used properly, this file lets you ask search engines and other well-behaved crawlers not to crawl certain parts of your site.
A typical robots.txt file:
Specific syntax must be used to make these exclusions. You can give individual instructions to each crawler separately or give general instructions that all crawlers should follow. You can exclude individual pages or entire subdirectories or use wildcards to do simple pattern matching for URL parameters or other partial matches.
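As a rough illustration of this syntax, the sketch below embeds a small hypothetical robots.txt in a Python string and checks it with the standard library's urllib.robotparser module. The paths, the ExampleBot name, and the example.com URLs are made up for illustration. This parser matches plain path prefixes only, so wildcard rules are left out here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and bot name are illustrative only.
SAMPLE_ROBOTS_TXT = """\
# General rules that every crawler should follow
User-agent: *
Disallow: /admin/
Disallow: /checkout.html

# Separate rules for one specific crawler
User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A crawler with no group of its own falls back to the "*" rules.
print(parser.can_fetch("OtherBot", "http://www.example.com/products/"))       # True
print(parser.can_fetch("OtherBot", "http://www.example.com/admin/settings"))  # False

# ExampleBot has its own group, which excludes the entire site.
print(parser.can_fetch("ExampleBot", "http://www.example.com/products/"))     # False
```

Each group starts with one or more User-agent lines, followed by the Disallow rules that apply to that group. A rule ending in a slash excludes a whole subdirectory, while a rule naming a single file excludes just that page.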
The file must be named "robots.txt" and placed directly in the root of the site.
Because of this, each subdomain is treated separately and needs its own robots.txt file. For example, you will need both http://www.example.com/robots.txt and http://sub.example.com/robots.txt for both of these subdomains to be treated correctly.
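To make the per-subdomain rule concrete, here is a minimal sketch that works out which robots.txt file governs a given page. The function name robots_txt_url is our own invention, and the page URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that applies to page_url.

    Only the scheme and host are kept; the path is always /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/shop/item?id=7"))
# http://www.example.com/robots.txt
print(robots_txt_url("http://sub.example.com/blog/post"))
# http://sub.example.com/robots.txt
```

Two pages on different subdomains map to two different files, which is why rules placed on one subdomain have no effect on the other.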
Why It's Important
If our crawler, Dragonbot, is unable to crawl this URL, the URL will not be part of the Site Audit, and Dragon Metrics cannot make any optimization recommendations for it.
More importantly, if search engines are unable to crawl this URL, it could affect how they see your site and whether your images can appear in image search. Check whether the exclusion was intended. If it was not, you’ll want to take action.
How To Fix
Take a look at each URL in the table below and confirm whether you really don’t want search engines to crawl it. If you don’t want it crawled, there’s nothing more that you need to do. Otherwise, you’ll need to update your robots.txt file.
Some guidelines on writing a good robots.txt file:
Do not disallow any URL that you want to rank in search engines.
Learn to use wildcards to do pattern matching in URLs. This can be very effective for large sites, or for sites with many URL parameters, which often cause duplicate content issues.
Remember that robots.txt is not a security measure. Excluding a page does not guarantee that it will never be crawled or indexed, so keep private data offline or behind a secure login.
Keep in mind that excluding a page from crawling does not exclude it from being indexed. If other pages link to the URL, it could still be indexed without being crawled. If you want to keep the URL out of the index, use another method such as the noindex directive. Note that a crawler can only see a noindex directive on a page it is allowed to crawl.
Remember that each subdomain needs its own robots.txt file.
Test your robots.txt file before uploading it. A mistake can be costly and undo months of SEO efforts.
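The wildcard and testing guidelines above can be sketched together in a few lines of Python. This is a simplified illustration, not a full robots.txt implementation: it assumes the two pattern characters that major search engines document ('*' for any run of characters and a trailing '$' for the end of the URL), and it ignores Allow rules and rule precedence. The function names, patterns, and paths are all hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical Disallow rules from a draft robots.txt.
DISALLOW = ["/*?sessionid=", "/*.pdf$"]

print(is_blocked("/products/shoes?sessionid=abc", DISALLOW))  # True
print(is_blocked("/files/report.pdf", DISALLOW))              # True
print(is_blocked("/products/shoes", DISALLOW))                # False

# Pre-upload check: pages that must stay crawlable should never be blocked.
MUST_BE_CRAWLABLE = ["/", "/products/shoes", "/blog/robots-guide"]
wrongly_blocked = [p for p in MUST_BE_CRAWLABLE if is_blocked(p, DISALLOW)]
print(wrongly_blocked)  # []
```

Running a check like this against a list of your most important URLs before uploading can catch an overly broad pattern early. A testing tool provided by the search engine itself remains the more reliable final check.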