What Is the Rel=Canonical Tag?
A typical rel=canonical tag:
<link rel="canonical" href="http://www.example.com/">
Often, a page's content can be accessed via multiple URLs (especially when URL parameters are used), or a group of pages is so similar that only one of them is worth crawling or indexing. The rel=canonical element lets the webmaster identify which of these URLs is the preferred representative of the group (the "canonical" page) and should be indexed.
For example, let's assume www.example.com/blue-widgets?sort=asc is a product listing page. www.example.com/blue-widgets?sort=desc is the same page with the sorting reversed, and www.example.com/blue-widgets?sort=asc&source=ppc has exactly the same content as the first page but adds a URL parameter indicating where the traffic was referred from. If all three of these URLs were crawled normally, they would be flagged as duplicate content.
Instead, if we put a rel=canonical tag on all three pages declaring www.example.com/blue-widgets?sort=asc as the canonical version, search engines would know to index that URL and consolidate the other two into it, eliminating the duplicate content issue.
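As a rough sketch, the Python snippet below generates the tag each of the three example pages would carry. It assumes a hypothetical rule made up for this example: every variant of the blue-widgets listing, whatever its sort order or tracking parameter, maps to the ascending-sort URL. The names `canonical_url`, `canonical_tag` and `DEFAULT_QUERY` are illustrative only, and the `http://` scheme is borrowed from the sample tag at the top of this page.

```python
import html
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical rule for this example only: every variant of the listing page
# shares one canonical URL that uses the default ascending sort.
DEFAULT_QUERY = {"sort": "asc"}


def canonical_url(url: str) -> str:
    """Map a listing-page URL to its canonical form by replacing its query string."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(DEFAULT_QUERY), ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of the page at `url`."""
    # Escape the href so characters such as "&" are valid inside the attribute.
    return f'<link rel="canonical" href="{html.escape(canonical_url(url))}">'


variants = [
    "http://www.example.com/blue-widgets?sort=asc",
    "http://www.example.com/blue-widgets?sort=desc",
    "http://www.example.com/blue-widgets?sort=asc&source=ppc",
]
for v in variants:
    # Each line printed: <link rel="canonical" href="http://www.example.com/blue-widgets?sort=asc">
    print(canonical_tag(v))
```

All three variants end up pointing at the same URL. A real site would need its own rule for which parameters change the content and which do not.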
What Does "Rel=Canonical Empty / Missing" Mean?
If a URL is listed in the table below, no rel=canonical value was found on that page. This can mean one of two things: either the tag is missing completely, or its "href" attribute was left empty (e.g. <link rel="canonical" href="" />).
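To check a page for the two conditions described above, one could parse its HTML and look for a `<link>` element whose `rel` attribute includes `canonical`. The sketch below uses Python's standard `html.parser` module; `CanonicalFinder` and `canonical_status` are hypothetical names. The check is deliberately simplified: it only inspects the first canonical tag it finds and does not verify that the tag sits inside `<head>`.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[Optional[str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # such as <link ... /> are routed here by the default handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.hrefs.append(attributes.get("href"))


def canonical_status(page_html: str) -> str:
    """Return "missing", "empty", or the canonical URL declared in `page_html`."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    finder.close()
    if not finder.hrefs:
        return "missing"
    href = (finder.hrefs[0] or "").strip()
    return href if href else "empty"


print(canonical_status('<head><link rel="canonical" href="http://www.example.com/"></head>'))
print(canonical_status('<head><link rel="canonical" href="" /></head>'))  # empty
print(canonical_status("<head><title>No canonical tag</title></head>"))  # missing
```

Pages that come back as "missing" or "empty" correspond to the two cases listed in the table.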
Why It's Important
Omitting the rel=canonical tag from your site does not by itself mean you have a duplicate content issue. However, if you have multiple pages with duplicate or very similar content, omitting the rel=canonical tag will likely result in duplicate content issues, which are a major problem for the SEO health of your site.
Dangers of Duplicate Content
- Wasted Crawl Budget - Search engines do not have unlimited resources, so they usually limit how much of your site they are willing to crawl. The number of pages they crawl on your site is referred to as the "crawl budget", and like any limited resource, it must be rationed and prioritized. Ideally, search engines should crawl and index the most important pages on your site before any less-important or duplicate pages. If the rel=canonical tag is not used properly, however, search engines could spend much of their crawl budget on duplicate versions of a small number of unique pages, leaving other unique pages uncrawled. By using the rel=canonical tag, we can tell search engines which unique pages on our site to prioritize for crawling and indexing.
- Lower Ranking or De-Indexation - Search engines dislike duplicate content because it wastes their resources and provides little value to searchers. It can also be an indicator of low-value or thin content. Because of this, many search engines have begun to use duplicate content as a ranking factor, looking unfavorably on sites that have duplicate content problems. Not only can the duplicate pages themselves be affected, but your entire site's rankings could drop as a result, and some non-duplicate pages could even be de-indexed. Low-value and duplicated content of this kind was a focus of Google's Panda update in early 2011.
How To Fix
Look through the URLs in the table. If the content of a page is a duplicate or near-duplicate of another page, you will need to take action. Decide which page in the group should be the preferred, or "canonical", version. Then add a rel=canonical tag to every page in the group, including the preferred page itself, with its href pointing to the preferred URL.
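The first step of this fix, spotting exact duplicates among the listed URLs, can be partly automated. The sketch below assumes you already have each page's text (fetching pages is out of scope), groups pages whose whitespace- and case-normalized text hashes identically, and picks the shortest URL in each group as the canonical one. That selection rule is an arbitrary assumption for illustration; choosing the preferred version is a judgment call. The function names and page texts are hypothetical, and near-duplicates (such as a listing with its sort order reversed) produce different hashes, so they still need manual review.

```python
from __future__ import annotations

import hashlib
import html
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def canonical_tags(pages: dict[str, str]) -> dict[str, str]:
    """Map each URL to the <link rel="canonical"> tag it should carry.

    Pages with identical normalized text form one group. Hypothetical rule:
    a group's canonical URL is its shortest URL (ties broken alphabetically).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)

    tags: dict[str, str] = {}
    for urls in groups.values():
        preferred = min(urls, key=lambda u: (len(u), u))
        for url in urls:
            # Every page in the group, including the preferred one, points at it.
            tags[url] = f'<link rel="canonical" href="{html.escape(preferred)}">'
    return tags


# Hypothetical page texts keyed by URL.
pages = {
    "http://www.example.com/blue-widgets?sort=asc": "Blue widgets: small, large",
    "http://www.example.com/blue-widgets?sort=asc&source=ppc": "Blue widgets:  small,  large",
    "http://www.example.com/red-widgets": "Red widgets: medium",
}
for url, tag in canonical_tags(pages).items():
    print(url, "->", tag)
```

Here both blue-widgets URLs point at the sort=asc page, while the unique red-widgets page gets a self-referencing tag.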