What is the Rel=Canonical Tag?

The rel=canonical tag is an HTML element found in the head element of the page that helps prevent duplicate content issues.

A typical rel=canonical tag:

<link rel="canonical" href="http://www.example.com/">

A page's content can often be reached through multiple URLs, especially when URL parameters are used, and some groups of pages are so similar that only one of them is worth crawling or indexing. The rel=canonical element lets the webmaster identify which URL in such a group is the preferred page (also called the "canonical" page) and should be indexed.

For example, let's assume www.example.com/blue-widgets?sort=asc is a product listing page. www.example.com/blue-widgets?sort=desc is the same page with the sorting reversed. www.example.com/blue-widgets?sort=asc&source=ppc has exactly the same content as the first page but carries an additional URL parameter that records where the traffic was referred from. If all three of these URLs were crawled normally, they would be flagged as duplicate content.

Instead, if we put a rel=canonical tag on all three of these pages that names www.example.com/blue-widgets?sort=asc as the canonical version of the page, search engines would know to index this URL and ignore the other two, thus eliminating the duplicate content issue.

What Does "Multiple Rel=Canonical Tags" Mean?

These URLs have more than one rel=canonical tag on the page.
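
As a rough illustration of what this check looks for, the sketch below collects every rel=canonical link element found in a page's HTML using Python's standard html.parser module. The class name CanonicalCollector, the function canonical_hrefs, and the sample markup are hypothetical names chosen for this example, and the sketch is not a description of how any particular crawler implements the check.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens and is case-insensitive.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.hrefs.append(attr.get("href"))


def canonical_hrefs(html):
    """Return the href values of all rel=canonical tags in the given HTML string."""
    parser = CanonicalCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Assumed sample page with two conflicting canonical tags.
page = (
    '<html><head>'
    '<link rel="canonical" href="http://www.example.com/blue-widgets?sort=asc">'
    '<link rel="canonical" href="http://www.example.com/blue-widgets">'
    '</head><body></body></html>'
)

hrefs = canonical_hrefs(page)
if len(hrefs) > 1:
    print("Multiple rel=canonical tags found:", hrefs)
```

A page with exactly one tag yields a one-item list, and a page with none yields an empty list, so the length of the result is enough to tell the three cases apart.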

Why It's Important

If there is more than one rel=canonical tag on the page, search engines cannot tell which tag points to the actual canonical page. In this situation Google will simply ignore all of the rel=canonical tags, while other search engines may behave unpredictably when trying to identify the canonical page.

Therefore, using multiple rel=canonical tags is just as bad as, or possibly worse than, leaving the tag off the page entirely.

Dangers of Duplicate Content

  • Wasted Crawl Budget - Search engines do not have unlimited resources, so they usually limit how much of your site they are willing to crawl. The number of pages they crawl on your site is referred to as the "crawl budget", and like any limited resource, it must be rationed and prioritized. Ideally, search engines crawl and index the most important pages on your site before crawling or indexing less important or duplicate pages. If the rel=canonical tag is not used properly, search engines could spend only a small part of their crawl budget on your unique pages and the rest on a large number of duplicate versions of them. By using the rel=canonical tag correctly, we can point search engines to the unique pages on our site.

  • Lower Ranking or De-Indexation - Search engines dislike duplicate content because it wastes their resources and provides little value for searchers. It can also be an indicator of low-value or thin content. Because of this, many search engines use duplicate content as a ranking factor and look unfavorably on sites that have problems with it. Not only can the duplicate pages be affected, but your entire site's rankings could go down as a result. Some non-duplicate pages could even be de-indexed. This was the focus of Google's Panda update in early 2011.

How To Fix

This issue is often caused accidentally by a CMS, plugin, template, or other programmatic tool that inserts a rel=canonical tag into the page without the webmaster being aware of it.

Examine the source of each URL and try to identify how each of the multiple rel=canonical tags was added to the page. Once you've identified the source of the extra tag, disable it in that specific tool's settings or source so that only one rel=canonical tag remains on the page.
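
When examining the source by hand is tedious, a small script can point to where each tag sits in the markup. The sketch below is one possible approach, assuming the page HTML has already been saved or downloaded as a string. It reports the line number and raw text of every rel=canonical tag, which often makes it easier to see which template or plugin emitted it (for example, from a nearby HTML comment). The names CanonicalLocator, locate_canonicals, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalLocator(HTMLParser):
    """Records where each <link rel="canonical"> tag appears (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (line number, raw tag text)

    def handle_starttag(self, tag, attrs):
        rel_tokens = (dict(attrs).get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens:
            line, _column = self.getpos()  # line numbers are 1-based
            self.found.append((line, self.get_starttag_text()))


def locate_canonicals(html):
    """Return (line, raw_tag) pairs for every rel=canonical tag in the HTML string."""
    parser = CanonicalLocator()
    parser.feed(html)
    parser.close()
    return parser.found


# Assumed sample source: one tag from the theme, one from a plugin.
source = """<html>
<head>
<link rel="canonical" href="http://www.example.com/">
<!-- inserted by an SEO plugin -->
<link rel="canonical" href="http://www.example.com/?page=1">
</head>
</html>"""

for line, raw_tag in locate_canonicals(source):
    print(f"line {line}: {raw_tag}")
```

The line numbers refer to the HTML as delivered to the parser, so run the script against the same source you see with "view source" rather than against a template file.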
