Duplicate content occurs when two or more URLs serve the same or very similar content.
Search engines dislike duplicate content, since returning several pages with the same, or nearly the same, content on a SERP makes for a very poor user experience.
At one point, this issue was merely an annoyance for search engines, and they simply did their best to filter duplicate content out. But in recent years, search engines have made the elimination of duplicate content from their indexes a major priority, and have likewise begun devaluing sites with duplicate content. Because of this, avoiding duplicate content has become one of the most important issues for your site's SEO.
Dangers of Duplicate Content
- Wasted Crawl Budget - Search engines do not have unlimited resources, so they usually set limits on how much of your site they're willing to crawl. The number of pages they crawl on your site is referred to as the "crawl budget", and like any limited resource, it must be rationed and prioritized. Ideally, you want search engines to crawl and index the most important pages on your site before crawling or indexing less important or duplicate pages. If there is a lot of duplicate content on your site, however, search engines could spend much of their crawl budget fetching duplicate versions of a small number of pages, leaving many of your unique pages uncrawled and unindexed.
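As a rough back-of-the-envelope sketch of how duplicates dilute a crawl budget (all numbers here are hypothetical assumptions, not real search-engine figures):

```python
# Hypothetical crawl-budget dilution example. None of these numbers come
# from a real search engine; they only illustrate the arithmetic.
crawl_budget = 1000          # pages a crawler is willing to fetch on this site
unique_pages = 500           # distinct pieces of content on the site
duplicates_per_page = 4      # extra URL variants per page (tracking params, etc.)

# Total crawlable URLs: each unique page plus its duplicate variants.
total_urls = unique_pages * (1 + duplicates_per_page)  # 2500 URLs

# If the crawler picks URLs with no preference for unique content, the
# expected number of unique pages it discovers within its budget:
unique_crawled = min(unique_pages, crawl_budget * unique_pages // total_urls)
print(unique_crawled)  # → 200 (only 200 of 500 unique pages get crawled)
```

Under these assumed numbers, four duplicate variants per page mean the crawler discovers fewer than half of the site's unique pages before its budget runs out.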
- Lower Ranking or De-Indexation - Duplicate content wastes search engines' resources and provides little value for searchers; it can also be an indicator of low-value or thin content. Because of this, many search engines have begun to treat duplicate content as a ranking factor, looking unfavorably on sites that have problems with it. Not only can the duplicate pages themselves be affected, but your entire site's rankings could go down as a result of duplicate content, and some non-duplicate pages could even be de-indexed. This was the focus of Google's Panda update in early 2011.
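One way to get a first look at exact-duplicate pages on your own site is to fingerprint each page's text and group URLs that share a fingerprint. The sketch below is a minimal, hypothetical illustration (the URLs and the whitespace-collapsing normalization are assumptions; real duplicate detection also has to handle near-duplicates, boilerplate, and templates):

```python
import hashlib
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Crude content fingerprint: lowercase, collapse whitespace, then hash.

    This catches only exact duplicates after normalization; near-duplicate
    detection needs fuzzier techniques (e.g. shingling).
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose page text hashes to the same fingerprint."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]

# Hypothetical site: a tracking parameter creates a duplicate URL.
pages = {
    "https://example.com/widget": "Our blue widget.  Buy now!",
    "https://example.com/widget?ref=email": "Our blue widget. Buy now!",
    "https://example.com/about": "About our company.",
}
print(find_duplicates(pages))
# → [['https://example.com/widget', 'https://example.com/widget?ref=email']]
```

Groups like the one printed above are candidates for consolidation, so that search engines index one version of the page instead of several.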