What triggers this issue?
This issue reports pages that have duplicate content but do not declare a canonical version (that is, they have no canonical tag).
Why is it important?
Although Google says it can automatically choose the best version of the content to show in its search results, that version won't necessarily be the one you want indexed. That is why similar or duplicate pages on your website should have a "rel=canonical" attribute that tells search engines which version of the page is the most authoritative (canonical) one to show in search results.
If you do not use this attribute to handle duplicate content, the wrong page versions may be indexed.
Choosing a canonical URL also helps to consolidate link signals for similar or duplicate pages into one preferred URL. It is also a proper way to manage syndicated content, because the canonical URL can point to a different domain.
Finally, canonical URLs can optimize your crawl budget, as Googlebot will not spend crawl time on duplicate pages.
Even if you indicate the preferred (canonical) page version to Google using "rel=canonical", Google may still choose a different page. Google also uses a number of other signals, such as the HTTP/HTTPS protocol, page quality, and sitemaps.
How to fix it?
Review all the pages with duplicate content listed in this report.
The number in the "No. of pages having the same content" column is clickable. Clicking it shows a list of pages whose content is identical or very similar to that of the URL in question.
Within this group of pages, pick the one canonical version that you want indexed in search results. Add its URL to the "rel=canonical" tag on each page with duplicated content, including the canonical page itself.
You can export the results into a .csv file from Site Audit if necessary.
The canonical page can be specified by:
1. Using the rel=canonical <link> tag in the code of a page.
<link rel="canonical" href="http://ahrefs.com/blog/canonical-tags/" />
2. Using the rel="canonical" HTTP header in your page response. This method is especially applicable to non-HTML documents such as PDF files that can be accessed via multiple URLs.
HTTP/1.1 200 OK
Link: <http://ahrefs.com/blog/canonical-tags/>; rel="canonical"
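To verify that a page actually declares a canonical through either of the two methods above, a check along the following lines may help. This is only a minimal sketch using the Python standard library: the function names `canonical_from_html` and `canonical_from_link_header` are hypothetical, the inputs are assumed to be the raw HTML and the raw value of the Link header that you have already fetched, and the header parsing is deliberately simplified.

```python
import re
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collects the href values of <link rel="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def canonical_from_html(html_text):
    """Return the first canonical URL declared in the HTML, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html_text)
    return parser.canonicals[0] if parser.canonicals else None


def canonical_from_link_header(header_value):
    """Return the first rel="canonical" target in a Link header value, or None.

    Simplified parsing: each entry is assumed to look like <URL>; rel="...".
    """
    for url, params in re.findall(r"<([^>]*)>([^<]*)", header_value):
        match = re.search(r';\s*rel\s*=\s*"?([^";,]*)"?', params, re.IGNORECASE)
        if match and "canonical" in match.group(1).lower().split():
            return url
    return None


html_page = '<head><link rel="canonical" href="http://ahrefs.com/blog/canonical-tags/" /></head>'
print(canonical_from_html(html_page))
print(canonical_from_link_header('<http://ahrefs.com/blog/canonical-tags/>; rel="canonical"'))
```

A page for which both functions return None has no declared canonical and would be a candidate for this report if its content is duplicated elsewhere.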
Only valid, live pages should be specified as canonicals. The canonical URL must be an absolute URL that includes the protocol.
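The absolute-URL requirement can be checked before you publish a canonical. The sketch below is one possible way to do it with Python's standard library; `is_absolute_canonical` is a hypothetical helper name, and it only checks the form of the URL. Whether the page is live (for example, returns a 200 response) would need a separate request and is not covered here.

```python
from urllib.parse import urlsplit


def is_absolute_canonical(url):
    """Return True if the URL has an http(s) protocol and a host name."""
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in (
    "http://ahrefs.com/blog/canonical-tags/",  # absolute, protocol included
    "//ahrefs.com/blog/canonical-tags/",       # protocol-relative, no protocol
    "/blog/canonical-tags/",                   # relative path
):
    print(candidate, is_absolute_canonical(candidate))
```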
See Google's guidelines on consolidating duplicate URLs.
Alternatively, you can set 301 redirects for the unnecessary duplicate pages or simply take them down.
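How a 301 redirect is configured depends on your web server or CMS, so the following is only an illustrative sketch of the idea using Python's built-in http.server module. The paths in `REDIRECTS`, the `redirect_target` helper, and the port are all made-up assumptions; in practice you would normally set redirects in your server or platform configuration rather than in a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of unnecessary duplicate paths to the preferred URL.
REDIRECTS = {
    "/blog/canonical-tags/index.html": "http://ahrefs.com/blog/canonical-tags/",
    "/blog/canonical-tags-copy/": "http://ahrefs.com/blog/canonical-tags/",
}


def redirect_target(path):
    """Return the preferred URL for a duplicate path, ignoring any query string."""
    return REDIRECTS.get(path.split("?", 1)[0])


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            # A real site would serve its normal content here.
            self.send_error(404)
            return
        # 301 marks the move as permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

    do_HEAD = do_GET


def serve(port=8000):
    """Start the demo server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` starts the demo server; requesting one of the mapped paths should then return a 301 response whose Location header points to the preferred URL.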