Duplicate pages are pages with the same or similar Title tags, Description, H1 tag, or Content.
You may ask yourself, "aren't all duplicates bad?"
No. Duplicate pages are considered bad duplicates only when their hreflang or canonical tags are not set correctly to show the relationship between them.
Hreflang - Means two or more pages are different language or regional versions of the same page.
Canonical - Means the pages are the same or similar, but one is flagged as the "official" version.
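To make the two tags concrete, here is a minimal Python sketch that reads them from a page's HTML using only the standard library. It is an illustration, not how Site Audit works internally: the function name extract_relationship_tags, the LinkTagParser class, and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser


class LinkTagParser(HTMLParser):
    """Collects rel="canonical" and rel="alternate" hreflang <link> tags."""

    def __init__(self):
        super().__init__()
        self.canonical = None   # href of the first canonical tag found
        self.hreflang = {}      # maps language/region code -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if not href:
            return
        if "canonical" in rel and self.canonical is None:
            self.canonical = href
        elif "alternate" in rel and attr.get("hreflang"):
            self.hreflang[attr["hreflang"].lower()] = href


def extract_relationship_tags(html):
    """Return (canonical_url_or_None, {hreflang_code: url}) for one page."""
    parser = LinkTagParser()
    parser.feed(html)
    return parser.canonical, parser.hreflang


# Hypothetical page: an English page with a German alternate version.
sample_html = """
<html><head>
<link rel="canonical" href="https://example.com/shoes/">
<link rel="alternate" hreflang="en" href="https://example.com/shoes/">
<link rel="alternate" hreflang="de" href="https://example.com/de/schuhe/">
</head><body></body></html>
"""

canonical, hreflang = extract_relationship_tags(sample_html)
print(canonical)
print(hreflang)
```

A page that returns no canonical URL and an empty hreflang mapping declares no relationship to any other page, which is the situation in which a duplicate may be treated as a bad duplicate.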
You can run a site audit on a website to check for good and bad duplicates. Once the site audit is complete, check the Duplicates report to find a graph displaying the different kinds of duplicates found on the website:
The upper chart groups duplicate internal pages based on content element evaluation (title, meta description, H1, and content).
Unique: Pages with unique content.
Good duplicates: Pages with duplicate content that define one unique main version through their canonical, hreflang, or pagination setups.
Bad duplicates: Pages with duplicate content that is not properly handled with their canonical, hreflang or pagination setups. This might lead to indexation issues.
Not set or empty: Pages without content. This is likely to affect your SEO performance negatively.
The lower chart groups internal HTML URLs with status code 200 into clusters based on the similarity of their content. The number of pages in each cluster is depicted within each square and corresponds to its size, while the color of the square indicates whether the duplication is handled properly via canonical tag setups.
Canonical tag matching: All of the pages within the cluster have the same canonical tag, which means that the cluster has one canonical URL.
Canonical tag not matching: Some or all of the pages within the cluster link to different URLs via canonical tags. This might lead to indexation issues.
Canonical tag not set: Some or all of the pages within the cluster don’t have a canonical tag set. This might lead to indexation issues.
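The three cluster states above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name cluster_canonical_status and its input format (one canonical URL, or None, per page in the cluster) are hypothetical, and checking for missing tags before checking for mismatches is an assumed priority that the report itself may not follow.

```python
def cluster_canonical_status(canonicals):
    """Classify a cluster given one canonical URL (or None) per page."""
    if not canonicals:
        raise ValueError("a cluster must contain at least one page")

    # Treat None, empty strings, and whitespace-only values as "no tag".
    normalized = [(url or "").strip() for url in canonicals]

    # Assumption: a missing tag takes priority over a mismatch.
    if any(not url for url in normalized):
        return "Canonical tag not set"
    # Every page points to the same URL: the cluster has one canonical.
    if len(set(normalized)) == 1:
        return "Canonical tag matching"
    return "Canonical tag not matching"


# Hypothetical clusters of similar pages.
matching = ["https://example.com/a/", "https://example.com/a/"]
mismatched = ["https://example.com/a/", "https://example.com/b/"]
missing = ["https://example.com/a/", None]

print(cluster_canonical_status(matching))
print(cluster_canonical_status(mismatched))
print(cluster_canonical_status(missing))
```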
Duplicates with Similar content
Sometimes you may come across a duplicate flagged in Site Audit as having "Similar" content.
Site Audit uses a smart text extraction method to pull content from the body text (headers, paragraph elements, links, etc.) and then compares the content of all pages with one another. If the content of two pages is very close but does not match exactly, the pages are flagged as "Similar" duplicates.
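The idea of "very close but not matching exactly" can be illustrated with a generic near-duplicate check. The Python sketch below uses word shingles and Jaccard similarity, which is a common textbook technique and not Site Audit's actual extraction or comparison method. The function names, the shingle size of 3, and the 0.7 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a), shingles(text_b)
    if not set_a or not set_b:
        return 0.0  # empty pages are a separate case, not a "similar" match
    return len(set_a & set_b) / len(set_a | set_b)


def compare_content(text_a, text_b, threshold=0.7):
    """Label two texts; the threshold is a hypothetical cut-off."""
    score = jaccard_similarity(text_a, text_b)
    if score == 1.0:
        return "matching"
    if score >= threshold:
        return "similar"
    return "different"


page_a = "red running shoes for men on sale today"
page_b = "red running shoes for men on sale now"
page_c = "blue winter jacket with detachable hood and pockets"

print(compare_content(page_a, page_a))
print(compare_content(page_a, page_b))
print(compare_content(page_a, page_c))
```

Any threshold-based comparison of this kind can misjudge pages that share a lot of boilerplate text, which is one way two pages with different main content might still end up flagged as similar.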
Occasionally, however, two pages that don't have similar content may still be flagged as similar. In such cases, please reach out to support at [email protected] or via live chat at https://help.ahrefs.com so we can double-check whether the duplicates have been flagged correctly.