Technical SEO · Glossary · Updated Apr 2026

Duplicate content

Definition

Duplicate content is substantially identical content appearing on two or more URLs, either on the same site (internal) or across sites (cross-domain). There is no duplicate-content penalty: Google deduplicates by picking one URL as canonical and filtering the others out of search results. Canonical signals steer which URL it picks.

Long definition

"Penalty" is the wrong mental model. What actually happens is consolidation: Google clusters near-identical URLs, picks one as the representative (the "Google-selected canonical"), and shows only that one in results. The other URLs in the cluster aren't punished — they just don't appear.

Common sources of duplicate content, roughly from most to least frequent:

  • HTTP vs HTTPS / www vs non-www not redirected to a single canonical form.
  • Trailing slash inconsistencies (/page vs /page/).
  • URL parameters that don't change content (?ref=, ?utm_*, session IDs).
  • Printer-friendly versions, AMP versions, mobile-separate URLs (mostly historical).
  • Product variants (color, size) sharing identical copy on different URLs.
  • Syndication — same article republished on partner sites.
  • International sites where the same language serves multiple regions (/en-us, /en-gb, /en-ca) with near-identical copy.
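
The first three sources (protocol and host, trailing slash, content-neutral parameters) can be collapsed during a crawl audit by normalizing every URL to one form before comparing. The sketch below assumes a site whose canonical form is HTTPS, non-www, and without a trailing slash. It also treats `ref`, `sessionid` and any `utm_*` parameter as content-neutral and assumes parameter order is irrelevant. Those choices and parameter names are hypothetical and must match your own site's policy. Product variants, syndication and regional copies need content-level decisions and are not handled here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical policy: parameters assumed not to change page content.
IGNORED_PARAMS = {"ref", "sessionid"}
IGNORED_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Map protocol, host, trailing-slash and tracking-parameter variants of a
    URL onto one assumed canonical form: https, non-www, no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path != "/":
        path = path.rstrip("/") or "/"
    # Drop content-neutral parameters; sort the rest (order assumed irrelevant).
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
        and not k.lower().startswith(IGNORED_PREFIXES)
    )
    # The fragment (#...) is dropped: browsers never send it to the server.
    return urlunsplit(("https", host, path, urlencode(query), ""))


print(normalize_url("http://www.Example.com/page/?utm_source=x&ref=abc&id=2"))
# https://example.com/page?id=2
```

Normalization like this tells you which URLs collide; fixing them on the live site still means redirecting to, or canonicalizing on, the chosen form.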

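Signals Google weighs when picking the canonical, roughly from strongest to weakest (Google does not publish exact weights): redirects and the URL declared in rel="canonical", the URL referenced most often in internal links, the URL listed in the XML sitemap, HTTPS over HTTP, the shorter URL, and the URL with stronger backlinks. These signals combine and no single one is guaranteed to win; when they conflict, Google may pick a different URL from the one you declared.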
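
To see why the signals combine rather than one always winning, here is a toy scoring model. The weights are invented purely to mirror the rough ordering in the paragraph above; Google does not publish a formula. The `Candidate` fields are hypothetical inputs you would gather from your own crawl and link data.

```python
from dataclasses import dataclass

# Invented weights that only encode the rough ordering of signals; not Google's.
WEIGHTS = {
    "declared_canonical": 5.0,
    "internal_links": 3.0,
    "in_sitemap": 2.0,
    "https": 1.5,
    "short_url": 1.0,
    "backlinks": 1.0,
}


@dataclass
class Candidate:
    url: str
    declared_canonical: bool     # cluster pages point rel="canonical" here
    internal_link_share: float   # share of the cluster's internal links (0..1)
    in_sitemap: bool
    backlink_share: float        # share of the cluster's backlinks (0..1)


def score(c: Candidate, shortest_len: int) -> float:
    s = WEIGHTS["declared_canonical"] * c.declared_canonical
    s += WEIGHTS["internal_links"] * c.internal_link_share
    s += WEIGHTS["in_sitemap"] * c.in_sitemap
    s += WEIGHTS["https"] * c.url.startswith("https://")
    s += WEIGHTS["short_url"] * (shortest_len / len(c.url))  # 1.0 for the shortest
    s += WEIGHTS["backlinks"] * c.backlink_share
    return s


def choose_canonical(cluster: list[Candidate]) -> str:
    shortest = min(len(c.url) for c in cluster)
    return max(cluster, key=lambda c: score(c, shortest)).url


declared = Candidate("http://example.com/a?x=1", True, 0.0, False, 0.0)
linked = Candidate("https://example.com/a", False, 1.0, True, 1.0)
print(choose_canonical([declared, linked]))
```

In this toy model the declared canonical loses when every other signal points at a different URL, but wins when the competing URL's signals are weak. That is the qualitative behaviour the paragraph describes, not a prediction of how Google will score a real site.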

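For intentional cross-site duplication (syndication, partner networks), a cross-domain canonical pointing at the original is one way to signal which copy should rank; expect slower consolidation than with same-origin canonicals. For syndicated articles specifically, Google's current documentation recommends that partners block indexing of their copy (for example with a noindex robots meta tag), because syndicated pages often differ from the original enough that a cross-domain canonical may not be honored.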
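
When a partner republishes an article, it is worth checking what the partner's copy actually declares. The audit sketch below uses Python's standard `html.parser` to read a page's rel="canonical" and robots meta tag, then reports whether the canonical points to another host. The partner and origin URLs are hypothetical and fetching the HTML is left out. Treating zero or several canonical tags as "no usable canonical" is an assumed simplification.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class HeadSignals(HTMLParser):
    """Collects <link rel="canonical"> hrefs and robots meta directives."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []
        self.robots: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # attribute values may be None
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if a.get("href"):
                self.canonicals.append(a["href"])
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots += [d.strip().lower() for d in a.get("content", "").split(",")]


def audit_copy(page_url: str, html: str) -> dict:
    parser = HeadSignals()
    parser.feed(html)
    # Assumed simplification: zero or multiple canonicals = no usable canonical.
    canonical: Optional[str] = (
        urljoin(page_url, parser.canonicals[0]) if len(parser.canonicals) == 1 else None
    )
    return {
        "canonical": canonical,
        "cross_domain": canonical is not None
        and urlsplit(canonical).hostname != urlsplit(page_url).hostname,
        "noindex": "noindex" in parser.robots,
    }


partner_html = (
    '<head><link rel="canonical" href="https://origin.example/story">'
    '<meta name="robots" content="noindex, follow"></head>'
)
print(audit_copy("https://partner.example/reprint/1", partner_html))
```

Relative canonical hrefs are resolved against the page URL with `urljoin`, so `/story` on `origin.example` is reported as same-domain.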

Common misconceptions

  • "Duplicate content is a Google penalty." It isn't. Google's own documentation is explicit about this. The risk is signal dilution (links and crawl attention split across several URLs) and the wrong URL being picked as canonical, not a demotion.
  • "5% identical content is duplicate." There is no fixed percentage. Google looks at whether pages are substantially similar in intent and content. Heuristically: if two different URLs serve the same main body content, you have duplicates.
  • "Canonical tag fixes duplicate content." It steers Google's choice; it doesn't merge pages. Two URLs that both declare the same canonical target still exist and still get crawled. For full consolidation you need redirects (301s to the canonical URL).
  • "Translations are duplicate content." Different languages aren't duplicate. Same language for different regions can be treated as duplicate if the content is identical — use hreflang to signal regional targeting, and make at least some content locally specific.
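
On the "no fixed percentage" point: Google does not publish how it measures similarity. For auditing your own pages, though, a common near-duplicate technique is w-shingling with Jaccard similarity, which compares the sets of overlapping k-word sequences in two texts. The shingle size k=5 is an assumed default. Any cut-off you apply to the score is your own heuristic, not a Google threshold.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Size of intersection over size of union of the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "The quick brown fox jumps over the lazy dog"
print(jaccard_similarity(original, original + " today"))
# 0.8333333333333334 (5 of 6 distinct shingles shared)
```

Compare main body text only; boilerplate such as navigation and footers inflates similarity between pages that are genuinely different.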
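
For the full consolidation mentioned in the canonical-tag misconception, duplicates are usually 301-redirected to the canonical URL. Here is a minimal sketch as a standard-library WSGI application (PEP 3333). The paths, the example.com host and the redirect table are hypothetical; a production site would more likely configure this in its web server or CDN.

```python
# Hypothetical table: duplicate path -> canonical URL.
REDIRECTS = {
    "/guide/": "https://example.com/guide",
    "/guide/print": "https://example.com/guide",
}


def app(environ, start_response):
    """Permanently redirect known duplicates; serve everything else normally."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page\n"]
```

To try it locally: `from wsgiref.simple_server import make_server; make_server("", 8000, app).serve_forever()`. Unlike a canonical tag, the redirect means crawlers and users never receive the duplicate page at all.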
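
For the regional case in the last misconception, each regional page carries the same full set of hreflang annotations, listing itself and every alternate, plus an x-default fallback for users who match none of them. The sketch below renders those tags. The /en-us, /en-gb and /en-ca URLs under example.com and the choice of en-us as x-default are hypothetical.

```python
from html import escape

# Hypothetical URL layout: one regional subfolder per English-speaking market.
REGIONS = {
    "en-us": "https://example.com/en-us/pricing",
    "en-gb": "https://example.com/en-gb/pricing",
    "en-ca": "https://example.com/en-ca/pricing",
}
X_DEFAULT = "https://example.com/en-us/pricing"  # assumed fallback page


def hreflang_links(alternates: dict[str, str], x_default: str) -> str:
    """Render the reciprocal <link rel="alternate" hreflang> set. Every regional
    page should include this same full set, including a link to itself."""
    lines = [
        f'<link rel="alternate" hreflang="{code}" href="{escape(url)}" />'
        for code, url in sorted(alternates.items())
    ]
    lines.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />')
    return "\n".join(lines)


print(hreflang_links(REGIONS, X_DEFAULT))
```

hreflang tells Google which regional URL to show to which audience; it does not make near-identical copy unique, so the locally specific content the text recommends still matters.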