Technical SEO · Glossary · Updated Apr 2026

Duplicate content

Definition

Duplicate content is substantially identical content appearing on two or more URLs, either on the same site (internal) or across sites (cross-domain). There is no duplicate-content penalty: Google deduplicates by selecting one URL as the canonical and filtering the others out of results. A rel="canonical" declaration is a strong hint toward that choice, not a directive.

Long definition

"Penalty" is the wrong mental model. What actually happens is consolidation: Google clusters near-identical URLs, picks one as the representative (the "Google-selected canonical"), and shows only that one in results. The other URLs in the cluster aren't punished — they just don't appear.

Common sources of duplicate content, roughly from most to least frequent:

  • HTTP vs HTTPS / www vs non-www not redirected to a single canonical form.
  • Trailing slash inconsistencies (/page vs /page/).
  • URL parameters that don't change content (?ref=, ?utm_*, session IDs).
  • Printer-friendly versions, AMP versions, separate mobile URLs (mostly historical).
  • Product variants (color, size) sharing identical copy on different URLs.
  • Syndication — same article republished on partner sites.
  • International sites where the same language serves multiple regions (/en-us, /en-gb, /en-ca) with near-identical copy.
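
Several of the sources above (protocol, www, trailing slash, tracking parameters) can be collapsed with a URL normalization step before comparing URLs in a crawl. The following is a minimal sketch, not a definitive rule set: the function name normalize_url, the parameter list in TRACKING_PARAMS, and the choice of "https, no www, no trailing slash" as the preferred form are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that do not change page content on this site.
TRACKING_PARAMS = {"ref", "sessionid", "sid"}
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Map common duplicate URL variants onto one assumed preferred form."""
    parts = urlsplit(url)

    # Treat www and non-www as the same host (assumed preference: non-www).
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Treat /page and /page/ as the same path, keeping "/" for the root.
    path = parts.path.rstrip("/") or "/"

    # Drop parameters assumed not to affect content; sort the remainder
    # so that parameter order does not create distinct URLs.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    query = urlencode(sorted(kept))

    # Force HTTPS and discard the fragment.
    return urlunsplit(("https", host, path, query, ""))
```

URLs that normalize to the same string are candidates for a redirect or a shared canonical. Parameters that do change content (for example a color filter) are kept, so those variants stay distinct.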

Signals Google uses to pick the canonical, in roughly descending order of weight: the URL declared in rel="canonical", the URL referenced most often in internal links, the URL listed in the XML sitemap, HTTPS over HTTP, the shorter URL, and the URL with stronger backlinks. The signals are combined and no single one is decisive, so a declared canonical can be overridden when the other signals disagree with it.
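
Because the signals combine, it helps to check that the ones under your control agree with each other. The sketch below is an audit aid only and does not model Google's actual selection logic; the ClusterSignals structure, its field names, and the conflicting_signals function are hypothetical, and the inputs are assumed to come from your own crawl data.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ClusterSignals:
    preferred: str                        # URL you want selected as canonical
    declared_canonicals: Dict[str, str]   # page URL -> its rel="canonical" target
    internal_link_counts: Dict[str, int]  # URL -> internal links pointing at it
    sitemap_urls: Set[str]                # URLs listed in the XML sitemap


def conflicting_signals(signals: ClusterSignals) -> List[str]:
    """List the controllable signals that point away from the preferred URL."""
    problems: List[str] = []

    # Every page in the cluster should declare the preferred URL.
    for page, target in sorted(signals.declared_canonicals.items()):
        if target != signals.preferred:
            problems.append(f"{page} declares canonical {target}")

    # The most-linked URL in the cluster should be the preferred one.
    if signals.internal_link_counts:
        most_linked = max(
            signals.internal_link_counts, key=signals.internal_link_counts.get
        )
        if most_linked != signals.preferred:
            problems.append(f"most internal links point to {most_linked}")

    # The sitemap should list the preferred URL.
    if signals.preferred not in signals.sitemap_urls:
        problems.append("preferred URL missing from sitemap")

    # The preferred URL should be the HTTPS variant.
    if not signals.preferred.startswith("https://"):
        problems.append("preferred URL is not HTTPS")

    return problems
```

An empty result means the declared canonical, internal linking, sitemap, and protocol all point at the same URL. Backlinks are left out because they are not directly under the site owner's control.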

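For intentional cross-site duplication (syndication, partner networks), a cross-domain canonical from each republished copy to the original is the usual signal. Expect consolidation to take longer than with same-site canonicals. Google may also ignore the hint when the partner page differs substantially from the original (different template, surrounding content); in that case, asking the partner to noindex the copy is more reliable.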
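
To verify what a page actually declares, the rel="canonical" target can be read from its HTML and compared against the page's own host. This is a minimal sketch using only the Python standard library; declared_canonical and is_cross_domain are hypothetical helper names, and the sketch assumes the canonical is declared in a link element rather than in an HTTP header.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens.
        if "canonical" in attr.get("rel", "").lower().split() and attr.get("href"):
            self.canonical = attr["href"]


def declared_canonical(html: str, page_url: str) -> Optional[str]:
    """Return the absolute canonical URL declared in html, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return None
    # Resolve relative hrefs against the URL the page was fetched from.
    return urljoin(page_url, parser.canonical)


def is_cross_domain(page_url: str, canonical_url: str) -> bool:
    """True when the canonical points at a different host than the page."""
    return urlsplit(page_url).hostname != urlsplit(canonical_url).hostname
```

For a syndicated copy, the expected result is a canonical on the partner's page that resolves to the original article's URL, with is_cross_domain returning True.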

Common misconceptions

  • "Duplicate content is a Google penalty." It isn't. Google's own documentation is explicit about this. The risk is diluted link signals and the wrong URL getting picked as canonical, not a demotion.
  • "5% identical content is duplicate." No fixed percentage. Google looks at whether pages are substantially similar in intent and content. Heuristically: if two different URLs return the same body, you have duplicates.
  • "Canonical tag fixes duplicate content." It steers Google's choice; it doesn't merge pages. Two URLs whose canonicals point to the same target still exist and still get crawled. For full consolidation you need permanent (301) redirects.
  • "Translations are duplicate content." Different languages aren't duplicate. Same language for different regions can be treated as duplicate if the content is identical — use hreflang to signal regional targeting, and make at least some content locally specific.
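
The "same body on different URLs" heuristic from the list above can be checked on crawl output by fingerprinting each response body. This is a rough sketch under stated assumptions: the pages mapping (URL to body text) is assumed to come from your own crawler, the function names are hypothetical, and the hash only catches exact matches after whitespace and case normalization. Near-duplicates with small differences would need a similarity measure instead.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def body_fingerprint(body: str) -> str:
    """Hash a body after collapsing whitespace and lowercasing."""
    collapsed = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def duplicate_groups(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose bodies share a fingerprint; singletons are dropped."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[body_fingerprint(body)].append(url)
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)
```

Each returned group is a set of URLs serving the same body, which makes it a candidate for a redirect or a shared canonical.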