Duplicate content
Duplicate content is substantially identical content appearing on two or more URLs, either on the same site (internal) or across sites (cross-domain). There is no duplicate-content penalty: Google deduplicates by picking one URL as canonical and filtering the rest out of search results. Canonical tags steer that choice.
Long definition
"Penalty" is the wrong mental model. What actually happens is consolidation: Google clusters near-identical URLs, picks one as the representative (the "Google-selected canonical"), and shows only that one in results. The other URLs in the cluster aren't punished — they just don't appear.
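The clustering step can be pictured as grouping URLs by a fingerprint of their body text. This is a minimal illustration, not Google's method. It assumes exact matching after collapsing whitespace and case, and the `fingerprint` and `cluster_by_content` helpers and example URLs are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(body: str) -> str:
    """Hash the body after collapsing whitespace and lowercasing.

    Assumption: only exact duplicates (modulo whitespace and case) are grouped;
    near-duplicates need a similarity measure instead.
    """
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_by_content(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose bodies share a fingerprint; keep only clusters of 2+."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results: three URLs serve the same body.
pages = {
    "https://example.com/shoes": "<p>Red shoes</p>",
    "https://example.com/shoes/": "<p>Red  shoes</p>",
    "https://example.com/shoes?ref=mail": "<p>Red shoes</p>",
    "https://example.com/hats": "<p>Blue hats</p>",
}
print(cluster_by_content(pages))  # one cluster: the three /shoes variants
```

Each resulting cluster is what a search engine would then pick a single representative from.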
Sources of duplicate content, by frequency:
- HTTP vs HTTPS / www vs non-www not redirected to a single canonical form.
- Trailing slash inconsistencies (/page vs /page/).
- URL parameters that don't change content (?ref=, ?utm_*, session IDs).
- Printer-friendly versions, AMP versions, mobile-separate URLs (mostly historical).
- Product variants (color, size) sharing identical copy on different URLs.
- Syndication — same article republished on partner sites.
- International sites where the same language serves multiple regions (/en-us, /en-gb, /en-ca) with near-identical copy.
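The URL-level sources above (protocol, host, trailing slash, parameters) can be collapsed by normalizing every URL to one form before you link to it or list it in a sitemap. The sketch assumes a hypothetical site policy: HTTPS, no `www.`, no trailing slash except on the root, and `ref`, `sessionid`, and `utm_*` parameters treated as content-neutral. `normalize_url` is an illustrative helper, and the ignored-parameter list is an assumption to replace with your own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters this hypothetical site treats as not changing content.
IGNORED_PARAMS = {"ref", "sessionid"}
IGNORED_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Map a URL to the site's single canonical form (policy assumed above)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")
    # Drop content-neutral parameters; sort the rest so order doesn't matter.
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
        and not k.lower().startswith(IGNORED_PREFIXES)
    ]
    query = urlencode(sorted(kept))
    # Force HTTPS and drop the fragment, which never reaches the server.
    return urlunsplit(("https", host, path, query, ""))


print(normalize_url("http://www.Example.com/page/?utm_source=x&ref=nl"))
# https://example.com/page
```

Normalizing only fixes the URLs you emit. The variant URLs still resolve unless the server also redirects them, and this does nothing for product variants, syndication, or regional copies.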
Google's signals for picking the canonical, roughly in order of weight: the URL you declared in rel="canonical", the URL referenced most often in internal links, the URL in the XML sitemap, the HTTPS URL over HTTP, the shorter URL, and the URL with better backlinks. These signals combine; no single one is decisive.
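To make "these combine" concrete, here is a toy scoring model over the signals listed above. The weights are invented purely for illustration (Google does not publish its weighting), `Candidate` and `pick_canonical` are hypothetical names, and URL length is used only as a tie-breaker.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Candidate:
    url: str
    declared_canonical: bool  # other URLs in the cluster point rel="canonical" here
    internal_links: int
    in_sitemap: bool
    backlinks: int


# Illustrative weights only; not Google's values.
WEIGHTS = {
    "declared_canonical": 5.0,
    "internal_links": 0.1,  # per internal link
    "in_sitemap": 1.0,
    "https": 1.0,
    "backlinks": 0.05,  # per linking page
}


def score(c: Candidate) -> float:
    """Sum weighted signals; booleans count as 0 or 1."""
    return (
        WEIGHTS["declared_canonical"] * c.declared_canonical
        + WEIGHTS["internal_links"] * c.internal_links
        + WEIGHTS["in_sitemap"] * c.in_sitemap
        + WEIGHTS["https"] * (urlsplit(c.url).scheme == "https")
        + WEIGHTS["backlinks"] * c.backlinks
    )


def pick_canonical(cands: list[Candidate]) -> str:
    """Highest combined score wins; the shorter URL breaks exact ties."""
    return max(cands, key=lambda c: (score(c), -len(c.url))).url


a = Candidate("https://example.com/page", True, 12, True, 3)
b = Candidate("http://example.com/page?ref=nav", False, 40, False, 0)
print(pick_canonical([a, b]))  # https://example.com/page
```

Remove the declared canonical from `a` and the parameterized URL wins on internal links alone, which is the practical risk the text describes: signals that disagree can hand the choice to the wrong URL.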
For intentional cross-site duplication (syndication, partner networks), a cross-domain canonical is the correct signal. Expect slower consolidation than same-origin canonicals.
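A syndication partner points back to the original by placing a cross-domain canonical in the head of its copy; Google also accepts rel="canonical" as an HTTP Link header, which is useful for non-HTML files such as PDFs. A minimal sketch; the helper names and URLs are hypothetical.

```python
from html import escape


def canonical_link_tag(original_url: str) -> str:
    """<link> element for the <head> of the syndicated copy."""
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


def canonical_link_header(original_url: str) -> tuple[str, str]:
    """Equivalent (name, value) HTTP response header."""
    return ("Link", f'<{original_url}>; rel="canonical"')


print(canonical_link_tag("https://example.com/articles/duplicate-content"))
print(canonical_link_header("https://example.com/report.pdf"))
```

The tag goes on the partner's page, not the original; the original should carry a self-referencing canonical or none at all.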
Common misconceptions
- "Duplicate content is a Google penalty." It isn't. Google's own documentation is explicit about this. The risk is dilution and the wrong URL getting picked as canonical, not a demotion.
- "5% identical content is duplicate." There is no fixed percentage. Google looks at whether pages are substantially similar in intent and content. Heuristically: if different URLs serve the same main body, you have duplicates.
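One common way to quantify "substantially similar" for your own audits is word shingling with Jaccard similarity. This is a generic near-duplicate technique, not a description of Google's system, and any cutoff you apply to the score is your own heuristic rather than a threshold Google uses. The `shingles` and `jaccard` helpers and example sentences are illustrative.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Word k-grams ("shingles") of the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Shared shingles divided by all distinct shingles (1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Two product-variant pages that differ by one word.
base = "Our red running shoe has a breathable mesh upper and a cushioned sole."
variant = "Our blue running shoe has a breathable mesh upper and a cushioned sole."
print(round(jaccard(base, variant), 3))
```

A one-word change only disturbs the few shingles that span it, so the variant pages above still score high; that is the product-variant case from the list of sources.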
- "Canonical tag fixes duplicate content." It steers Google's choice; it doesn't merge pages. Two URLs that both declare the same canonical target still exist and still get crawled. For full consolidation you need redirects.
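A redirect layer can be as simple as a lookup from duplicate paths to the canonical path, answered with a 301 (permanent) redirect. The sketch assumes a hypothetical `REDIRECTS` table and returns `(status, headers)` instead of binding to a particular web framework; `respond` is an illustrative name you would wire into your own request handling.

```python
from urllib.parse import urlsplit

# Hypothetical map from duplicate paths to the canonical path.
REDIRECTS = {
    "/page/": "/page",
    "/print/page": "/page",
}


def respond(url: str) -> tuple[int, dict[str, str]]:
    """301 to the canonical path for a known duplicate, else 200."""
    target = REDIRECTS.get(urlsplit(url).path)
    if target is None:
        return 200, {}
    return 301, {"Location": target}


print(respond("https://example.com/print/page"))  # (301, {'Location': '/page'})
```

Unlike a canonical tag, the duplicate URL no longer serves content at all, so crawlers and users both land on one URL.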
- "Translations are duplicate content." Different languages aren't duplicate. Same language for different regions can be treated as duplicate if the content is identical — use hreflang to signal regional targeting, and make at least some content locally specific.
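A minimal hreflang sketch for the /en-us, /en-gb, /en-ca case, assuming hypothetical example.com pricing URLs. Every regional page carries the full set of alternates, including a reference to itself, plus an x-default fallback; `hreflang_tags` is an illustrative helper name.

```python
from html import escape

# Hypothetical regional URLs serving English content to different markets.
ALTERNATES = {
    "en-US": "https://example.com/en-us/pricing",
    "en-GB": "https://example.com/en-gb/pricing",
    "en-CA": "https://example.com/en-ca/pricing",
}


def hreflang_tags(alternates: dict[str, str], default_url: str) -> list[str]:
    """Build the reciprocal <link rel="alternate"> set shared by every version."""
    tags = [
        f'<link rel="alternate" hreflang="{lang}" href="{escape(url, quote=True)}">'
        for lang, url in alternates.items()
    ]
    tags.append(
        f'<link rel="alternate" hreflang="x-default" '
        f'href="{escape(default_url, quote=True)}">'
    )
    return tags


for tag in hreflang_tags(ALTERNATES, "https://example.com/pricing"):
    print(tag)
```

hreflang only tells Google which version to show in which market; it does not make the pages distinct, so the regional copy still benefits from local prices, spelling, and details.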