Technical SEO · Glossary · Updated Apr 2026

Canonicalization

Definition

Canonicalization is the process by which Google selects one URL from a set of duplicate or near-duplicate URLs to represent the cluster in search results. The chosen URL is the canonical; the rest are alternates. Google weighs multiple signals — rel=canonical, redirects, sitemaps, internal links — to decide.

Find related

Long definition

Most non-trivial sites generate duplicate URLs without trying. Sort orders, tracking parameters, session IDs, locale variants, HTTP vs HTTPS, www vs non-www, trailing-slash differences — all produce URLs that point at substantively the same content. Google clusters these and picks one URL as canonical to dedupe the index. The other URLs aren't penalized; they're consolidated into the canonical's signals.

The signals Google uses, roughly in order of strength (per Google's canonicalization documentation):

  1. HTTP redirects — a 301 or 308 from URL A to URL B is a strong "B is canonical" signal.
  2. rel=canonical link element<link rel="canonical" href="..."> in the head of every URL in the cluster.
  3. Sitemap inclusion — URLs in your XML sitemap are preferred over those not in it.
  4. Internal link signals — which URL your own internal navigation points to most.
  5. HTTPS preference — given equivalent signals, Google prefers HTTPS.
  6. URL shortness and cleanliness — given equivalent signals, shorter and cleaner wins.
  7. External link signals — which URL in the cluster has the most inbound links.

Crucially, Google may override your declared canonical. The rel=canonical is a hint, not a directive. If your declared canonical contradicts other signals (e.g. you canonicalize to a URL that 301-redirects elsewhere, or to a URL with no inbound links while a sibling has thousands), Google picks what looks like the right URL anyway. Search Console's URL Inspection tool shows both "User-declared canonical" and "Google-selected canonical" — when they differ, you have a signal-consistency problem to fix.

When canonicalization gets messy:

  • Faceted navigation?color=red&size=L produces an explosion of near-duplicates. Canonical to the parent category usually.
  • Pagination — page 2, page 3, etc. should self-canonicalize, not canonicalize to page 1 (Google deprecated rel=prev/next in 2019).
  • Mobile separate URLsm.example.com/page should canonical to example.com/page and the desktop URL should alternate link to mobile.
  • Internationalization — locale variants should hreflang each other, not canonical to a single master.
  • Tracking parameters — UTM codes, click IDs, session tokens. Strip server-side or canonical to the clean URL.

Get canonicalization right and Google consolidates link equity, ranking signals, and crawl budget on the URL you want. Get it wrong and the algorithm may pick a URL that doesn't even exist anymore, or split equity across the cluster.

Common misconceptions

  • "rel=canonical is a directive." It's a hint. Google considers it alongside other signals and may pick a different canonical if your declared one looks wrong. Make all signals point the same direction.
  • "Canonicalizing duplicate pages is enough — I don't need to fix the duplication." Canonicalization deduplicates the index, but Googlebot still crawls all the duplicate URLs. For large sites, that's wasted crawl budget. Fix the duplication source where you can (parameter handling, faceted nav cleanup) and use canonical as the safety net.
  • "Self-referential canonicals are useless." They aren't. A canonical pointing to its own URL prevents Google from picking a worse alternate (e.g. a URL with appended tracking parameters as the canonical). Many CMSes self-canonicalize by default for exactly this reason.
  • "Cross-domain canonicals don't work." They do — Google supports them and uses them for syndication scenarios. But they're a strong "this URL doesn't deserve to rank, the other domain does" signal, so use only when that's truly your intent.