Faceted navigation
Faceted navigation is the filter and sort UI common on category and search pages — color, size, brand, price range. Each combination generates a URL. On large sites these combinations explode into millions of low-value URLs that drain crawl budget, dilute internal links, and create duplicate-content signals.
Long definition
The math is the issue. A category page with five facets — color (10 options), size (8), brand (50), price range (5), in-stock filter (2) — has 10 × 8 × 50 × 5 × 2 = 40,000 unique combinations. Multiply across 200 category pages and you have 8 million URLs Googlebot might discover, almost all of them low-value duplicates of the parent category.
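The arithmetic above can be checked directly. A minimal sketch, with the option counts taken from the example (the numbers are illustrative, not real catalog data):

```python
from math import prod

# Options per facet on one category page (illustrative figures).
facet_options = {"color": 10, "size": 8, "brand": 50, "price": 5, "in_stock": 2}

per_category = prod(facet_options.values())  # 10 * 8 * 50 * 5 * 2 = 40,000
site_total = per_category * 200              # across 200 category pages

print(per_category)  # 40000
print(site_total)    # 8000000
```

Note this counts one selected option per facet; real sites also allow multi-select and partial selections, which only makes the explosion worse.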
Three failure modes follow:
- Crawl waste. Googlebot spends capacity fetching combinations no user searches for. The pages you want indexed get crawled less often.
- Duplicate content. "Red Nike running shoes size 9" and "Nike running shoes red size 9" return the same products. Without a canonical signal, ranking dilutes across both.
- Internal link dilution. Every faceted link spreads link equity to combinations Google won't keep, draining authority from genuine landing pages.
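The duplicate-content failure mode comes from parameter order: the same facet selection can be spelled as many distinct URLs. A common mitigation, sketched here in Python with illustrative URLs, is to sort query parameters before emitting the canonical tag so equivalent combinations collapse to one URL:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def canonicalize(url: str) -> str:
    """Sort query parameters so equivalent facet combinations
    map to a single canonical URL."""
    parts = urlparse(url)
    params = sorted(parse_qsl(parts.query))  # order-independent key/value pairs
    return urlunparse(parts._replace(query=urlencode(params)))

# Two spellings of "red Nike shoes, size 9" normalize to the same URL.
a = canonicalize("https://shop.example/shoes?color=red&brand=nike&size=9")
b = canonicalize("https://shop.example/shoes?brand=nike&size=9&color=red")
print(a == b)  # True
```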
The standard fix stack:
- Self-referencing canonical on the unfiltered category page; faceted variants `rel=canonical` to it, but only when the content really is a subset, not when the filter creates a meaningfully different page.
- `<a href>` vs. JS for filter links. Render facet links so they don't generate crawlable `href` attributes when you don't want the combinations crawled. Google has explicitly recommended this.
- `robots.txt` disallow for known low-value parameter patterns (`?sort=`, `?view=`).
- Identify high-intent facets. "Red dresses" gets searched; "products available on Tuesdays" doesn't. Promote the first to a real category page; gate the second.
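The fix stack above amounts to a per-URL decision. A minimal sketch of that decision as one function; the high-intent allowlist and blocked-parameter set are illustrative assumptions, not a drop-in policy:

```python
from urllib.parse import urlparse, parse_qsl

# Illustrative policy inputs, not real data.
HIGH_INTENT = {("color",)}          # facet sets worth a real landing page
BLOCKED_PARAMS = {"sort", "view"}   # low-value parameters to disallow

def facet_policy(url: str) -> str:
    """Classify a faceted URL: 'disallow', 'promote', or 'canonicalize'."""
    params = dict(parse_qsl(urlparse(url).query))
    if BLOCKED_PARAMS & params.keys():
        return "disallow"           # match via a robots.txt pattern
    facets = tuple(sorted(params))  # order-independent facet signature
    if facets in HIGH_INTENT:
        return "promote"            # give it its own indexable page
    return "canonicalize"           # point rel=canonical at the parent

print(facet_policy("/dresses?color=red"))         # promote
print(facet_policy("/dresses?sort=price"))        # disallow
print(facet_policy("/dresses?color=red&size=9"))  # canonicalize
```

In practice the "promote" branch is driven by search-demand data rather than a hardcoded set, but the control flow is the same.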
Common misconceptions
- "Just rel=canonical everything to the parent." Canonical is a hint, not a directive. If facets create genuinely different content (different products, different price ranges), Google may ignore the canonical and index the variant anyway.
- "Disallow in robots.txt fixes crawl waste." Disallow stops the body fetch but Googlebot still spends capacity on URL discovery and the disallow check itself. For massive faceted noise, fix the link source.
- "All faceted URLs should be noindex." Some faceted combinations match real search demand — promote them to proper landing pages, don't block them.