Crawl trap
A crawl trap is any URL pattern that generates an unbounded or pathologically large set of URLs. Common sources: infinite calendar archives, faceted navigation combinatorics, session IDs in query strings, and recursive relative-link bugs. Crawl traps drain crawl budget and frustrate indexing of pages that actually matter.
Long definition
Four traps account for most real-world cases.
Calendar archives. A "next month" link on a blog calendar widget with no end date lets Googlebot crawl into 2026, 2027, 2028 indefinitely, finding only empty pages. Common on legacy WordPress installs.
Faceted nav without bounds. Combinatorial filters with no canonical strategy. A 5-facet category page can yield millions of URLs.
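The combinatorics are easy to underestimate. A rough sketch of the math, assuming each facet is optional and single-select (so every facet contributes its value count plus one "not applied" choice; the facet counts below are illustrative, not from any real site):

```python
def facet_url_count(values_per_facet):
    """Upper bound on distinct filter-combination URLs."""
    total = 1
    for values in values_per_facet:
        total *= values + 1  # each facet: its values, or not applied
    return total - 1  # exclude the unfiltered base page

# Five facets with 20 values each:
print(facet_url_count([20, 20, 20, 20, 20]))  # → 4084100
```

Over four million URLs from one category template, before counting sort orders or pagination.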
Session IDs in URLs. A /?sid=abc123 parameter rotated per visit makes every page appear unique, so Googlebot crawls hundreds of versions of the same content. Cookies are the modern fix; PHP's PHPSESSID= in URLs is the historical disaster.
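If you must detect or clean these in a crawler or canonicalization layer, a minimal sketch: strip known session parameters before comparing URLs. The parameter names here ("sid", "phpsessid") match the examples above; extend the set for your stack.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

SESSION_PARAMS = {"sid", "phpsessid"}  # assumed names; adapt to your site

def strip_session_params(url):
    """Return the URL with session-tracking query parameters removed."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(strip_session_params("https://example.com/page?sid=abc123&color=red"))
# → https://example.com/page?color=red
```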
Recursive relative links. A bug where /foo/bar/ contains a relative link to bar/, generating /foo/bar/bar/, /foo/bar/bar/bar/, and so on. Common in poorly-templated breadcrumb code.
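You can see the recursion with nothing but standard URL resolution: a relative href of "bar/" on a page under /foo/bar/ resolves one level deeper every time it is followed.

```python
from urllib.parse import urljoin

# Each "crawl step" follows the same relative link the buggy template emits.
url = "https://example.com/foo/bar/"
for _ in range(3):
    url = urljoin(url, "bar/")
    print(url)
# https://example.com/foo/bar/bar/
# https://example.com/foo/bar/bar/bar/
# https://example.com/foo/bar/bar/bar/bar/
```

Every resolved URL returns a valid page (the template renders regardless of depth), so the crawler never hits a 404 that would stop it.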
How you find a crawl trap:
- Log file analysis — look for Googlebot fetching URLs that follow predictable infinite patterns. Sort by URL frequency; the trap appears in the top 100.
- GSC Coverage — spike in "Crawled – currently not indexed" or "Discovered – currently not indexed" with similar URL stems.
- Crawl with your own crawler without max-depth and watch the URL count grow without bound.
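The log-analysis step above can be sketched in a few lines. The log format (Apache/Nginx combined log, request path in the seventh whitespace-separated field) and the simple "Googlebot" substring filter are assumptions; adapt both to your server, and verify Googlebot by reverse DNS in production.

```python
from collections import Counter

def top_crawled_paths(log_lines, n=100):
    """Count Googlebot fetches per path; traps surface at the top."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        fields = line.split()
        if len(fields) > 6:
            counts[fields[6]] += 1  # request path in combined log format
    return counts.most_common(n)

sample = [
    '66.249.66.1 - - [01/Jan/2025:00:00:00 +0000] "GET /foo/bar/bar/bar/ HTTP/1.1" 200 512 "-" "Googlebot"',
    '66.249.66.1 - - [01/Jan/2025:00:00:01 +0000] "GET /foo/bar/bar/bar/ HTTP/1.1" 200 512 "-" "Googlebot"',
    '66.249.66.1 - - [01/Jan/2025:00:00:02 +0000] "GET /about/ HTTP/1.1" 200 512 "-" "Googlebot"',
]
print(top_crawled_paths(sample, 2))
# → [('/foo/bar/bar/bar/', 2), ('/about/', 1)]
```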
How you fix it:
- At the source. Remove the unbounded link generation. A calendar widget should stop at the first published post. A breadcrumb should use absolute paths.
- nofollow on internal links into the trap works as a stopgap. Google now treats it as a hint, but it still helps.
- robots.txt disallow is containment, not a fix. Googlebot still spends capacity on the disallow check.
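Fixing at the source is usually a small template change. A minimal sketch of a bounded calendar link generator, stopping backward navigation at the first published post (the function, route pattern, and FIRST_POST_MONTH constant are hypothetical; a real widget would read the earliest post date from the database):

```python
import datetime

FIRST_POST_MONTH = datetime.date(2019, 3, 1)  # assumed earliest content

def prev_month_link(current: datetime.date):
    """Return the previous-month archive URL, or None at the boundary."""
    prev = (current.replace(day=1) - datetime.timedelta(days=1)).replace(day=1)
    if prev < FIRST_POST_MONTH:
        return None  # render no link: the crawl stops here
    return f"/archive/{prev.year}/{prev.month:02d}/"

print(prev_month_link(datetime.date(2019, 4, 15)))  # → /archive/2019/03/
print(prev_month_link(datetime.date(2019, 3, 15)))  # → None
```

The symmetric "next month" link should stop at the current month, so neither direction generates unbounded URLs.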
Common misconceptions
- "It only matters for huge sites." A 100-page blog with a buggy calendar can spawn 10,000+ trap URLs. Crawl traps scale with the bug, not with site size.
- "Disallow makes it go away." It stops indexing, not discovery. Trap URLs still appear in Discovered – currently not indexed and waste GSC report space.
- "My CMS would catch this." Most popular CMSes ship with crawl traps out of the box (calendar plugins, faceted nav modules, parameter explosions). Audit, don't trust.