Indexability
Indexability is whether a crawlable URL is eligible to appear in a search engine's index. It can be blocked by noindex directives, canonical tags pointing elsewhere, thin or duplicate content, or quality filters. Crawlability is a prerequisite but not a guarantee.
Long definition
Indexability sits one gate downstream of crawlability. Once Googlebot has fetched the page, the URL still has to clear several filters before it becomes eligible to appear in search results:
- Directive filters: a `<meta name="robots" content="noindex">` tag, an `X-Robots-Tag: noindex` header, or a canonical tag pointing at a different URL all remove the page from eligibility.
- Content filters: thin content (too little substance for the claimed topic), exact duplicates of other URLs on the same site, and "soft 404" patterns (a 200 OK page saying "this product is gone") get filtered silently.
- Quality filters: broader signals around site trust, spam patterns, and topical authority can demote or remove URLs even when directives and content look clean.
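The directive filters are the only ones of the three that can be checked mechanically from a single response. The following is a minimal sketch of such a check, assuming you already have the URL, the response headers, and the HTML body in hand. The function name `directive_blockers` and the reason strings are hypothetical, and the check is deliberately simplified: it looks only for the literal `noindex` value, ignores bot-specific prefixes in `X-Robots-Tag`, and compares canonicals by plain string match.

```python
from html.parser import HTMLParser


class _DirectiveParser(HTMLParser):
    """Collects robots meta directives and the first canonical href."""

    def __init__(self):
        super().__init__()
        self.robots = []       # directives from <meta name="robots">
        self.canonical = None  # href of the first <link rel="canonical">

    def handle_starttag(self, tag, attrs):
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots += [d.strip().lower() for d in a.get("content", "").split(",")]
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = a.get("href", "").strip()


def directive_blockers(url, headers, html):
    """Return directive-level reasons the URL may be excluded (hypothetical helper)."""
    reasons = []

    # Header names are case-insensitive, so compare them in lowercase.
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()
    if "noindex" in [d.strip() for d in header_value.split(",")]:
        reasons.append("x-robots-tag noindex")

    parser = _DirectiveParser()
    parser.feed(html)
    if "noindex" in parser.robots:
        reasons.append("meta robots noindex")

    # Simplification: treat URLs as equal if they match after stripping a trailing slash.
    if parser.canonical and parser.canonical.rstrip("/") != url.rstrip("/"):
        reasons.append("canonical points elsewhere")
    return reasons
```

An empty list here only means no directive filter was found. It says nothing about the content or quality filters, which cannot be inferred from markup alone.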
Search Console's "Page indexing" report is the clearest diagnostic. The important distinction is between "indexed" URLs (eligible to rank), "crawled - currently not indexed" (fetched, failed an index-stage filter), and "excluded" with a specific reason (noindex, canonical mismatch, blocked by robots.txt, etc.).
Common misconceptions
- "Indexable means it will rank." No. Indexability is the floor; ranking is a separate question, driven by relevance and authority signals for the query.
- "Noindex + robots.txt disallow is belt-and-braces." It's actually incompatible. A URL blocked by robots.txt can't be crawled, so Google can't see the `noindex`. The URL may stay in the index with a generic snippet until it drops out naturally.
- "Self-canonicals guarantee indexability." They help Google understand your preference, but a self-canonical on a thin/duplicate page doesn't unblock it. Google can and does ignore canonicals when signals conflict.
- "Removing a URL from the sitemap removes it from the index." It doesn't. Use `noindex` (then wait for a recrawl), or a `410 Gone` response code to drop a URL.
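The noindex-plus-disallow conflict can be caught before it ships. Here is a small sketch using Python's standard `urllib.robotparser`, assuming the robots.txt content is available as a string. The helper name `noindex_is_visible` and the example URLs are hypothetical, and the standard-library parser does simple prefix matching, so it may not agree with Googlebot on wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt, url, user_agent="Googlebot"):
    """Return True if a crawler obeying robots_txt may fetch url,
    and so could see a noindex directive served on it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Illustrative robots.txt, not taken from any real site.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

print(noindex_is_visible(robots_txt, "https://example.com/private/old-page"))  # False
print(noindex_is_visible(robots_txt, "https://example.com/blog/post"))         # True
```

If this returns `False` for a URL you are trying to remove with `noindex`, the disallow rule has to be lifted first so the directive can be crawled and seen.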