Crawlability
Crawlability is whether a URL can be discovered and fetched by a search engine crawler. It depends on internal linking exposure, robots.txt rules, DNS and server availability, and HTTP response codes. A URL can be crawlable without being indexable — and the reverse is also true.
Long definition
Crawlability is strictly the discovery + fetch step. For a URL to be crawlable, a crawler must:
- Know the URL exists — via an internal link, an external backlink, a sitemap entry, or a manual submission.
- Be allowed to fetch it — no robots.txt `Disallow` rule covering its path for the relevant user-agent.
- Actually reach it — DNS resolves, TCP connects, TLS handshakes, and the server returns a non-5xx response within a reasonable timeout (Googlebot tends to give up past ~15s).
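The two fetch-side conditions can be sketched with Python's standard library. This is an illustrative check, not how any search engine implements it: the "Googlebot" user-agent string and ~15s timeout mirror the text, the helper names are invented, and the discovery condition cannot be verified from the URL alone.

```python
# Sketch of the fetch-side crawlability checks (stdlib only).
import urllib.error
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

def robots_allows(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Condition 2: no Disallow covering this path for the user-agent."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

def is_crawlable(url: str, user_agent: str = "Googlebot", timeout: float = 15.0):
    """Conditions 2 and 3. Condition 1 (discovery) can't be tested from the URL alone."""
    parts = urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    try:
        with urllib.request.urlopen(robots_url, timeout=timeout) as r:
            robots_txt = r.read().decode("utf-8", errors="replace")
    except OSError:
        robots_txt = ""  # unreachable robots.txt is commonly treated as allow-all
    if not robots_allows(robots_txt, url, user_agent):
        return False, "blocked by robots.txt"
    # Condition 3: DNS + TCP + TLS + a non-5xx response within the timeout.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # 4xx/5xx: the server was still reached
    except OSError as exc:
        return False, f"unreachable: {exc}"  # DNS/TCP/TLS/timeout failure
    if status >= 500:
        return False, f"server error {status}"
    return True, f"fetchable (HTTP {status})"
```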
If any of those three fails, the URL is not crawlable. Crawlability is a prerequisite for indexability but not the same thing: a URL can be perfectly crawlable and still blocked from the index by noindex, a canonical pointing elsewhere, or a quality filter.
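For example, a page carrying this directive is perfectly crawlable (nothing stops the fetch) yet excluded from the index:

```
<meta name="robots" content="noindex">
```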
The clearest signal of crawlability problems is discovery gaps: sitemap URLs stuck in Google Search Console's "Discovered - currently not indexed" report were found but never fetched, which usually points to crawl-side issues (weak internal linking, server errors) rather than index directives. URLs in "Crawled - currently not indexed", by contrast, were fetched successfully and are being held back at the indexing stage.
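A discovery-gap check of that kind can be sketched by diffing sitemap URLs against the set of internally linked URLs. The helper names are illustrative, and the link set is assumed to come from your own site crawl:

```python
# Sketch: find sitemap URLs that no internal link points to (orphan candidates).
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> entry from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")}

def discovery_gaps(sitemap_xml: str, internally_linked: set[str]) -> set[str]:
    """URLs the sitemap announces but internal linking never exposes."""
    return sitemap_urls(sitemap_xml) - internally_linked
```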
Common misconceptions
- "Submitting a URL in Search Console guarantees it will be crawled." It queues the URL for crawling with a modest priority bump. Google still applies demand signals. Repeated submissions don't stack priority.
- "If Google can crawl it, it will be indexed." No — indexability is the next gate. Thin content, duplicate content, and quality filters all block at the index stage even when the fetch succeeded.
- "An orphan page is unreachable." Orphan pages with backlinks or sitemap entries are crawlable. The problem is they usually have zero internal PageRank flow, so Google crawls them rarely and indexes them reluctantly.
- "Robots.txt fixes crawlability." Robots.txt can only remove crawlability; it cannot add it. If a page isn't reachable by link or sitemap, it won't get crawled regardless of what robots.txt allows.