GEO & AI Search · Glossary · Updated Apr 2026

Google-Extended

Definition

Google-Extended is a `robots.txt` user-agent token that lets you opt your content out of training Google's generative models (Gemini and the Vertex AI generative APIs) without affecting Googlebot crawling, Google Search indexing, or ranking. Google introduced it in September 2023 as a control separate from Search inclusion.

Long definition

Before September 2023, opting out of Google's generative-AI training meant blocking Googlebot — which also removed you from Google Search. Google-Extended fixed that by separating the two decisions.

The robots.txt rule is one block:

```
User-agent: Google-Extended
Disallow: /
```

Adding this group excludes your content from training future Gemini models (Gemini replaced Bard, which the token originally covered) and the models behind the Vertex AI generative APIs. Googlebot continues to crawl and index the site for Search, and your rankings are unaffected.
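To see how the two tokens are matched independently, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not Google's parser, and asking whether "Google-Extended" may fetch a URL is a modeling shortcut (the token never fetches anything); the helper name `allowed` and the URL `https://example.com/article` are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rule from this entry, as robots.txt text.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

def allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the robots.txt group that matches `token` permits `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)

url = "https://example.com/article"  # placeholder URL
print("Googlebot:", allowed(ROBOTS_TXT, "Googlebot", url))              # Search crawling
print("Google-Extended:", allowed(ROBOTS_TXT, "Google-Extended", url))  # training use
```

Because no group names Googlebot and there is no `*` group, Googlebot falls through to "allowed", while the Google-Extended group disallows everything.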

Google-Extended is not a separate crawler. No request reaches your server with a Google-Extended user-agent string; Google's existing crawlers, chiefly Googlebot, still do the fetching. The token is a control that Google's training-data pipelines read after the fetch to decide whether the page is eligible for training corpora. As a result, a single Googlebot crawl can serve both Search and AI training, with the directive gating only the training use.
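The fetch-once, gate-later design can be modeled in a few lines. This is a hypothetical illustration, not Google's pipeline: `route_fetched_page` and its result keys are invented for the sketch, and the standard-library `urllib.robotparser` stands in for however Google actually evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

def route_fetched_page(url: str, robots_txt: str) -> dict[str, bool]:
    """Hypothetical model: one Googlebot fetch, two downstream decisions."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # Search: decided by the group matching Googlebot (or the * group if none matches).
    search_eligible = rules.can_fetch("Googlebot", url)
    # Training: the Google-Extended token is consulted after the fetch, as a usage gate.
    # If Googlebot could not fetch the page, there is nothing to train on either.
    training_eligible = search_eligible and rules.can_fetch("Google-Extended", url)
    return {"search_eligible": search_eligible, "training_eligible": training_eligible}

robots_txt = "User-agent: Google-Extended\nDisallow: /\n"
print(route_fetched_page("https://example.com/guide", robots_txt))
```

The point of the model is that only one HTTP request exists; the token changes what happens to the fetched page, not whether it is fetched.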

The token does not control AI Overviews (which grew out of the Search Generative Experience) or grounded answers in AI Mode. Those features draw on the live Search index, not on training data. To keep a page out of AI Overviews you need `nosnippet` or `noindex`, and both reach beyond AI features: `nosnippet` also removes the page's regular Search snippets, while `noindex` removes the page from Search altogether. There is currently no clean opt-out for AI Overviews alone, and Google has confirmed this design choice publicly.
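When auditing pages for these Search-side controls, one minimal approach is to collect directives from `<meta name="robots">` and `<meta name="googlebot">` tags plus the `X-Robots-Tag` response header, then flag `noindex` and `nosnippet`. The class `RobotsMetaCollector` and the function `search_controls` are hypothetical, and the header handling is deliberately simplified (it ignores per-crawler prefixes such as `googlebot: noindex`).

```python
from html.parser import HTMLParser

class RobotsMetaCollector(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs arrives as (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

def search_controls(html: str, x_robots_tag: str = "") -> dict[str, bool]:
    """Hypothetical audit helper: report the Search-side opt-outs a page carries."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    collector.close()
    directives = set(collector.directives)
    # Simplification: treat the X-Robots-Tag header as a plain comma-separated list.
    directives |= {d.strip().lower() for d in x_robots_tag.split(",") if d.strip()}
    return {
        # "none" is shorthand for "noindex, nofollow".
        "noindex": "noindex" in directives or "none" in directives,
        "nosnippet": "nosnippet" in directives,
    }

page = '<html><head><meta name="robots" content="nosnippet"></head></html>'
print(search_controls(page))
```

Note that a `Disallow` for Google-Extended in robots.txt never shows up here: it is a different control, and it does not affect either flag.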

For Google Workspace and Google Cloud customers, Google-Extended also covers customer-grade Vertex AI training where applicable. The reference documentation lives on developers.google.com, and Google updates that page when the token's scope expands.

Practical recommendation for most publishers: leave Google-Extended allowed unless your content is itself the product (paywalled news, premium reference databases). Opting out costs visibility in future Gemini-grounded answers and earns you no direct compensation in return.
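If only part of a site is the product, one option is to scope the opt-out to those paths rather than to `/`. The sketch below assumes that approach; `build_robots_txt` and the `/premium/` and `/reference/` prefixes are hypothetical, and the result is checked with Python's standard-library parser, not Google's.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(opt_out_prefixes: list[str]) -> str:
    """Hypothetical helper: a Google-Extended group disallowing only the given path prefixes."""
    if not opt_out_prefixes:
        raise ValueError("need at least one path prefix to opt out")
    lines = ["User-agent: Google-Extended"]
    lines += [f"Disallow: {prefix}" for prefix in opt_out_prefixes]
    # No Googlebot group is emitted, so Search crawling stays fully allowed.
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(["/premium/", "/reference/"])
print(robots_txt)

rules = RobotFileParser()
rules.parse(robots_txt.splitlines())
for path in ("/premium/report", "/blog/post"):
    url = "https://example.com" + path  # placeholder domain
    print(path,
          "| training:", rules.can_fetch("Google-Extended", url),
          "| search:", rules.can_fetch("Googlebot", url))
```

The free content stays eligible for Gemini training and grounding while the paid sections opt out, and Search is untouched either way.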

Common misconceptions

  • "Google-Extended blocks AI Overviews." It doesn't. AI Overviews pull from the Search index, which Googlebot — not Google-Extended — populates. Blocking Google-Extended only opts you out of model training.
  • "Google-Extended is a separate crawler hitting my server." No separate fetch, no separate user-agent in your access logs. The directive is read by Google's training pipeline, not enforced at fetch time.
  • "Blocking Googlebot also blocks Google-Extended automatically." Yes in effect, since content Googlebot never fetches can't reach the training pipeline. But it's a sledgehammer: you also lose all Search visibility. Use `User-agent: Google-Extended` for the surgical version.
  • "Google-Extended retroactively removes my content from Gemini." No. Models already trained on your content stay trained. The block applies to future training cycles only.
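You can check the access-log point against your own server. A minimal sketch, assuming logs in the common Apache/Nginx "combined" format where the user-agent is the last double-quoted field; `count_google_agents` is a hypothetical helper and the sample lines are invented for illustration. User-agent strings can be spoofed, so this only counts what clients claim to be.

```python
import re
from collections import Counter

# Combined log format: the user-agent is the final double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_google_agents(log_lines):
    """Hypothetical helper: count requests whose user-agent mentions each token."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1)
        for token in ("Googlebot", "Google-Extended"):
            if token in user_agent:
                counts[token] += 1
    return counts

sample_log = [  # invented lines for illustration
    '192.0.2.10 - - [10/Apr/2026:10:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Apr/2026:10:00:05 +0000] "GET / HTTP/1.1" 200 812 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]
print(count_google_agents(sample_log))
```

On a real site with the opt-out in place, you would expect Googlebot hits to keep appearing and the Google-Extended count to stay at zero.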