GEO & AI Search · Glossary · Updated Apr 2026

GPTBot

Definition

GPTBot is OpenAI's web crawler, identified by the user-agent token `GPTBot` and used to gather public content for training future models. Blocking it in robots.txt opts your content out of future training crawls but does not affect ChatGPT Search, which relies on separate user-agents (`OAI-SearchBot` for its search index and `ChatGPT-User` for live retrieval).

Long definition

GPTBot launched in August 2023 as OpenAI's first publicly documented crawler. It identifies itself with a user-agent string starting with `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.x; +https://openai.com/gptbot` and respects standard robots.txt directives.
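For log analysis, the `GPTBot` token can be picked out of a request's User-Agent header with a short pattern match. The sketch below is illustrative only: the function name `gptbot_version` and the sample header value (including its version number) are assumptions modeled on the format quoted above, and a matching header only shows what the client claims to be, not who actually sent the request.

```python
import re
from typing import Optional

# Matches the GPTBot product token and captures the version after the slash,
# e.g. "GPTBot/1.2" -> "1.2".
GPTBOT_TOKEN = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the GPTBot version claimed in a User-Agent header, or None."""
    match = GPTBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


# Hypothetical header value; the version number is made up for illustration.
sample = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
    "compatible; GPTBot/1.2; +https://openai.com/gptbot"
)

print(gptbot_version(sample))
print(gptbot_version("Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"))
```

Matching on the token rather than the full string keeps the check working if the version or the surrounding text changes.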

Blocking it is a two-line robots.txt addition:

User-agent: GPTBot
Disallow: /

The important nuance: GPTBot is for training, not for live answers. When a ChatGPT user asks a question that triggers web browsing, OpenAI dispatches a different agent, ChatGPT-User, to fetch pages in real time. Blocking GPTBot does not block ChatGPT-User. Blocking ChatGPT-User does not block GPTBot. They are independent opt-outs.

A third user-agent, OAI-SearchBot, was introduced for the ChatGPT Search index in late 2024. Sites that want to be findable in ChatGPT Search but excluded from training should allow OAI-SearchBot and ChatGPT-User while disallowing GPTBot. OpenAI documents the full list at openai.com/gptbot.
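One way to sanity-check a "findable but not trained on" configuration before deploying it is to run the rules through Python's standard-library robots.txt parser. This is a minimal sketch: the helper name `allowed_agents` and the example URL are assumptions, and the standard-library parser's matching may differ in edge cases from how OpenAI's own agents interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Disallow the training crawler, explicitly allow the two search-related agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Map each user-agent token to whether the rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "https://example.com/pricing",  # placeholder URL
)
print(result)
```

Each agent is evaluated against its own group, which is why the GPTBot rule leaves the other two unaffected.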

GPTBot honors User-agent-specific rules and the Disallow directive. It does not honor Crawl-delay (neither does Googlebot). Its requests originate from published IP ranges that OpenAI updates periodically, which is useful in log file analysis for verifying that a hit really came from OpenAI and not from a spoofer borrowing its user-agent string.
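The IP check itself is a CIDR membership test. The sketch below assumes you have already loaded OpenAI's current ranges into a list of CIDR strings; the ranges shown are reserved documentation blocks used as placeholders, not OpenAI's real ranges, and `ip_in_ranges` is a hypothetical helper name.

```python
import ipaddress


def ip_in_ranges(ip, cidr_ranges):
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


# Placeholder ranges (reserved documentation blocks), NOT OpenAI's real ranges.
# Replace with the list OpenAI currently publishes.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]

# A logged hit claiming to be GPTBot is only credible if its source IP matches.
print(ip_in_ranges("192.0.2.15", PLACEHOLDER_RANGES))
print(ip_in_ranges("203.0.113.9", PLACEHOLDER_RANGES))
```

Because the published ranges change, a check like this is only as current as the list it is given.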

Decision matrix in practice: if your business model depends on people landing on your site from search and AI surfaces, allow OAI-SearchBot and ChatGPT-User. If you also want your content used to improve future GPT models (potentially raising your visibility there long-term), allow GPTBot. If you want to be findable but not trained on, block GPTBot only.
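The matrix above reduces to two yes/no choices, which can be turned into robots.txt rules mechanically. This is a rough sketch under that reading: `build_robots_txt` and its parameter names are hypothetical, and it covers only the three OpenAI agents discussed here, so any existing rules for other crawlers would need to be merged in separately.

```python
def build_robots_txt(findable, trainable):
    """Build robots.txt groups for OpenAI's agents from two policy choices.

    findable:  allow OAI-SearchBot and ChatGPT-User (search index, live fetches)
    trainable: allow GPTBot (training crawl)
    """
    policy = {
        "OAI-SearchBot": findable,
        "ChatGPT-User": findable,
        "GPTBot": trainable,
    }
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Blank lines separate the per-agent groups.
    return "\n\n".join(groups) + "\n"


# "Findable but not trained on": search agents allowed, GPTBot disallowed.
print(build_robots_txt(findable=True, trainable=False))
```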

Common misconceptions

"Blocking GPTBot removes my brand from ChatGPT." No. ChatGPT answers from training data already collected before your block, plus live retrieval via ChatGPT-User. To affect live retrieval, you need to block that agent too.
"GPTBot uses the same user-agent as Common Crawl's CCBot." Separate crawlers, separate organizations. CCBot is the crawler operated by Common Crawl, whose corpus OpenAI has also used, but GPTBot is OpenAI's own first-party crawler.
"A noindex tag blocks GPTBot." No. The noindex directive governs search-engine indexing, not training crawls. GPTBot follows robots.txt rules, not noindex meta tags.
"Blocking GPTBot retroactively removes my content from existing models." Training is one-shot. Blocking now affects future model versions, not GPT-4 or anything already trained. Removal from a deployed model is not a thing OpenAI offers.
