GEO & AI Search · Glossary · Updated Apr 2026

GPTBot

Definition

GPTBot is OpenAI's web crawler, identified by the user-agent token `GPTBot` and used to gather public content for training future models. Blocking it in robots.txt opts your content out of future training crawls but does not affect ChatGPT Search, which relies on separate user-agents (`OAI-SearchBot` for its search index and `ChatGPT-User` for live retrieval).


Long definition

GPTBot launched in August 2023 as OpenAI's first publicly documented crawler. It identifies itself with a user-agent string starting with `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.x; +https://openai.com/gptbot`, and it respects standard robots.txt directives.

Blocking it is a two-line robots.txt addition:

User-agent: GPTBot
Disallow: /
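
To sanity-check a rule like this before deploying it, one option is Python's standard-library `urllib.robotparser`. The sketch below is a minimal illustration, not a reproduction of how OpenAI's crawler evaluates robots.txt; the helper name `is_allowed` and the `example.com` URL are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from above, held in memory instead of fetched over HTTP.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper: it uses Python's own robots.txt matcher, which
    may differ in edge cases from any particular crawler's parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article")
chatgpt_user_allowed = is_allowed(
    ROBOTS_TXT, "ChatGPT-User", "https://example.com/article"
)

print("GPTBot allowed:", gptbot_allowed)
print("ChatGPT-User allowed:", chatgpt_user_allowed)
```

Under this parser the rule applies only to the `GPTBot` group, so a request identified as `ChatGPT-User` is still permitted.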

The important nuance: GPTBot is for training, not for live answers. When a ChatGPT user asks a question that triggers web browsing, OpenAI dispatches a different crawler — ChatGPT-User — to fetch pages in real time. Blocking GPTBot does not block ChatGPT-User. Blocking ChatGPT-User does not block GPTBot. They are independent opt-outs.

A third user-agent, OAI-SearchBot, was introduced for the ChatGPT Search index in late 2024. Sites that want to be findable in ChatGPT Search but excluded from training should allow OAI-SearchBot and ChatGPT-User while disallowing GPTBot. OpenAI documents the full list at openai.com/gptbot.

GPTBot honors User-agent-specific rules and the Disallow directive. It does not honor Crawl-delay (Google doesn't either). It originates from a published IP range OpenAI updates periodically — useful for log file analysis when you want to verify a hit was actually OpenAI and not a spoofer using its user-agent string.
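
As a rough sketch of that log check, the snippet below compares a hit's source IP against a list of CIDR blocks using the standard-library `ipaddress` module. The ranges shown are placeholders from the RFC 5737 documentation blocks, not OpenAI's real ranges; in practice you would load the list OpenAI currently publishes. The function names and labels are likewise assumptions for illustration.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT OpenAI's real ones.
# Replace with the list OpenAI currently publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
    "compatible; GPTBot/1.x; +https://openai.com/gptbot"
)


def claims_gptbot(user_agent: str) -> bool:
    """True if the user-agent string contains the GPTBot token."""
    return "gptbot" in user_agent.lower()


def ip_in_ranges(ip: str, cidr_blocks: list[str]) -> bool:
    """True if `ip` falls inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(block) for block in cidr_blocks)


def classify_hit(ip: str, user_agent: str, cidr_blocks: list[str]) -> str:
    """Label a log entry as 'other', 'verified', or 'suspected spoof'."""
    if not claims_gptbot(user_agent):
        return "other"
    return "verified" if ip_in_ranges(ip, cidr_blocks) else "suspected spoof"


print(classify_hit("192.0.2.10", SAMPLE_UA, EXAMPLE_RANGES))
print(classify_hit("203.0.113.5", SAMPLE_UA, EXAMPLE_RANGES))
```

A hit that carries the GPTBot token but arrives from outside the listed ranges is flagged rather than trusted, which is the point of checking IPs instead of relying on the user-agent string alone.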

Decision matrix in practice: if your business model depends on people landing on your site from search and AI surfaces, allow OAI-SearchBot and ChatGPT-User. If you also want your content used to improve future GPT models (potentially raising your visibility there long-term), allow GPTBot. If you want to be findable but not trained on, block GPTBot only.
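
The decision matrix above can be sketched as a small generator that turns two yes/no choices into robots.txt groups and then checks the result with Python's `urllib.robotparser`. The function names and the two-flag model are assumptions made for this illustration; a real robots.txt will usually contain other rules as well.

```python
from urllib.robotparser import RobotFileParser

# Agent names as given in this entry.
SEARCH_AGENTS = ("OAI-SearchBot", "ChatGPT-User")
TRAINING_AGENT = "GPTBot"


def build_robots_txt(allow_search: bool, allow_training: bool) -> str:
    """Emit one 'Disallow: /' group per agent that should be blocked.

    Hypothetical helper: agents that are allowed get no group at all,
    since robots.txt permits anything that is not disallowed.
    """
    blocked = []
    if not allow_search:
        blocked.extend(SEARCH_AGENTS)
    if not allow_training:
        blocked.append(TRAINING_AGENT)
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked]
    return "\n".join(groups)


def allowed_agents(robots_txt: str) -> dict[str, bool]:
    """Report, per OpenAI agent, whether the site root may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: parser.can_fetch(agent, "https://example.com/")  # placeholder URL
        for agent in (*SEARCH_AGENTS, TRAINING_AGENT)
    }


# "Findable but not trained on": allow search agents, block GPTBot.
findable_not_trained = build_robots_txt(allow_search=True, allow_training=False)
print(findable_not_trained)
print(allowed_agents(findable_not_trained))
```

Keeping the policy as two explicit flags makes the independence of the opt-outs visible: changing one flag never alters the groups produced for the other.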

Common misconceptions

  • "Blocking GPTBot removes my brand from ChatGPT." No. ChatGPT answers from training data already collected before your block, plus live retrieval via ChatGPT-User. To affect live retrieval, you need to block that agent too.
  • "GPTBot uses the same user-agent as Common Crawl's CCBot." Separate crawlers, separate organizations. CCBot is the crawler operated by Common Crawl, whose data OpenAI also uses, but GPTBot is OpenAI's own first-party crawler.
  • "A noindex tag blocks GPTBot." No. noindex is a directive for search-engine indexing, not for training crawls. GPTBot follows robots.txt rules, not noindex meta tags.
  • "Blocking GPTBot retroactively removes my content from existing models." No. Training is one-shot. Blocking now affects future model versions, not GPT-4 or anything already trained. OpenAI does not offer removal from a deployed model.