GEO & AI Search · Glossary · Updated Apr 2026

PerplexityBot

Definition

PerplexityBot is the indexing crawler used by Perplexity AI to build its search index. `Perplexity-User` is a separate user-agent that fetches pages on demand when a user asks a question. Distinguishing the two lets you opt out of indexing while remaining citable in live answers.

Long definition

Perplexity operates two distinct crawlers, and the distinction matters for any opt-out decision.

PerplexityBot is the indexing crawler. It walks the web continuously, fetches pages, and builds the retrieval index Perplexity queries when generating answers. It identifies itself with a user-agent containing PerplexityBot and respects robots.txt. Block it with:

User-agent: PerplexityBot
Disallow: /

Perplexity-User is the on-demand crawler. When a user asks Perplexity a question that needs a fresh page fetch, the system dispatches Perplexity-User to retrieve the URL in real time. Historically, this user-agent did not consistently honor robots.txt, treating user-initiated fetches like a browser visit. Perplexity has clarified its stance several times under public pressure; its current published policy is that Perplexity-User respects a robots.txt block when present.

The selective opt-out logic:

  • Want to stay citable in live answers but stay out of Perplexity's stored index? Allow Perplexity-User, block PerplexityBot.
  • Want full opt-out? Block both.
  • Want maximum visibility? Allow both — the indexed corpus drives more total citations than live fetches.
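The three policies above can be checked locally before deploying a robots.txt file. The following is a minimal sketch using Python's standard urllib.robotparser; the policy names, the access_for helper, and the example.com URL are illustrative assumptions, and the parser's matching only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The two user-agent tokens discussed in this entry.
AGENTS = ("PerplexityBot", "Perplexity-User")

# Hypothetical robots.txt bodies, one per opt-out policy.
POLICIES = {
    "cited_not_indexed": """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Allow: /
""",
    "full_opt_out": """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
""",
    "max_visibility": """\
User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /
""",
}


def access_for(robots_txt, url="https://example.com/article"):
    """Return {agent: allowed} for the given robots.txt text and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


if __name__ == "__main__":
    for name, text in POLICIES.items():
        print(name, access_for(text))
```

Because each agent has its own User-agent group, changing the rule for one leaves the other untouched, which is the property the selective opt-out relies on.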

Most sites should default to allowing both. Perplexity answers display source links prominently and route real traffic, so losing visibility in Perplexity means losing referral clicks for content that would otherwise have been cited.

Verification, as with any AI bot, is via published IP ranges and reverse DNS. Spoofers regularly impersonate Perplexity user-agents in scraping operations — a robots.txt block does nothing against them. For abuse-grade traffic, layer firewall rules on top of robots.txt.
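The IP-range half of that verification might look like the sketch below, which uses Python's standard ipaddress module. The classify_request helper is hypothetical, and the CIDR blocks in the usage lines are reserved documentation ranges standing in for the real list, which must come from Perplexity's own published sources. The reverse DNS step is not covered here.

```python
import ipaddress

# Lowercased user-agent tokens that claim to be a Perplexity crawler.
PERPLEXITY_TOKENS = ("perplexitybot", "perplexity-user")


def classify_request(user_agent, remote_ip, published_ranges):
    """Classify a request as 'other', 'verified', or 'spoofed'.

    published_ranges is a list of CIDR strings that the caller has
    obtained from the crawler operator; nothing is fetched here.
    """
    ua = user_agent.lower()
    if not any(token in ua for token in PERPLEXITY_TOKENS):
        return "other"  # does not claim to be a Perplexity agent

    ip = ipaddress.ip_address(remote_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in published_ranges]
    if any(ip in net for net in networks):
        return "verified"  # claimed agent, address inside a published range
    return "spoofed"  # claimed agent, address outside every published range


if __name__ == "__main__":
    # Placeholder ranges (RFC 5737 / RFC 3849), NOT Perplexity's real ones.
    ranges = ["192.0.2.0/24", "2001:db8::/32"]
    ua = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    print(classify_request(ua, "192.0.2.44", ranges))
    print(classify_request(ua, "203.0.113.9", ranges))
```

Requests classified as spoofed are the ones a firewall rule needs to handle, since the sender has no reason to read robots.txt at all.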

Common misconceptions

  • "PerplexityBot and Perplexity-User are the same crawler." Different user-agents, different purposes, different opt-out semantics. A robots.txt rule for one does not affect the other.
  • "Blocking PerplexityBot removes me from Perplexity completely." No. Live Perplexity-User fetches can still pull your page when triggered by a query. To be fully invisible, block both.
  • "Perplexity ignores robots.txt." Reports from 2024 raised this concern about Perplexity-User specifically. Current published policy is that both agents respect robots.txt. Verify with logs against published IPs if you suspect violations.
  • "Allowing PerplexityBot trains a model on my content." Perplexity is a retrieval-augmented system, not primarily a model trainer. The crawl feeds the retrieval index, not foundation-model training data — though the line continues to blur as the company builds its own models.
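For the log check mentioned above, one possible approach is to replay access-log entries against the site's own robots.txt and list requests that a compliant crawler should not have made. The sketch below assumes the common "combined" log format; the find_violations helper, the regular expression, and the sample log lines are all made up for illustration.

```python
import re
from urllib.robotparser import RobotFileParser

AGENT_TOKENS = ("PerplexityBot", "Perplexity-User")

# Assumed layout: Apache/Nginx "combined" access-log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"$'
)


def find_violations(robots_txt, log_lines):
    """Return (ip, agent, path) for logged requests robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        path = match.group("path")
        if path == "/robots.txt":
            continue  # crawlers must fetch robots.txt in order to obey it
        ua = match.group("ua").lower()
        for token in AGENT_TOKENS:
            if token.lower() in ua and not parser.can_fetch(token, path):
                violations.append((match.group("ip"), token, path))
    return violations


# Made-up sample data using reserved documentation addresses.
ROBOTS = "User-agent: PerplexityBot\nDisallow: /\n"
LOGS = [
    '192.0.2.10 - - [01/Jan/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "PerplexityBot/1.0 (sample)"',
    '192.0.2.10 - - [01/Jan/2026:10:00:05 +0000] "GET /guide HTTP/1.1" 200 2048 "-" "PerplexityBot/1.0 (sample)"',
    '198.51.100.7 - - [01/Jan/2026:10:01:00 +0000] "GET /guide HTTP/1.1" 200 2048 "-" "Perplexity-User/1.0 (sample)"',
]

if __name__ == "__main__":
    for hit in find_violations(ROBOTS, LOGS):
        print(hit)
```

A hit only shows that something claiming the user-agent requested a blocked path. Check the source address against the published IP ranges before treating it as a violation by Perplexity rather than a spoofer.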