Control AI crawler access

Use robots.txt to decide which AI crawlers may access your documentation — block training crawlers while allowing search and on-demand fetchers — with a copy-paste template.

You decide which AI systems may use your documentation. The standard lever is robots.txt, where you allow or disallow individual crawlers by their user-agent token. This guide gives you a recommended strategy and a copy-paste template.

Goal

Publish a robots.txt that opts your content out of AI model training while keeping it available to the AI search and assistant traffic that sends readers your way.

Prerequisites

The ability to serve a file at your site root (https://docs.example.com/robots.txt).
The user-agent tokens you want to target. See Identify AI bots and crawlers.

The strategy: block training, allow search and on-demand

AI crawlers fall into categories that the bot reference describes in full. For documentation, a common position is:

Block training crawlers — they consume bandwidth to build model datasets and return little direct value (for example, GPTBot, ClaudeBot, CCBot, Bytespider, meta-externalagent).
Opt out of training with control tokens — Google-Extended and Applebot-Extended are not separate crawlers. They are robots.txt tokens that opt you out of Gemini and Apple Intelligence training without affecting search.
Allow search indexers and on-demand fetchers — these surface your docs in AI answers and fetch pages when a user asks a question, which sends qualified readers to you (for example, OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, Claude-User).
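
The positions above amount to two lists of user-agent tokens with one rule per list. As a minimal sketch, assuming you would rather generate the file from a script than edit it by hand, the policy could be expressed like this. The names `BLOCKED`, `ALLOWED`, `build_group`, and `build_robots_txt` are hypothetical, and the tokens are only the examples named in this guide, so check them against the bot reference before use.

```python
# Sketch: generate robots.txt from two policy lists.
# Tokens are the examples from this guide, not an authoritative list.
BLOCKED = [
    "GPTBot", "ClaudeBot", "CCBot", "Bytespider", "meta-externalagent",
    "Google-Extended", "Applebot-Extended",
]
ALLOWED = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
    "ChatGPT-User", "Claude-User",
]


def build_group(comment: str, tokens: list[str], rule: str) -> str:
    """Return one robots.txt group: a comment, one User-agent line per token, one rule."""
    lines = [f"# {comment}"]
    lines += [f"User-agent: {token}" for token in tokens]
    lines.append(rule)
    return "\n".join(lines)


def build_robots_txt(blocked: list[str], allowed: list[str]) -> str:
    """Join the block group and the allow group, separated by a blank line."""
    groups = [
        build_group("Block AI training crawlers", blocked, "Disallow: /"),
        build_group("Allow AI search indexers and on-demand fetchers", allowed, "Allow: /"),
    ]
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(BLOCKED, ALLOWED))
```

Keeping the lists in one place makes a policy change a one-line edit, and the blank line between groups keeps each rule attached to the right set of tokens.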

Copy-paste template

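Save this as robots.txt at your site root, then adjust the lists to match your policy. Anything not listed falls through to your existing default rules. A crawler that matches one of the named groups follows only that group and ignores your User-agent: * rules, so repeat any path restrictions you still want applied to the crawlers you allow.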

# Block AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: CCBot
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: cohere-ai
Disallow: /

# Allow AI search indexers and on-demand fetchers
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /
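
Before publishing, the template can be checked offline. The sketch below uses Python's standard-library `urllib.robotparser` to parse the rules above and ask whether a given token may fetch a page. The trailing `User-agent: *` group, the `/private/` path, and the helper name `may_fetch` are hypothetical stand-ins added for illustration. The parser's matching approximates how a compliant crawler behaves; it is not a guarantee about any specific bot.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Block AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: CCBot
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: cohere-ai
Disallow: /

# Allow AI search indexers and on-demand fetchers
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /

# Hypothetical existing default rules (not part of the template)
User-agent: *
Disallow: /private/
"""

PAGE = "https://docs.example.com/guide/"
PRIVATE = "https://docs.example.com/private/notes"


def may_fetch(token: str, url: str) -> bool:
    """Parse ROBOTS_TXT and report whether `token` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    for token in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        print(token, may_fetch(token, PAGE), may_fetch(token, PRIVATE))
    # GPTBot False False
    # OAI-SearchBot True True
    # Googlebot True False
```

The OAI-SearchBot row shows a side effect of naming a crawler: it matches its own group and skips the `*` group, so the hypothetical `/private/` restriction no longer applies to it.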

Where to put the file

robots.txt must live at your domain root, the same level as sitemap.xml. How you publish it depends on your platform:

Static sites (Astro, Docusaurus, and others) — place robots.txt in the directory served at the root, such as public/.
Mintlify — add the file through your project’s static asset configuration.
Subdomains are separate — a docs subdomain (docs.example.com) needs its own robots.txt; the root domain’s file does not apply to it.
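
The subdomain rule follows from how a crawler locates the file: it takes the scheme and host of the page it wants and requests /robots.txt on that same host. A minimal sketch of that derivation, using a hypothetical helper named `robots_url`:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme and host, root path)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://docs.example.com/guides/setup?tab=cli"))
    # https://docs.example.com/robots.txt
    print(robots_url("https://example.com/pricing"))
    # https://example.com/robots.txt
```

The two results differ, which is why rules published on example.com are never consulted for pages on docs.example.com.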

Verify your setup

Confirm the file is served at the root:
```
curl https://docs.example.com/robots.txt
```
Confirm your block and allow rules read as you intend, and that the tokens match the current values in the bot reference.
After deploying, monitor your logs to confirm the crawlers you blocked stop appearing — and to catch any that ignore the rules.
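
One way to do that monitoring is to count requests per blocked token in your access log. The sketch below assumes each log line contains the raw User-Agent string, as in the widely used combined log format. The sample lines, the token list, and the function name `count_blocked_hits` are illustrative, not taken from a real log.

```python
from collections import Counter

# Only tokens that appear in request headers are worth searching for.
# Google-Extended and Applebot-Extended are robots.txt control tokens
# and do not show up as user agents in logs.
BLOCKED_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "meta-externalagent"]


def count_blocked_hits(log_lines, tokens=BLOCKED_TOKENS):
    """Count page requests whose log line contains a blocked token (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        if " /robots.txt " in line:
            continue  # compliant crawlers still fetch robots.txt itself
        lowered = line.lower()
        for token in tokens:
            if token.lower() in lowered:
                hits[token] += 1
                break  # attribute each request to one token only
    return hits


# Made-up sample lines in combined log format, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /guide/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 300 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /guide/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/130.0"',
]

if __name__ == "__main__":
    print(dict(count_blocked_hits(SAMPLE)))
    # {'GPTBot': 1}
```

A non-zero count some time after deployment suggests a crawler has not yet re-read the file or is ignoring it. Because user-agent strings can be spoofed, treat the counts as a signal to investigate.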

Identify AI bots and crawlers — the token reference these rules depend on.
Monitor AI traffic to your docs — confirm your rules are working.
Serve a Markdown version of every page — make the content you allow as readable as possible for AI.