Control AI crawler access
Use robots.txt to decide which AI crawlers may access your documentation — block training crawlers while allowing search and on-demand fetchers — with a copy-paste template.
You decide which AI systems may use your documentation. The standard lever is robots.txt, where you allow or disallow individual crawlers by their user-agent token. This guide gives you a recommended strategy and a copy-paste template.
Publish a robots.txt that opts your content out of AI model training while keeping it available to the AI search and assistant traffic that sends readers your way.
Prerequisites
- The ability to serve a file at your site root (https://docs.example.com/robots.txt).
- The user-agent tokens you want to target. See Identify AI bots and crawlers.
The strategy: block training, allow search and on-demand
AI crawlers fall into categories that the bot reference describes in full. For documentation, a common position is:
- Block training crawlers: they consume bandwidth to build model datasets and return little direct value (for example, GPTBot, ClaudeBot, CCBot, Bytespider, meta-externalagent).
- Block training with opt-out tokens: Google-Extended and Applebot-Extended opt you out of Gemini and Apple Intelligence training without affecting search.
- Allow search indexers and on-demand fetchers: these surface your docs in AI answers and fetch pages when a user asks a question, which sends qualified readers to you (for example, OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, Claude-User).
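The strategy above can also be kept as data and rendered into a robots.txt, which makes the policy easier to review and update. The following is a minimal sketch, not a required tool: the names BLOCKED_TRAINING, ALLOWED_SEARCH_AND_FETCH, render_group, and render_robots are hypothetical, and the token lists contain only the examples named in the strategy.

```python
# Hypothetical policy lists; the tokens are the examples named in the strategy above.
BLOCKED_TRAINING = [
    "GPTBot", "ClaudeBot", "CCBot", "Bytespider", "meta-externalagent",
    "Google-Extended", "Applebot-Extended",
]
ALLOWED_SEARCH_AND_FETCH = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "ChatGPT-User", "Claude-User",
]


def render_group(comment: str, tokens: list[str], rule: str) -> str:
    """Render one robots.txt group: a comment, one User-agent line per token, one rule."""
    lines = [f"# {comment}"]
    lines += [f"User-agent: {token}" for token in tokens]
    lines.append(rule)
    return "\n".join(lines)


def render_robots(blocked: list[str], allowed: list[str]) -> str:
    """Render both groups, refusing any token listed as both blocked and allowed."""
    overlap = {t.lower() for t in blocked} & {t.lower() for t in allowed}
    if overlap:
        raise ValueError(f"tokens listed as both blocked and allowed: {sorted(overlap)}")
    groups = [
        render_group("Block AI training crawlers", blocked, "Disallow: /"),
        render_group("Allow AI search indexers and on-demand fetchers", allowed, "Allow: /"),
    ]
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(render_robots(BLOCKED_TRAINING, ALLOWED_SEARCH_AND_FETCH), end="")
```

The overlap check is a design choice for this sketch: a token that lands in both lists almost always signals an editing mistake, so the function fails loudly instead of emitting conflicting groups.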
Copy-paste template
Save this as robots.txt at your site root, then adjust the lists to match your policy. Any crawler not listed falls through to your default (User-agent: *) rules, if you have any.
    # Block AI training crawlers
    User-agent: GPTBot
    User-agent: ClaudeBot
    User-agent: anthropic-ai
    User-agent: CCBot
    User-agent: Bytespider
    User-agent: meta-externalagent
    User-agent: Google-Extended
    User-agent: Applebot-Extended
    User-agent: cohere-ai
    Disallow: /

    # Allow AI search indexers and on-demand fetchers
    User-agent: OAI-SearchBot
    User-agent: Claude-SearchBot
    User-agent: PerplexityBot
    User-agent: ChatGPT-User
    User-agent: Claude-User
    Allow: /

Where to put the file
robots.txt must live at your domain root, the same level as sitemap.xml. How you publish it depends on your platform:
- Static sites (Astro, Docusaurus, and others): place robots.txt in the directory served at the root, such as public/ in Astro or static/ in Docusaurus.
- Mintlify: add the file through your project’s static asset configuration.
- Subdomains are separate: a docs subdomain (docs.example.com) needs its own robots.txt; the root domain’s file does not apply to it.
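To make the per-host rule concrete, the sketch below derives the robots.txt location that governs a given page URL. It is illustrative only: robots_url is a hypothetical helper, and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got: {page_url!r}")
    # Keep the scheme and host; replace the path and drop the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://docs.example.com/guides/setup?x=1"))
    # prints https://docs.example.com/robots.txt
    print(robots_url("https://example.com/pricing"))
    # prints https://example.com/robots.txt
```

The two results differ, so rules published on example.com do not cover pages on docs.example.com.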
Verify your setup
- Confirm the file is served at the root:

      curl https://docs.example.com/robots.txt

- Confirm your block and allow rules read as you intend, and that the tokens match the current values in the bot reference.
- After deploying, monitor your logs to confirm the crawlers you blocked stop appearing, and to catch any that ignore the rules.
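The last two checks can be partly automated. The sketch below uses Python's standard urllib.robotparser to test how a robots.txt treats each token, and a simple substring scan to count log lines from blocked crawlers. Treat it as an approximation under stated assumptions: SAMPLE_ROBOTS is a trimmed copy of the template, the log lines are invented examples in a common access-log layout, rule_mismatches and blocked_hits are hypothetical helper names, and each real crawler applies its own matching logic that may differ from Python's parser.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Trimmed copy of the template, for illustration only.
SAMPLE_ROBOTS = """\
# Block AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

# Allow AI search indexers and on-demand fetchers
User-agent: OAI-SearchBot
User-agent: Claude-User
Allow: /
"""

# Invented log lines (assumption: your server logs the User-Agent header).
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET /guide HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.4 - - [01/Jan/2025:00:00:05 +0000] "GET /guide HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
]


def rule_mismatches(robots_text: str, expected: dict[str, bool]) -> list[str]:
    """Return tokens whose allow/block result differs from what you expect."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    mismatches = []
    for token, should_allow in expected.items():
        allowed = parser.can_fetch(token, "https://docs.example.com/")
        if allowed != should_allow:
            mismatches.append(token)
    return mismatches


def blocked_hits(log_lines: list[str], blocked_tokens: list[str]) -> Counter:
    """Count log lines that mention each blocked token (case-insensitive)."""
    counts: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in blocked_tokens:
            if token.lower() in lowered:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    expected = {"GPTBot": False, "ClaudeBot": False, "OAI-SearchBot": True, "Claude-User": True}
    print(rule_mismatches(SAMPLE_ROBOTS, expected))  # prints []
    print(dict(blocked_hits(SAMPLE_LOG, ["GPTBot", "ClaudeBot"])))  # prints {'GPTBot': 1}
```

A non-empty result from blocked_hits after deployment does not by itself prove a crawler ignores your rules; compare request timestamps against when the new robots.txt went live.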
Related
- Identify AI bots and crawlers: the token reference these rules depend on.
- Monitor AI traffic to your docs: confirm your rules are working.
- Serve a Markdown version of every page: make the content you allow as readable as possible for AI.