Understand AI traffic to your docs

Learn what AI bots and crawlers are, the three categories of AI traffic that matter for documentation, and why measuring them helps your team.

AI systems read your documentation. Training crawlers collect it, answer engines cite it, and assistants fetch it in real time to answer user questions. Most documentation teams have little visibility into this activity. This guide explains the kinds of AI traffic you receive and why they matter.

Why AI traffic matters for documentation

Documentation is high-value text: it is structured, factual, and maintained. That makes it a prime target for AI systems that summarize products, answer technical questions, and generate code.

This affects your team in three ways:

Reach. When an assistant answers a question using your docs, your content reaches users who never visit your site. Accurate, well-structured docs produce accurate answers.
Cost. Aggressive crawlers consume bandwidth and server resources. High-volume bots can slow your site for human readers and raise hosting costs.
Control. You decide which systems may use your content for training versus real-time answers. You cannot make that decision without first seeing the traffic.

Three categories of AI traffic

Not all AI traffic is the same. Distinguishing these categories tells you what each request is for and how to respond.

Training crawlers

Training crawlers collect content to build the datasets that train large language models. They visit broadly and repeatedly, similar to search engine crawlers, but the content feeds model training rather than a search index.

Examples include OpenAI’s GPTBot, Anthropic’s ClaudeBot, and Common Crawl’s CCBot. Google and Apple offer separate robots.txt tokens (Google-Extended and Applebot-Extended) that let you control training use of content their search crawlers already fetch. These tokens are not separate crawlers, so they do not appear as user agents in your logs.

On-demand fetchers

On-demand fetchers retrieve a specific page in real time because a user asked an assistant a question that references it. The request happens at the moment of the conversation, not during bulk crawling.

Examples include OpenAI’s ChatGPT-User, Anthropic’s Claude-User, and Perplexity-User. Because each request is tied directly to a user's question, this traffic is usually low-volume and is often the most valuable to allow.

AI search indexers

AI search indexers build the indexes behind answer engines and AI search features. They crawl to keep an index current, which the engine then queries to generate cited answers.

Examples include OpenAI’s OAI-SearchBot, Anthropic’s Claude-SearchBot, and PerplexityBot.
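
The three categories above can be sketched as a small lookup that labels a request by its user-agent string. This is a minimal illustration, not a complete detector: the token list contains only the examples named in this guide, the function name classify_user_agent and the sample user-agent strings are hypothetical, and a user-agent header can be spoofed, so the result reflects only what the client claims to be.

```python
from typing import Optional

# Tokens are the examples named in this guide; the list is not exhaustive.
# Keys are lowercased so matching is case-insensitive.
AI_USER_AGENT_CATEGORIES = {
    "gptbot": "training crawler",
    "claudebot": "training crawler",
    "ccbot": "training crawler",
    "chatgpt-user": "on-demand fetcher",
    "claude-user": "on-demand fetcher",
    "perplexity-user": "on-demand fetcher",
    "oai-searchbot": "ai search indexer",
    "claude-searchbot": "ai search indexer",
    "perplexitybot": "ai search indexer",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the AI traffic category for a user-agent string, or None.

    Hypothetical helper: uses a plain substring match on the claimed
    user agent and does not verify that the request is genuine.
    """
    ua = user_agent.lower()
    for token, category in AI_USER_AGENT_CATEGORIES.items():
        if token in ua:
            return category
    return None


# Illustrative user-agent strings, not copied from any operator's docs.
print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(classify_user_agent("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

Google-Extended and Applebot-Extended are left out on purpose: they are robots.txt tokens rather than user agents that send requests.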

A fourth signal: AI referral traffic

Beyond bots, you also receive human visitors who arrive from AI products. When an answer engine cites your documentation and a user clicks through, that visit appears in your web analytics with a referrer such as chatgpt.com or perplexity.ai.

A referral visit does not come from a crawler. It comes from a real person who found you through an AI answer. Tracking these visits shows whether AI citations send qualified readers to your docs.
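
One way to separate these visits in your own analysis is to check the referrer hostname. The sketch below assumes you have the raw Referer value for each visit; the function name is_ai_referral is hypothetical, and the domain list holds only the two examples mentioned above, so extend it for the products you care about.

```python
from urllib.parse import urlparse

# Only the two referrers named in this guide; extend as needed.
AI_REFERRER_DOMAINS = ("chatgpt.com", "perplexity.ai")


def is_ai_referral(referrer: str) -> bool:
    """Return True if the referrer URL belongs to a listed AI product.

    Matches the domain itself and any subdomain (for example
    www.perplexity.ai). A value without a scheme, or an empty value,
    has no parseable hostname and returns False.
    """
    host = (urlparse(referrer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in AI_REFERRER_DOMAINS)


print(is_ai_referral("https://chatgpt.com/"))
print(is_ai_referral("https://www.example.com/search?q=docs"))
```

Matching on the parsed hostname rather than a substring of the whole URL avoids counting unrelated sites whose address merely contains one of these names.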

Monitoring versus controlling

Two distinct activities are often confused:

Monitoring measures who reads your docs and how often. It is read-only and never blocks anyone.
Controlling uses tools such as robots.txt, firewall rules, or edge configuration to allow or deny access. See Control AI crawler access.

robots.txt is a control tool, not a monitoring tool. It states your preferences but neither measures traffic nor enforces those preferences. Compliance is voluntary: well-behaved bots honor the file, and others may ignore it. Measure actual traffic from your logs rather than assuming your robots.txt rules are followed.
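
The gap between stated preference and actual behavior can be checked by comparing observed requests against your robots.txt rules. The sketch below uses Python's standard urllib.robotparser. The robots.txt content, the docs.example.com URLs, the observed list, and the helper name find_ignored_rules are all hypothetical, and it assumes you have already reduced each logged user agent to its short token (robotparser matches on that token, not on a full user-agent string).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of one training crawler, allow the rest.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def find_ignored_rules(parser, observed_requests):
    """Return the observed (token, url) pairs that robots.txt disallows.

    Each hit is a request that happened even though the stated policy
    asked that bot not to fetch the URL.
    """
    return [
        (token, url)
        for token, url in observed_requests
        if not parser.can_fetch(token, url)
    ]


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical requests taken from access logs, reduced to (token, url).
observed = [
    ("GPTBot", "https://docs.example.com/quickstart"),
    ("ChatGPT-User", "https://docs.example.com/quickstart"),
]

ignored = find_ignored_rules(parser, observed)
print(ignored)
```

The parser only reports what the file asks for. Whether a bot complied is something only the logged requests can show.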

Next steps

Identify AI bots and crawlers — match user-agent tokens to operators and verify them.
Monitor AI traffic to your docs — measure this traffic with logs and analytics.