Monitor AI traffic to your docs
Measure AI crawler and referral traffic to your documentation using server logs, CDN and edge analytics, and web analytics tools such as GA4.
This guide shows you how to measure the AI traffic reaching your documentation. You identify AI crawlers in your access data and track human visitors who arrive from AI answer engines. The techniques are tool-agnostic and work with most hosting and analytics stacks.
The goal is a repeatable view of:
- Which AI bots fetch your documentation, and how often.
- Which pages they fetch most.
- How many human visitors arrive from AI products such as ChatGPT and Perplexity.
Prerequisites
- Access to one source of traffic data: raw server or access logs, a CDN or edge dashboard, or a web analytics tool.
- The user-agent tokens you want to track. See Identify AI bots and crawlers.
- For verification, the ability to check source IP addresses against published ranges or reverse DNS.
Choose where to measure
Each data source captures a different slice of traffic. Pick the one that matches what you need.
| Source | Captures | Best for |
|---|---|---|
| Server and access logs | Every request, including bots that run no JavaScript | Complete crawler visibility |
| CDN and edge analytics | Requests at the network edge, often with built-in bot labels | Sites behind a CDN that want low-effort dashboards |
| Web analytics (such as GA4) | Human visits that execute JavaScript | AI referral traffic from real users |
Measure AI crawlers in server logs
Server logs record the user agent and source IP of every request, which makes them the most complete source for crawler activity.
1. Confirm that your access logs include the user-agent field. Common web servers log it by default in the combined log format.
2. Filter the logs for the tokens you want to track. For example, to count requests from major AI crawlers in an access log:

       grep -E "GPTBot|ClaudeBot|OAI-SearchBot|PerplexityBot|CCBot|Bytespider" access.log | wc -l

3. Break the activity down by bot and by page to see what each crawler reads:

       # Requests per AI bot token
       grep -oE "GPTBot|ClaudeBot|OAI-SearchBot|PerplexityBot|CCBot|Bytespider" access.log \
         | sort | uniq -c | sort -rn

4. Verify that high-volume bots are genuine before you act on the numbers. Match the source IP against the operator’s published ranges or use forward-confirmed reverse DNS, as described in Verify a bot is genuine.
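
The per-bot count shows which crawlers visit but not which pages they read. The following is a minimal sketch of a per-page breakdown. It assumes your log uses the combined log format, so that splitting each line on double quotes puts the request line in field 2 and the user agent in field 6. The names `AI_BOTS` and `ai_bot_pages` are illustrative and not part of any tool.

```sh
# Token list copied from the grep examples in this guide; adjust it to the bots you track.
AI_BOTS='GPTBot|ClaudeBot|OAI-SearchBot|PerplexityBot|CCBot|Bytespider'

# Print "count bot path" for every bot and path pair, busiest first.
# Assumption: combined log format (request line in field 2 and user agent
# in field 6 when a line is split on double quotes).
ai_bot_pages() {
  awk -F'"' -v bots="$AI_BOTS" '
    match($6, bots) {
      bot = substr($6, RSTART, RLENGTH)   # the token that matched
      split($2, req, " ")                 # req[2] is the request path
      path = req[2]
      sub(/\?.*/, "", path)               # drop query strings so pages group together
      print bot, path
    }
  ' "$1" | sort | uniq -c | sort -rn
}
```

Call it as `ai_bot_pages access.log`. Each output line holds a request count, the matched token, and the path with its query string removed. A request line that contains an escaped double quote shifts the fields, so treat the counts as approximate.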
Find AI traffic on your hosting platform
Your hosting platform sees every request at the edge, including bots that run no JavaScript, so it can surface AI-bot traffic even for a fully static site. Use the view that matches your platform.
Vercel
Open Observability → Edge Requests, which breaks traffic down by individual bot and by bot category, including AI crawlers and search engines, on all plans. Firewall Observability shows requests by user agent and IP, and the AI bots managed ruleset lets you log or deny known AI crawlers from a list that Vercel maintains. For raw access logs, stream them to your own store with Log Drains (a paid feature). See Bot Management.
Netlify
Set up Log Drains to stream server-level request logs to a destination such as Datadog or S3. Netlify tags known AI crawlers (GPTBot, ClaudeBot, PerplexityBot, and others) under an ai user-agent category and exposes it on the Netlify-Agent-Category request header, so you can filter AI traffic without maintaining your own token list. Because this is server-side, it captures bots that client-side analytics drop.
Cloudflare
If your site uses Cloudflare as a proxy, AI Crawl Control (formerly AI Audit) shows which AI services access your content, broken down by provider, bot type (AI data scraper, AI search crawler, archiver), and the sections they fetch. It also flags which crawlers honor your robots.txt. It is the most detailed AI-traffic view of the three platforms covered here, and you can set allow or block rules from the same screen.
Another host or CDN
If your platform sees requests at the edge, check its dashboard for a bot or crawler analytics view, and prefer reports that verify bots over ones that trust the user-agent string. Otherwise, fall back to server logs as described in the preceding section.
Track AI referral traffic in web analytics
Referral traffic is the human side of AI traffic: visitors who click through from an AI answer. Capture it in your web analytics tool by segmenting on the referrer.
1. In your analytics tool, create a segment or filter for traffic whose source or referrer hostname matches an AI product, such as chatgpt.com, perplexity.ai, gemini.google.com, or copilot.microsoft.com.
2. Compare landing pages, engagement, and conversions for this segment against your other channels to judge whether AI citations send qualified readers.
3. Review the segment over time. The set of AI products that drive referrals changes, so revisit the hostname list each quarter.
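
If you also keep access logs, you can approximate the same hostname match there as a rough cross-check on your analytics segment. This sketch assumes the combined log format, where the referrer is field 4 when a line is split on double quotes. The function name `ai_referrals` is illustrative, and the hostname list is only the four examples named in the steps above. It counts every request that carries a matching referrer, including assets and bots, so the result is a request count rather than a visitor count.

```sh
# Print "count hostname" for requests whose Referer host is an AI product.
# Assumption: combined log format (referrer in field 4 when a line is
# split on double quotes). Extend the hostname pattern as your list grows.
ai_referrals() {
  awk -F'"' '
    {
      host = tolower($4)
      sub("^https?://", "", host)     # strip the scheme
      sub("[/?#:].*", "", host)       # strip port, path, query, and fragment
      if (host ~ /^(www\.)?(chatgpt\.com|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com)$/) {
        sub(/^www\./, "", host)       # count www and bare hostnames together
        print host
      }
    }
  ' "$1" | sort | uniq -c | sort -rn
}
```

Call it as `ai_referrals access.log`. The pattern is anchored at both ends, so a lookalike host such as notchatgpt.com or a path that merely contains one of the hostnames is not counted.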
A note on privacy and abuse
- Do not log personal data you do not need. Access logs can contain IP addresses and other identifiers. Follow your retention and privacy policies, and anonymize where your obligations require it.
- Rate-limit abusive crawlers, not all bots. If a verified bot fetches aggressively enough to affect performance, rate-limit or throttle it at the edge. Reserve blocking for bots you have confirmed are both unwanted and genuine.
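
One common way to coarsen IP addresses before you archive a log is to zero the last octet of each IPv4 address. The following is a sketch of that approach, assuming the client address is the first field on each line, as in the combined log format. The function name `anonymize_ipv4` is illustrative. Whether truncation satisfies your obligations is a policy question this sketch does not settle. Run it only after bot verification, which needs the full address.

```sh
# Zero the last octet of the leading IPv4 address on each log line.
# IPv6 addresses and addresses elsewhere in the line are left untouched.
# Usage: anonymize_ipv4 < access.log > access.anon.log
anonymize_ipv4() {
  sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3} /\1.0 /'
}
```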
Verify your setup
You have a working monitor when you can answer, from your own data:
- Which AI bots reached your docs in the last week, and how many requests each made.
- Which pages those bots fetched most.
- How many human visitors arrived from AI products, and where they landed.
Related
- Understand AI traffic to your docs: the categories behind the numbers.
- Identify AI bots and crawlers: the tokens and verification methods this guide relies on.
- Add an llms.txt file to your docs: help AI systems use your content well once you can see them.