Bot detection
How FormShield names, IP-verifies, and lets you allow or block crawlers and AI agents
FormShield identifies the bots that hit your site by their user agent, and for the operators that publish their IP ranges, it goes further: it verifies that the request actually came from that operator. A user agent is trivial to forge, so a forged “Googlebot” from an unrelated IP is flagged as spoofed and scored high — the opposite of a real one.
This page covers what FormShield identifies, the verified-versus-spoofed distinction, the fields and reasons it adds to each observation, and how to allow or block bots per project.
What FormShield identifies
Every pageview and server-reported request runs through a registry of known bots across three groups:
- AI Crawlers: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot, Amazonbot, and more.
- Search Engines: Googlebot, Bingbot, DuckDuckBot, Applebot, YandexBot, Baiduspider, and others.
- SEO Tools: AhrefsBot, SemrushBot, MJ12bot, DotBot, DataForSeoBot, Screaming Frog.
A matched bot adds its identity to the observation: a stable bot_id, the
operator (Google, OpenAI, Anthropic, …), the category, and the group.
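As a rough illustration of registry matching, the sketch below looks up a user agent against a tiny table of known bots. The `BotEntry` type, the `identify_bot` function, the substring matching, the `ahrefsbot` id, and the group strings are all assumptions made for this example; they are not FormShield's actual registry or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BotEntry:
    bot_id: str      # stable id, e.g. "googlebot"
    operator: str    # operating company
    category: str    # ai_crawler, search_crawler, or seo_tool
    group: str       # dashboard group (hypothetical string values)
    ua_token: str    # lowercase token searched for in the user agent


# Hypothetical three-entry registry; the real one covers many more bots.
REGISTRY = [
    BotEntry("gptbot", "OpenAI", "ai_crawler", "AI Crawlers", "gptbot"),
    BotEntry("googlebot", "Google", "search_crawler", "Search Engines", "googlebot"),
    BotEntry("ahrefsbot", "Ahrefs", "seo_tool", "SEO Tools", "ahrefsbot"),
]


def identify_bot(user_agent: str) -> Optional[BotEntry]:
    """Return the first registry entry whose token appears in the user agent."""
    ua = user_agent.lower()
    for entry in REGISTRY:
        if entry.ua_token in ua:
            return entry
    return None  # no known bot matched
```

A match here only says what the request claims to be; it says nothing about whether the claim is true.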
Verified vs spoofed
Some operators publish the IP ranges their crawlers run from. For those, FormShield checks that the request IP is actually inside the operator’s ranges and records a three-state result:
| verified | Meaning |
|---|---|
| true | The user agent matches and the IP is in the operator’s published ranges. A genuine crawler. |
| false | Spoofed. The user agent claims a crawler whose operator publishes ranges, but the IP is not in them. The classic impersonation pattern, scored high. |
| null | Unverifiable: the bot’s operator publishes no ranges (the bot is named on user agent alone), or no IP was available. |
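The three-state check can be sketched with Python's standard `ipaddress` module. The `PUBLISHED_RANGES` table and the `verify_bot_ip` function are hypothetical, and the CIDR blocks are reserved documentation ranges standing in for an operator's real published list. Treating an unparseable IP the same as a missing one is also an assumption.

```python
import ipaddress
from typing import Optional

# Placeholder ranges (reserved for documentation), NOT any operator's real list.
PUBLISHED_RANGES = {
    "googlebot": ["192.0.2.0/24", "2001:db8::/32"],
}


def verify_bot_ip(bot_id: str, ip: Optional[str]) -> Optional[bool]:
    """Return True (verified), False (spoofed), or None (unverifiable)."""
    ranges = PUBLISHED_RANGES.get(bot_id)
    if not ranges or not ip:
        return None  # operator publishes no ranges, or no IP was available
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return None  # assumption: an unparseable IP counts as "no IP"
    # Membership is False when address and network versions differ (v4 vs v6).
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)
```

With these placeholder ranges, `verify_bot_ip("googlebot", "192.0.2.10")` is verified, the same bot from `203.0.113.5` is spoofed, and a bot with no entry in the table is unverifiable.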
Which bots are IP-verified
FormShield verifies against the operators that publish authoritative IP ranges:
| Operator | Verified bots |
|---|---|
| Google | Googlebot and its family (Googlebot-Image, Storebot-Google, Google-InspectionTool) |
| Microsoft | Bingbot, BingPreview |
| OpenAI | GPTBot, ChatGPT-User, OAI-SearchBot |
| DuckDuckGo | DuckDuckBot |
Every other crawler — ClaudeBot, PerplexityBot, Bytespider, Google-Extended,
Applebot, CCBot, Amazonbot, the SEO tools, and the rest — is named on its user
agent alone. These carry verified: null: FormShield can tell you the request
claims to be that bot, but it can’t prove the IP. As more operators publish IP
ranges, more bots move into the verified set.
What it adds to an observation
Bot detection adds fields to the observation’s metadata and reasons to its
reasons array.
| Field | Meaning |
|---|---|
| bot_id | The matched bot’s stable id, e.g. googlebot, gptbot. null when no bot matched. |
| bot_operator | The operating company, e.g. Google, OpenAI. |
| bot_category | ai_crawler, search_crawler, or seo_tool. |
| bot_verified | true (IP-verified), false (spoofed), or null (unverifiable / no IP). |

| Reason | Meaning |
|---|---|
| bot:id:<id> | A bot was identified, e.g. bot:id:googlebot. |
| bot:verified | The bot is IP-verified: user agent and IP agree. |
| bot:spoofed:<id> | The user agent claims this bot, but the IP is out of the operator’s ranges. |
| bot:unverified | The bot is named on user agent alone (no published ranges, or no IP). |
These sit alongside the user-agent reasons (bot:ai_crawler, bot:search_crawler,
and the agent name) described in pageview tracking.
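To make the mapping between the verification state and the reason strings concrete, here is a minimal sketch. The `bot_reasons` function is hypothetical, and it assumes that `bot:id:<id>` is emitted alongside the verification reason in every case, which this page does not state explicitly.

```python
from typing import List, Optional


def bot_reasons(bot_id: Optional[str], verified: Optional[bool]) -> List[str]:
    """Build the bot-detection reasons for one observation (illustrative only)."""
    if bot_id is None:
        return []  # no bot matched, so no bot reasons are added
    reasons = [f"bot:id:{bot_id}"]
    if verified is True:
        reasons.append("bot:verified")
    elif verified is False:
        reasons.append(f"bot:spoofed:{bot_id}")
    else:
        reasons.append("bot:unverified")
    return reasons
```

Under these assumptions, a verified Googlebot request would carry `bot:id:googlebot` and `bot:verified`, while a ClaudeBot request would carry `bot:id:claudebot` and `bot:unverified`.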
How bots affect the score
A confirmed search crawler and a spoofed one land at opposite ends:
- Verified search crawlers (Googlebot, Bingbot, DuckDuckBot) default to allow. Their bot user agent and datacenter IP are expected rather than a risk signal, so FormShield credits them back instead of penalizing them.
- Verified AI crawlers (GPTBot, OAI-SearchBot, …) stay visible at review. They are benign but whether you serve them is a business decision — so FormShield surfaces them for you to allow or block per project rather than waving them through.
- Spoofed crawlers are scored high — a spoofed user agent on its own is enough to reach block.
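The three cases above can be summarized as a small lookup. This is only a sketch of the described defaults: the `default_outcome` function is hypothetical, FormShield's real decision comes from a score that combines many signals, and cases this page does not describe (unverifiable bots, SEO tools) are left as `None` rather than guessed.

```python
from typing import Optional


def default_outcome(category: str, verified: Optional[bool]) -> Optional[str]:
    """Map a bot's category and verification state to the described default."""
    if verified is False:
        return "block"   # a spoofed user agent alone is enough to reach block
    if verified is True and category == "search_crawler":
        return "allow"   # genuine search crawlers are credited back
    if verified is True and category == "ai_crawler":
        return "review"  # benign, but surfaced for a per-project decision
    return None          # not specified here; the overall score decides
```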
Allow or block bots per project
Open the project’s Settings → Bots in the dashboard. Each group (AI Crawlers, Search Engines, SEO Tools) and each individual bot has a three-state control:
| Rule | Effect |
|---|---|
| Default | No override — the score decides. |
| Allow | The bot’s traffic is allowed past the score. |
| Block | The bot’s traffic is blocked. |
A per-bot rule overrides its group’s rule, so you can block a whole group and allow one bot within it, or vice versa.
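The precedence between per-bot and group rules might be resolved as in the sketch below. The `resolve_rule` function, the lowercase rule strings, and the dictionary shapes are assumptions for illustration, as is the reading that a per-bot Default means "no override" and so falls through to the group rule.

```python
from typing import Dict


def resolve_rule(
    bot_id: str,
    group: str,
    bot_rules: Dict[str, str],
    group_rules: Dict[str, str],
) -> str:
    """Return 'allow', 'block', or 'default' for one bot."""
    rule = bot_rules.get(bot_id, "default")
    if rule != "default":
        return rule  # an explicit per-bot rule wins over the group rule
    return group_rules.get(group, "default")


# Block the whole AI Crawlers group, but allow GPTBot within it.
group_rules = {"AI Crawlers": "block"}
bot_rules = {"gptbot": "allow"}
gptbot_rule = resolve_rule("gptbot", "AI Crawlers", bot_rules, group_rules)
claudebot_rule = resolve_rule("claudebot", "AI Crawlers", bot_rules, group_rules)
```

Here `gptbot_rule` resolves to `allow` because of its per-bot rule, while `claudebot_rule` inherits `block` from its group.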
A shield marks bots that are IP-verifiable. An Allow rule only ever takes effect for those, and only when the request is genuinely verified.
So a practical setup for crawler-heavy traffic: leave verified search crawlers on Default (they already allow), and decide per AI crawler whether to Allow (you welcome it) or Block (you don’t want it training on your content).
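The condition on Allow rules described above can be sketched as a small gate. The `apply_rule` function is hypothetical. It assumes that an Allow rule on a request that is not verified falls back to the score-based decision, and that a Block rule applies regardless of verification; this page states neither explicitly.

```python
from typing import Optional


def apply_rule(rule: str, verified: Optional[bool], score_decision: str) -> str:
    """Combine a resolved rule with the verification state (illustrative only)."""
    if rule == "block":
        return "block"        # assumption: Block does not depend on verification
    if rule == "allow" and verified is True:
        return "allow"        # Allow only takes effect for a verified request
    return score_decision     # Default, or an Allow that could not be honored
```

With this gate, an Allow rule on GPTBot lets a verified request through, while a spoofed request claiming to be GPTBot is still handled by the score.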