Bot detection

How FormShield names, IP-verifies, and lets you allow or block crawlers and AI agents

FormShield identifies the bots that hit your site by their user agent. For operators that publish their IP ranges, it goes further and verifies that the request actually came from that operator. A user agent is trivial to forge, so a request that claims to be “Googlebot” but arrives from an unrelated IP is flagged as spoofed and scored high, the opposite of how a genuine Googlebot request is treated.

This page covers what FormShield identifies, the verified-versus-spoofed distinction, the fields and reasons it adds to each observation, and how to allow or block bots per project.

What FormShield identifies

Every pageview and server-reported request runs through a registry of known bots across three groups:

AI Crawlers

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot, Amazonbot, and more.

Search Engines

Googlebot, Bingbot, DuckDuckBot, Applebot, YandexBot, Baiduspider, and others.

SEO Tools

AhrefsBot, SemrushBot, MJ12bot, DotBot, DataForSeoBot, Screaming Frog.

A matched bot adds its identity to the observation: a stable bot_id, the operator (Google, OpenAI, Anthropic, …), the category, and the group.
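As a rough illustration of the lookup, the sketch below matches a user agent against a tiny registry and returns the identity fields named above. The registry layout, the `identify_bot` function, and the substring-matching strategy are assumptions made for this example, not FormShield's actual implementation.

```python
from typing import List, Optional, Tuple, TypedDict


class BotIdentity(TypedDict):
    bot_id: str
    operator: str
    category: str
    group: str


# Hypothetical three-entry registry; the real registry is much larger.
REGISTRY: List[Tuple[str, BotIdentity]] = [
    ("googlebot", {"bot_id": "googlebot", "operator": "Google",
                   "category": "search_crawler", "group": "Search Engines"}),
    ("gptbot", {"bot_id": "gptbot", "operator": "OpenAI",
                "category": "ai_crawler", "group": "AI Crawlers"}),
    ("ahrefsbot", {"bot_id": "ahrefsbot", "operator": "Ahrefs",
                   "category": "seo_tool", "group": "SEO Tools"}),
]


def identify_bot(user_agent: str) -> Optional[BotIdentity]:
    """Return the first registry entry whose token appears in the user agent."""
    ua = user_agent.lower()
    for token, identity in REGISTRY:
        if token in ua:  # case-insensitive substring match (assumed strategy)
            return identity
    return None  # no known bot matched
```

A match here only says what the request claims to be; it says nothing about whether the claim is true.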

Verified vs spoofed

Some operators publish the IP ranges their crawlers run from. For those, FormShield checks that the request IP is actually inside the operator’s ranges and records a three-state result:

`verified`	Meaning
`true`	The user agent matches and the IP is in the operator’s published ranges. A genuine crawler.
`false`	Spoofed. The user agent claims a crawler whose operator publishes ranges, but the IP is not in them. The classic impersonation pattern — scored high.
`null`	Unverifiable: the bot’s operator publishes no ranges (the bot is named on user agent alone), or no IP was available.
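The three states can be sketched as a small range check. This is a minimal sketch, assuming a hypothetical `verify_bot` function and placeholder CIDR blocks taken from the documentation-reserved address space; it does not use real operator ranges or FormShield's own code.

```python
import ipaddress
from typing import Dict, List, Optional

# Placeholder ranges (reserved documentation blocks), NOT real operator data.
EXAMPLE_RANGES: Dict[str, List[str]] = {
    "examplebot": ["192.0.2.0/24", "2001:db8::/32"],
}


def verify_bot(bot_id: str, ip: Optional[str],
               ranges: Dict[str, List[str]] = EXAMPLE_RANGES) -> Optional[bool]:
    """Return True (verified), False (spoofed), or None (unverifiable)."""
    cidrs = ranges.get(bot_id)
    if not cidrs or ip is None:
        return None  # operator publishes no ranges, or no IP was available
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return None  # assumption: an unparseable IP counts as "no IP available"
    # An address never matches a network of the other IP version.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)
```

With these placeholder ranges, `verify_bot("examplebot", "192.0.2.10")` is `True`, the same bot from `198.51.100.7` is `False`, and any bot without an entry yields `None`.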

Which bots are IP-verified

FormShield verifies against the operators that publish authoritative IP ranges:

Operator	Verified bots
Google	Googlebot and its family (Googlebot-Image, Storebot-Google, Google-InspectionTool)
Microsoft	Bingbot, BingPreview
OpenAI	GPTBot, ChatGPT-User, OAI-SearchBot
DuckDuckGo	DuckDuckBot

Every other crawler (ClaudeBot, PerplexityBot, Bytespider, Google-Extended, Applebot, CCBot, Amazonbot, the SEO tools, and the rest) is named on its user agent alone. These carry verified: null, meaning FormShield can tell you that the request claims to be that bot, but it cannot confirm that the IP belongs to the operator. As more operators publish IP ranges, more bots move into the verified set.

What it adds to an observation

Bot detection adds fields to the observation’s metadata and reasons to its reasons array.

Field	Meaning
`bot_id`	The matched bot’s stable id, e.g. `googlebot`, `gptbot`. `null` when no bot matched.
`bot_operator`	The operating company, e.g. `Google`, `OpenAI`.
`bot_category`	`ai_crawler`, `search_crawler`, or `seo_tool`.
`bot_verified`	`true` (IP-verified), `false` (spoofed), or `null` (unverifiable / no IP).

Reason	Meaning
`bot:id:<id>`	A bot was identified, e.g. `bot:id:googlebot`.
`bot:verified`	The bot is IP-verified — user agent and IP agree.
`bot:spoofed:<id>`	The user agent claims this bot, but the IP is out of the operator’s ranges.
`bot:unverified`	The bot is named on user agent alone (no published ranges, or no IP).
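The relationship between the `bot_verified` field and the reasons can be sketched as follows. The `bot_reasons` helper is hypothetical, and emitting `bot:id:<id>` alongside the spoofed and unverified reasons is an assumption of this example; the tables above do not say which reasons appear together.

```python
from typing import List, Optional


def bot_reasons(bot_id: Optional[str], bot_verified: Optional[bool]) -> List[str]:
    """Map the bot fields of an observation to the reason strings listed above."""
    if bot_id is None:
        return []  # no bot matched, so no bot reasons
    reasons = [f"bot:id:{bot_id}"]  # assumed to accompany every identified bot
    if bot_verified is True:
        reasons.append("bot:verified")
    elif bot_verified is False:
        reasons.append(f"bot:spoofed:{bot_id}")
    else:
        reasons.append("bot:unverified")
    return reasons
```

Under that assumption, a forged Googlebot would carry `["bot:id:googlebot", "bot:spoofed:googlebot"]`.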

These sit alongside the user-agent reasons (bot:ai_crawler, bot:search_crawler, and the agent name) described in pageview tracking.

How bots affect the score

A confirmed search crawler and a spoofed one land at opposite ends:

Verified search crawlers (Googlebot, Bingbot, DuckDuckBot) default to allow. A bot user agent and a datacenter IP are expected for these crawlers rather than signs of risk, so FormShield credits those signals back instead of penalizing them.
Verified AI crawlers (GPTBot, OAI-SearchBot, …) stay visible at review. They are benign, but whether you serve them is a business decision, so FormShield surfaces them for you to allow or block per project rather than waving them through.
Spoofed crawlers are scored high — a spoofed user agent on its own is enough to reach block.
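The three default outcomes above can be summarized as a lookup. This is only a sketch: `default_outcome` is a hypothetical helper, the real score is numeric and combines other signals, and any case this page does not describe returns `None` here rather than a guess.

```python
from typing import Optional


def default_outcome(bot_category: str, bot_verified: Optional[bool]) -> Optional[str]:
    """Return the default outcome described above, or None when it is not stated."""
    if bot_verified is False:
        return "block"   # a spoofed user agent alone is enough to reach block
    if bot_verified is True and bot_category == "search_crawler":
        return "allow"   # expected bot signals are credited back
    if bot_verified is True and bot_category == "ai_crawler":
        return "review"  # surfaced so you can decide per project
    return None          # unverifiable bots and SEO tools: left to the score
```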

Allow or block bots per project

Open the project’s Settings → Bots in the dashboard. Each group (AI Crawlers, Search Engines, SEO Tools) and each individual bot has a three-state control:

Rule	Effect
Default	No override — the score decides.
Allow	The bot’s traffic is allowed past the score.
Block	The bot’s traffic is blocked.

A per-bot rule overrides its group’s rule, so you can block a whole group and allow one bot within it, or vice versa.
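The override order can be sketched as a two-level lookup. The `resolve_rule` function, the dictionary shapes, and the lowercase rule and group names are assumptions for illustration; the dashboard is the actual place these rules are set.

```python
from typing import Dict


def resolve_rule(bot_id: str, group: str,
                 bot_rules: Dict[str, str],
                 group_rules: Dict[str, str]) -> str:
    """Return 'default', 'allow', or 'block' for one bot."""
    bot_rule = bot_rules.get(bot_id, "default")
    if bot_rule != "default":
        return bot_rule  # an explicit per-bot rule wins over the group rule
    return group_rules.get(group, "default")


# Block the whole AI Crawlers group but allow one bot within it.
group_rules = {"ai_crawlers": "block"}
bot_rules = {"gptbot": "allow"}
print(resolve_rule("gptbot", "ai_crawlers", bot_rules, group_rules))     # allow
print(resolve_rule("claudebot", "ai_crawlers", bot_rules, group_rules))  # block
```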

A shield marks bots that are IP-verifiable. An Allow rule only ever takes effect for those, and only when the request is genuinely verified.
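A minimal sketch of that gating, assuming a hypothetical `effective_action` function. Falling back to the score's decision when an Allow rule does not apply is an assumption of this example, as is applying Block regardless of verification.

```python
from typing import Optional


def effective_action(rule: str, bot_verified: Optional[bool],
                     score_decision: str) -> str:
    """Combine a project rule with the verification state of one request."""
    if rule == "block":
        return "block"  # assumed to apply whether or not the bot is verified
    if rule == "allow" and bot_verified is True:
        return "allow"  # Allow only takes effect for genuinely verified requests
    # Default rule, or an Allow rule on a spoofed or unverifiable request:
    # assumed to fall back to whatever the score decided.
    return score_decision
```

The point of the gate is that an Allow rule for Googlebot cannot be exploited by a request that merely forges the Googlebot user agent.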

So a practical setup for crawler-heavy traffic: leave verified search crawlers on Default (they are already allowed by default), and decide per AI crawler whether to Allow (you welcome it) or Block (you don’t want it training on your content).

Next steps

Pageview tracking

The client-side beacon and the observation it produces.

Server reporting

Capture and verify crawlers that never run JavaScript.
