---
title: Content Protection
description: Stop web scraping with FormShield. A beacon plus an edge model fingerprints visitors and names scrapers and AI crawlers, including non-JS bots via server reporting.
---

Scrapers and AI crawlers fetch your pages all day, and the worst of them never run JavaScript, so client-only analytics and most bot tools never see them. You end up guessing which traffic is real and which is quietly draining your content into someone else's index or training set.

Content Protection pairs a lightweight beacon with an edge model that fingerprints visitors and **names** the scrapers and AI crawlers hitting your content, including the ones that never run a single line of JavaScript. Every hit becomes a scored [observation](/introduction#key-concepts) with a `decision` and a list of `reasons`, which you read in the dashboard Logs.

## When to use it

Reach for Content Protection when you want to know **who** is fetching your pages, not just how many requests came in.

Add the beacon to your pages with one async `<script>` tag that loads `/js/formshield.js`. On load it performs a signed handshake (`POST /v1/handshake`), then posts browser fingerprint and automation signals to `POST /v1/collect` on each pageview.

Pure crawlers fetch your HTML without running scripts, so the beacon never sees them. Report each request from your origin worker or backend with `POST /v1/report`, passing the visitor's UA and IP. Send it fire-and-forget with `ctx.waitUntil` so it adds no latency to your response and your page never depends on FormShield being up. See [server reporting](/guides/server-reporting) for the complete Worker example.

The edge model scores every hit, classifies the user agent, and checks IP reputation. It names the bot (a `bot_id` such as `gptbot` or `googlebot`, plus the operating company) and, for operators that publish IP ranges (Google, Microsoft, OpenAI, DuckDuckGo), verifies that the request really came from them. A forged Googlebot from the wrong IP is flagged `bot:spoofed`.
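As a rough sketch of the server-reporting step, the Worker below builds the `/v1/report` body from the incoming request and sends it in the background with `ctx.waitUntil`, wrapped in `try/catch`. It assumes Cloudflare's module Worker format placed on a route in front of your origin. The `FS_KEY` binding, the `ExecutionContextLike` type, and the helper names are hypothetical; the [server reporting](/guides/server-reporting) guide remains the reference.

```typescript
// Sketch: fire-and-forget server reporting from a Cloudflare module Worker.
// Hypothetical names (not from FormShield): FS_KEY, SendInit, Sender,
// ExecutionContextLike, buildReport, reportInBackground.

const REPORT_URL = "https://api.formshield.dev/v1/report";

interface ReportBody {
  ua: string;
  ip: string;
  hostname: string;
  path: string;
  action: "pageview";
}

interface SendInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

type Sender = (url: string, init: SendInit) => Promise<unknown>;

// Stand-in for Cloudflare's ExecutionContext; only waitUntil is used here.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Read the visitor's UA and IP from the incoming request, not the Worker's own.
export function buildReport(request: { url: string; headers: Headers }): ReportBody {
  const url = new URL(request.url);
  return {
    ua: request.headers.get("User-Agent") ?? "",
    ip: request.headers.get("CF-Connecting-IP") ?? "",
    hostname: url.hostname,
    path: url.pathname,
    action: "pageview",
  };
}

// Start the report without awaiting it. Any error, sync or async, is swallowed
// so a FormShield outage can never break the page.
export function reportInBackground(
  request: { url: string; headers: Headers },
  ctx: ExecutionContextLike,
  key: string,
  send: Sender = (url, init) => fetch(url, init),
): void {
  const task = (async () => {
    try {
      await send(REPORT_URL, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${key}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(buildReport(request)),
      });
    } catch {
      // Best-effort reporting: ignore failures.
    }
  })();
  ctx.waitUntil(task);
}

export default {
  async fetch(
    request: Request,
    env: { FS_KEY: string },
    ctx: ExecutionContextLike,
  ): Promise<Response> {
    reportInBackground(request, ctx, env.FS_KEY);
    // Assumes a route-based Worker: fetch(request) forwards to your origin.
    return fetch(request);
  },
};
```

Injecting `send` keeps the reporting logic testable without a network, and because the response path never awaits `task`, a slow or failing `/v1/report` call cannot delay or break the page.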
View the score, decision, and reasons for each observation in the dashboard Logs.

## Quickstart

Once the beacon is on your pages, add server reporting to catch the crawlers it can't see. Report each origin request with the visitor's UA and IP:

```bash
curl -X POST https://api.formshield.dev/v1/report \
  -H "Authorization: Bearer fs_pub_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "ua": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ip": "203.0.113.42",
    "hostname": "example.com",
    "path": "/pricing",
    "action": "pageview"
  }'
```

Response:

```json
{ "ok": true, "request_id": "rpt_a1b2c3d4e5f6" }
```

Send the **visitor's** UA and IP from the incoming request, not your server's. On Cloudflare, read the IP from the `CF-Connecting-IP` header and the UA from the `User-Agent` header of the request your origin received.

`/v1/report` returns an acknowledgement, not a verdict. Scoring happens server-side, and the score, decision, and reasons land on the observation; read them in Logs. Never gate your response on this call.

## Endpoints

The beacon and server reporting use one publishable key (`fs_pub_live_…`), which is safe to expose in the browser.

| Endpoint | Caller | Purpose |
| --- | --- | --- |
| `GET /js/formshield.js` | browser | The beacon. Auto-initializes from `data-fs-*` attributes. |
| `POST /v1/handshake` | beacon | Signed handshake that proves a real browser ran the beacon. |
| `POST /v1/collect` | beacon | Posts fingerprint and automation signals on each pageview. |
| `POST /v1/report` | your origin | Server-side report of a request the beacon can't see. |

## Signals

Each observation is scored from these signals, which combine user-agent classification with self-hosted IP intelligence.

**User agent.** Declared agents get a `bot:ai_crawler` or `bot:search_crawler` reason plus the named operator. GPTBot, ClaudeBot, PerplexityBot, and Bytespider are recognized. Verified benign search crawlers are credited toward `allow`, while AI crawlers stay visible.
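To make the user-agent and crawler-verification signals concrete, here is a hypothetical TypeScript sketch (an illustration of the idea, not FormShield's edge model). It maps a declared UA product token to a `bot_id` and reason, then checks the IP against the operator's published IPv4 ranges to choose between `bot:verified` and `bot:spoofed`. The token table, the `claudebot` id, and the helper names are assumptions, and the ranges are RFC 5737 documentation blocks rather than any operator's real addresses.

```typescript
// Hypothetical illustration of UA classification plus IP-range verification.
// The token table and ranges are placeholders, not FormShield's data.

type Category = "ai_crawler" | "search_crawler";

interface KnownBot {
  token: string; // product token expected in the User-Agent
  botId: string; // assumed lowercase bot_id
  operator: string;
  category: Category;
  ranges?: string[]; // published IPv4 CIDRs, only if the operator publishes them
}

const KNOWN_BOTS: KnownBot[] = [
  // RFC 5737 documentation ranges stand in for real published ranges.
  { token: "GPTBot", botId: "gptbot", operator: "OpenAI", category: "ai_crawler", ranges: ["192.0.2.0/24"] },
  { token: "ClaudeBot", botId: "claudebot", operator: "Anthropic", category: "ai_crawler" },
  { token: "Googlebot", botId: "googlebot", operator: "Google", category: "search_crawler", ranges: ["198.51.100.0/24"] },
];

// Convert a dotted-quad IPv4 address to an unsigned integer, or null if invalid.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p);
  }
  return n;
}

// True if `ip` shares the first `bits` bits of the CIDR base address.
// Integer division avoids the signed 32-bit pitfalls of JS bitwise operators.
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const a = ipv4ToInt(ip);
  const b = ipv4ToInt(base);
  if (a === null || b === null) return false;
  const blockSize = 2 ** (32 - Number(bitsStr));
  return Math.floor(a / blockSize) === Math.floor(b / blockSize);
}

export function classify(ua: string, ip: string): { botId: string | null; reasons: string[] } {
  const bot = KNOWN_BOTS.find((b) => ua.includes(b.token));
  if (!bot) return { botId: null, reasons: [] };
  const reasons = [`bot:${bot.category}`];
  // Only operators with published ranges can be verified or caught spoofing.
  if (bot.ranges) {
    const verified = bot.ranges.some((r) => inCidr(ip, r));
    reasons.push(verified ? "bot:verified" : "bot:spoofed");
  }
  return { botId: bot.botId, reasons };
}
```

This sketch only handles IPv4; a real checker would also need IPv6 ranges and a way to refresh each operator's published list.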
**Crawler verification.** For operators that publish their IP ranges, a request whose UA claims a crawler but whose IP is out of range is flagged `bot:spoofed` and scored high. A real crawler is confirmed (`bot:verified`); a forged one is caught.

**Client integrity.** The signed handshake token proves a real browser ran the beacon. Its absence (`client_token_missing`), plus `webdriver` and headless markers (`automation_detected`), pushes the score toward `block` on the client path. Server reports skip the missing-client penalty by design.

**IP intelligence.** Datacenter, VPN, proxy, residential-proxy, and scanner flags, plus country and ASN. A human UA from a datacenter range, or a desktop UA on a mobile IP, raises a consistency flag.

### Key reasons

| Reason | Meaning |
| --- | --- |
| `bot:ai_crawler` | The user agent declares an AI crawler (GPTBot, ClaudeBot, PerplexityBot, Bytespider). The operator is named on the observation. AI crawlers stay visible rather than being silently allowed. |
| `bot:search_crawler` | The user agent declares a search crawler (Googlebot, Bingbot). A verified benign search crawler is credited toward `allow`. |
| `bot:verified` | The UA claims a crawler whose operator publishes IP ranges, and the request's IP falls inside them. A real crawler is confirmed. |
| `bot:spoofed` | The UA claims a crawler, but the IP is outside the operator's published ranges. The hit is scored high, so a forged Googlebot is caught. |
| `automation_detected` | The browser fingerprint flags `webdriver` or headless automation, a strong tell that pushes a client-path hit toward `block`. |
| `client_token_missing` | The signed handshake token is absent on a client-path hit, so no real browser ran the beacon. Server reports skip this penalty by design. |

## Common questions

### How do I catch crawlers that don't run JavaScript?

The JS beacon only sees clients that execute scripts, so pure crawlers slip past it. Report each request from your origin with `POST /v1/report`, passing the visitor's UA and IP from the incoming request (on Cloudflare, `CF-Connecting-IP` and `User-Agent`). FormShield classifies and scores it server-side, naming AI agents like GPTBot and ClaudeBot and verifying or flagging crawlers by IP range.
Send it fire-and-forget with `ctx.waitUntil` so it adds no latency to your response.

### Does `/v1/report` return a verdict?

No. `/v1/report` returns only `{ ok: true, request_id }`. Scoring happens server-side, and the score, decision, and reasons are stored on the observation; view them in the dashboard Logs. Never gate your response on this call, and always wrap the fetch in `try/catch` so a FormShield outage can never break your page.

### What does it cost?

The passive beacon (`handshake` / `collect` / `report`) is **free**: it costs zero credits, so you can instrument every pageview. Deep analysis costs 4 credits per request. Billing is in credits across all products; see [Billing & Pricing](/guides/billing) for plans and overage.

## Next steps

- [Server reporting](/guides/server-reporting): the full `/v1/report` reference, including the request body, fire-and-forget patterns, and a complete Cloudflare Worker.
- Name and IP-verify crawlers, and allow or block them per project.
- Every beacon attribute, the three modes, and the full observation shape.
- Get scored pageviews flowing into your dashboard in minutes.