URL: /products/content-protection

---
title: Content Protection
description: Stop web scraping with FormShield — a beacon plus edge model that fingerprints visitors and names scrapers and AI crawlers, including non-JS bots via server reporting
---

Scrapers and AI crawlers fetch your pages all day, and the worst of them never run JavaScript — so client-only analytics and most bot tools never see them. You end up guessing which traffic is real and which is quietly draining your content into someone else's index or training set.

Content Protection pairs a lightweight beacon with an edge model that fingerprints visitors and **names** the scrapers and AI crawlers hitting your content, including (through server reporting) the ones that never run a single line of JavaScript. Every hit becomes an [observation](/introduction#key-concepts) carrying a score, a `decision`, and a list of `reasons` that you read in the dashboard Logs.

## When to use it

Reach for Content Protection when you want to know **who** is fetching your pages, not just how many requests came in.

<CardGroup cols={2}>
  <Card title="Beacon (in the browser)" icon="code">
    One async `<script>` tag fingerprints each visitor that runs JavaScript and posts the signals on every pageview. Fastest start, but blind to pure crawlers.
  </Card>
  <Card title="Server reporting (from your origin)" icon="server">
    `POST /v1/report` from a Cloudflare Worker, edge middleware, or any backend reports the requests that fetch HTML without running scripts, including declared AI agents such as GPTBot, ClaudeBot, PerplexityBot, and Bytespider.
  </Card>
</CardGroup>

The beacon and server reporting share one publishable key and feed the same observation stream. Crawler-heavy content sites run both: the beacon scores every visitor that runs JavaScript, and server reporting catches everything that skips it.

## How it works

<Steps>
  <Step title="Drop the beacon on your pages">
    Add one async `<script>` tag pointing at `https://api.formshield.dev/js/formshield.js` with your publishable key and `data-fs-mode="pageload"`. It auto-initializes from the `data-fs-*` attributes — no extra code.

    ```html
    <script
      async
      src="https://api.formshield.dev/js/formshield.js"
      data-fs-project-key="fs_pub_live_…"
      data-fs-action="pageview"
      data-fs-mode="pageload"
    ></script>
    ```

    On load it performs a signed handshake (`POST /v1/handshake`), then posts browser fingerprint and automation signals to `POST /v1/collect` on each pageview.
  </Step>

  <Step title="Catch the non-JS crawlers server-side">
    Pure crawlers fetch your HTML without running scripts, so the beacon never sees them. Report each request from your origin worker or backend with `POST /v1/report`, passing the visitor's UA and IP.

    Run the call fire-and-forget so it adds no latency and your page never depends on FormShield being up: in a Cloudflare Worker, pass the promise to `ctx.waitUntil` so the response doesn't wait on it. See [server reporting](/guides/server-reporting) for the complete Worker example.
  </Step>

  <Step title="Read named, verified verdicts in your logs">
    The edge model scores every hit, classifies the user agent, and checks IP reputation. It names the bot (`bot_id` like `gptbot` or `googlebot`, plus the operating company) and, for operators that publish IP ranges — Google, Microsoft, OpenAI, DuckDuckGo — verifies the request really came from them.

    A forged Googlebot from the wrong IP is flagged `bot:spoofed`. View the score, decision, and reasons per observation in the dashboard Logs.
  </Step>
</Steps>
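The edge model's internals aren't documented here, but the naming step amounts to matching declared crawler tokens in the user agent. The TypeScript below is a hypothetical illustration of that idea, not FormShield's model: the `classifyUserAgent` function, the `KNOWN_CRAWLERS` table, and the result shape are all assumptions, and only the crawlers this page mentions are listed.

```typescript
// Hypothetical sketch: the token table and result shape are illustrative
// assumptions, not FormShield's edge model or its full crawler list.
type CrawlerReason = "bot:ai_crawler" | "bot:search_crawler";

interface CrawlerMatch {
  botId: string; // lowercase id, in the style of bot_id values like "gptbot"
  operator: string; // company that runs the crawler
  reason: CrawlerReason;
}

const KNOWN_CRAWLERS: ReadonlyArray<{ token: string; operator: string; reason: CrawlerReason }> = [
  { token: "GPTBot", operator: "OpenAI", reason: "bot:ai_crawler" },
  { token: "ClaudeBot", operator: "Anthropic", reason: "bot:ai_crawler" },
  { token: "PerplexityBot", operator: "Perplexity", reason: "bot:ai_crawler" },
  { token: "Bytespider", operator: "ByteDance", reason: "bot:ai_crawler" },
  { token: "Googlebot", operator: "Google", reason: "bot:search_crawler" },
  { token: "bingbot", operator: "Microsoft", reason: "bot:search_crawler" },
];

// Return the first declared crawler whose token appears in the UA
// (case-insensitive), or null when the UA declares none.
function classifyUserAgent(ua: string): CrawlerMatch | null {
  const lowered = ua.toLowerCase();
  for (const { token, operator, reason } of KNOWN_CRAWLERS) {
    if (lowered.includes(token.toLowerCase())) {
      return { botId: token.toLowerCase(), operator, reason };
    }
  }
  return null;
}
```

A match only says what the client claims to be. Any script can send a GPTBot or Googlebot UA, so the claim still needs the IP-range check from step 3 before it counts as verified.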

## Quickstart

Once the beacon is on your pages, add server reporting to catch the crawlers it can't see. Report each origin request with the visitor's UA and IP.

```bash
curl -X POST https://api.formshield.dev/v1/report \
  -H "Authorization: Bearer fs_pub_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "ua": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ip": "203.0.113.42",
    "hostname": "example.com",
    "path": "/pricing",
    "action": "pageview"
  }'
```

Response:

```json
{ "ok": true, "request_id": "rpt_a1b2c3d4e5f6" }
```

<Warning>
  Send the **visitor's** UA and IP from the incoming request, not your server's. On Cloudflare, read the IP from the `CF-Connecting-IP` header and the UA from the `User-Agent` header of the request your origin received.
</Warning>

`/v1/report` returns an acknowledgement, not a verdict. Scoring happens server-side and the score, decision, and reasons land on the observation — read them in Logs. Never gate your response on this call.
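For a Cloudflare Worker sitting in front of your origin, the curl call above translates roughly as follows. This is a minimal sketch, assuming the request body shown in the Quickstart and a hypothetical `FS_PUBLISHABLE_KEY` Worker binding that holds your `fs_pub_live_…` key; `ExecutionContext` is declared locally so the block stands alone. The [server reporting](/guides/server-reporting) guide has the complete Worker.

```typescript
// Declared locally so the sketch is self-contained; in a real Worker project
// this type comes from @cloudflare/workers-types.
interface ExecutionContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface Env {
  FS_PUBLISHABLE_KEY: string; // hypothetical binding name for the fs_pub_live_… key
}

const REPORT_URL = "https://api.formshield.dev/v1/report";

// Build the body from the visitor's request, not from the Worker itself.
function reportBody(request: Request): Record<string, string> {
  const url = new URL(request.url);
  return {
    ua: request.headers.get("User-Agent") ?? "",
    ip: request.headers.get("CF-Connecting-IP") ?? "",
    hostname: url.hostname,
    path: url.pathname,
    action: "pageview",
  };
}

// Send the report and swallow every failure so FormShield can never break the page.
async function report(request: Request, env: Env): Promise<void> {
  try {
    await fetch(REPORT_URL, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${env.FS_PUBLISHABLE_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(reportBody(request)),
    });
  } catch {
    // Ignore: the ack carries no verdict, so there is nothing to act on.
  }
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Fire-and-forget: the origin response does not wait on the report.
    ctx.waitUntil(report(request, env));
    return fetch(request);
  },
};
```

Swallowing errors is safe here because the acknowledgement carries no verdict, so a failed report loses nothing your response depends on.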

## Endpoints

The beacon and server reporting use one publishable key (`fs_pub_live_…`), safe to expose in the browser.

| Endpoint | Caller | Purpose |
| --- | --- | --- |
| `GET /js/formshield.js` | browser | The beacon. Auto-initializes from `data-fs-*` attributes. |
| `POST /v1/handshake` | beacon | Signed handshake that proves a real browser ran the beacon. |
| `POST /v1/collect` | beacon | Posts fingerprint and automation signals on each pageview. |
| `POST /v1/report` | your origin | Server-side report of a request the beacon can't see. |

## Signals

Each observation is scored from these signals. They combine user-agent classification with self-hosted IP intelligence.

<CardGroup cols={2}>
  <Card title="AI and search crawler identification" icon="bot">
    Declared agents get a `bot:ai_crawler` or `bot:search_crawler` reason plus the named operator. GPTBot, ClaudeBot, PerplexityBot, and Bytespider are recognized; verified benign search crawlers are credited toward `allow` while AI crawlers stay visible.
  </Card>
  <Card title="Spoof detection via IP verification" icon="shield">
    For operators that publish their ranges, a request whose UA claims a crawler but whose IP is out of range is flagged `bot:spoofed` and scored high. A real crawler is confirmed (`bot:verified`); a forged one is caught.
  </Card>
  <Card title="Automation and missing-token tells" icon="bug">
    The signed handshake token proves a real browser ran the beacon. On the client path, its absence (`client_token_missing`) plus `webdriver` and headless markers (`automation_detected`) push the score toward `block`. Server reports skip the missing-token penalty by design, since no browser is involved.
  </Card>
  <Card title="IP reputation on every hit" icon="globe">
    Datacenter, VPN, proxy, residential-proxy, and scanner flags plus country and ASN. A human UA from a datacenter range, or a desktop UA on a mobile IP, raises a consistency flag.
  </Card>
</CardGroup>
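The consistency flag on the IP reputation card is a cross-check between what the user agent claims and where the request comes from. A minimal sketch, assuming hypothetical `IpInfo` fields (`datacenter`, `mobile`) and a `ClaimedClient` value produced by UA classification; FormShield's actual field names and weighting are not documented here.

```typescript
// Hypothetical inputs: FormShield's real IP-intelligence fields are not documented here.
interface IpInfo {
  datacenter: boolean; // address belongs to a hosting or cloud range
  mobile: boolean; // address belongs to a mobile carrier
}

// What the user agent claims the client is (assumed to come from UA classification).
type ClaimedClient = "desktop_browser" | "mobile_browser" | "declared_crawler";

// True when the claimed client and its network disagree, following the two
// examples on the IP reputation card.
function consistencyFlag(client: ClaimedClient, ip: IpInfo): boolean {
  // A human (browser) UA coming from a datacenter range.
  if (client !== "declared_crawler" && ip.datacenter) return true;
  // A desktop UA coming from a mobile carrier IP.
  if (client === "desktop_browser" && ip.mobile) return true;
  // Declared crawlers normally run in datacenters; whether they are genuine is
  // decided by the IP-range check, not by this flag.
  return false;
}
```

The result is a flag, not a verdict: a laptop tethered to a phone legitimately sends a desktop UA from a mobile IP, so it is best treated as one input to the score rather than a decision on its own.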

### Key reasons

<ParamField path="bot:ai_crawler" type="reason">
  The user agent declares an AI crawler (GPTBot, ClaudeBot, PerplexityBot, Bytespider). The operator is named on the observation. AI crawlers stay visible rather than being silently allowed.
</ParamField>

<ParamField path="bot:search_crawler" type="reason">
  The user agent declares a search crawler (Googlebot, Bingbot). A verified benign search crawler is credited toward `allow`.
</ParamField>

<ParamField path="bot:verified" type="reason">
  The UA claims a crawler whose operator publishes IP ranges, and the request's IP falls inside them. A real crawler is confirmed.
</ParamField>

<ParamField path="bot:spoofed" type="reason">
  The UA claims a crawler but the IP is out of the operator's published range. The hit is scored high — a forged Googlebot is caught.
</ParamField>

<ParamField path="automation_detected" type="reason">
  Browser fingerprint flags `webdriver` or headless automation — a strong tell that pushes a client-path hit toward `block`.
</ParamField>

<ParamField path="client_token_missing" type="reason">
  The signed handshake token is absent on a client-path hit, so no real browser ran the beacon. Server reports skip this penalty by design.
</ParamField>
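`bot:verified` and `bot:spoofed` come down to one question: does the request's IP fall inside a range the claimed operator publishes? The sketch below shows that check for IPv4 CIDR blocks. The `PUBLISHED_RANGES` table is a hypothetical placeholder filled with RFC 5737 documentation addresses, not any operator's real ranges, and real lists also include IPv6, which this sketch does not handle.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned 32-bit number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const octet = Number(p);
    if (octet > 255) return null;
    n = n * 256 + octet; // arithmetic instead of bit shifts avoids signed 32-bit overflow
  }
  return n;
}

// True when ip lies inside the CIDR block, e.g. "192.0.2.0/24".
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const ipN = ipv4ToInt(ip);
  const baseN = ipv4ToInt(base);
  if (ipN === null || baseN === null || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    return false;
  }
  const blockSize = 2 ** (32 - bits);
  // Same block if both addresses share the same network prefix.
  return Math.floor(ipN / blockSize) === Math.floor(baseN / blockSize);
}

type Verdict = "bot:verified" | "bot:spoofed";

// Hypothetical placeholder ranges (RFC 5737 documentation blocks), NOT real operator ranges.
const PUBLISHED_RANGES: Record<string, string[]> = {
  googlebot: ["192.0.2.0/24"],
  gptbot: ["198.51.100.0/25"],
};

// For a UA claiming botId, check the IP against that operator's published ranges.
// Returns null when the operator publishes none, so nothing can be verified either way.
function verifyCrawler(botId: string, ip: string): Verdict | null {
  const ranges = PUBLISHED_RANGES[botId];
  if (!ranges) return null;
  return ranges.some((r) => inCidr(ip, r)) ? "bot:verified" : "bot:spoofed";
}
```

Returning `null` for operators without published ranges reflects the limit described above: verification only works for operators such as Google, Microsoft, OpenAI, and DuckDuckGo that publish their IPs.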

## Common questions

<ParamField path="How do I stop web scraping when the scraper does not run JavaScript?" type="question">
  The JS beacon only sees clients that execute scripts, so pure crawlers slip past it. Report each request from your origin with `POST /v1/report`, passing the visitor's UA and IP from the incoming request (on Cloudflare, `CF-Connecting-IP` and `User-Agent`). FormShield classifies and scores it server-side, naming AI agents like GPTBot and ClaudeBot and verifying or flagging crawlers by IP range. Send it fire-and-forget with `ctx.waitUntil` so it adds zero latency.
</ParamField>

<ParamField path="Does /v1/report return a block decision I can act on inline?" type="question">
  No. `/v1/report` returns only `{ ok: true, request_id }`; scoring happens server-side and the score, decision, and reasons are stored on the observation. View them in the dashboard Logs. Never gate your response on this call, and always wrap the fetch in `try/catch` so a FormShield outage can never break your page.
</ParamField>

<ParamField path="What does Content Protection cost?" type="question">
  Passive collection (`handshake` and `collect` from the beacon, plus `report` from your origin) is **free**: it costs zero credits, so you can instrument every pageview. Deep analysis costs 4 credits per request. Billing is in credits across all products; see [Billing & Pricing](/guides/billing) for plans and overage.
</ParamField>
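Because passive calls are free, the credit bill depends only on how many requests get deep analysis. A back-of-the-envelope sketch using the two rates stated above; the function name is hypothetical, and plan allowances and overage pricing are not modeled.

```typescript
// Rates taken from this page; plan allowances and overage pricing are not modeled.
const CREDITS_PER_PASSIVE_CALL = 0; // handshake, collect, report
const CREDITS_PER_DEEP_ANALYSIS = 4;

// Hypothetical helper: estimate credits consumed over a billing period.
function estimateCredits(passiveCalls: number, deepAnalyses: number): number {
  return passiveCalls * CREDITS_PER_PASSIVE_CALL + deepAnalyses * CREDITS_PER_DEEP_ANALYSIS;
}
```

For example, a month with 1,000,000 reported pageviews and 250 deep analyses gives `estimateCredits(1_000_000, 250)`, which is 1,000 credits.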

## Next steps

<CardGroup cols={2}>
  <Card title="Server reporting" icon="server" href="/guides/server-reporting">
    The full `/v1/report` reference — request body, fire-and-forget patterns, and a complete Cloudflare Worker.
  </Card>
  <Card title="Bot detection" icon="bot" href="/guides/bot-detection">
    Name and IP-verify crawlers, and allow or block them per project.
  </Card>
  <Card title="Pageview tracking" icon="chart-line" href="/guides/pageview-tracking">
    Every beacon attribute, the three modes, and the full observation shape.
  </Card>
  <Card title="Quickstart" icon="rocket" href="/quickstart">
    Get scored pageviews flowing into your dashboard in minutes.
  </Card>
</CardGroup>
