Server-side reporting

Capture crawlers and AI bots with POST /v1/report from your origin

The JS beacon only sees clients that run JavaScript, so it never observes pure crawlers — most importantly the declared AI agents (GPTBot, ClaudeBot, PerplexityBot, Bytespider) that fetch your HTML without running scripts. To capture them, report each request server-side from your origin: a Cloudflare Worker, edge middleware, or any backend.

This guide covers the request shape, the fire-and-forget pattern that keeps reporting off your response path, and examples for Cloudflare Workers and Node backends.

The endpoint

Send a POST to https://api.formshield.dev/v1/report with your publishable key as a Bearer token and a JSON body describing the visitor’s signals. FormShield classifies and scores the request server-side and stores the result as an observation.

bash
curl -X POST https://api.formshield.dev/v1/report \
  -H "Authorization: Bearer fs_pub_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "ua": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ip": "203.0.113.42",
    "hostname": "example.com",
    "path": "/pricing",
    "action": "pageview"
  }'

Response:

json
{ "ok": true, "request_id": "rpt_a1b2c3d4e5f6" }

Request body

ua string body

The visitor’s User-Agent. Drives user-agent classification, which names AI and search crawlers. Falls back to the request’s User-Agent header if omitted.

ip string body

The visitor’s IP address. Drives IP reputation (VPN, proxy, datacenter, scanner, country, ASN).

hostname string body

The host the visitor requested, e.g. example.com. Falls back to the request’s Origin header if omitted.

path string body

The path the visitor requested, e.g. /pricing.

action string body default: pageview

A label for the hit, stored on the observation.

referrer string body

Optional. The visitor’s referrer.
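The request body above can be modeled as a TypeScript type. This is a sketch for illustration only: ReportBody and buildReportBody are hypothetical names, not part of a FormShield SDK. The helper drops fields that are undefined or empty, on the assumption that omitting a field is better than sending an empty one, since ua and hostname have documented fallbacks when omitted.

```typescript
// Hypothetical local type mirroring the documented request body fields.
// It is not a type published by FormShield.
interface ReportBody {
  ua?: string
  ip?: string
  hostname?: string
  path?: string
  action?: string // the server defaults this to "pageview" when omitted
  referrer?: string
}

// Hypothetical helper: copy only fields that carry a value, so missing
// signals are absent from the JSON rather than sent as empty strings.
function buildReportBody(fields: ReportBody): ReportBody {
  const body: ReportBody = {}
  for (const key of ["ua", "ip", "hostname", "path", "action", "referrer"] as const) {
    const value = fields[key]
    if (value !== undefined && value !== "") body[key] = value
  }
  return body
}

const example = buildReportBody({ ua: "GPTBot/1.1", ip: "", path: "/pricing" })
console.log(JSON.stringify(example))
```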

Response

ok boolean

Always true on a 200 response, indicating the report was accepted for scoring.

request_id string

An identifier for the stored observation, prefixed rpt_.
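Fire-and-forget reporting normally ignores the response, but a one-off smoke test of your key can check it. Below is a minimal sketch of a type guard for the documented success shape; isReportResponse is a hypothetical helper, not an SDK call, and it treats anything else (including error bodies, whose shape this page does not document) as a failure.

```typescript
// Documented success shape: { "ok": true, "request_id": "rpt_..." }.
interface ReportResponse {
  ok: true
  request_id: string
}

// Hypothetical type guard: accept only ok === true and a request_id that
// carries the documented rpt_ prefix.
function isReportResponse(value: unknown): value is ReportResponse {
  if (typeof value !== "object" || value === null) return false
  const v = value as Record<string, unknown>
  return v.ok === true && typeof v.request_id === "string" && v.request_id.startsWith("rpt_")
}

const parsed: unknown = JSON.parse('{ "ok": true, "request_id": "rpt_a1b2c3d4e5f6" }')
if (isReportResponse(parsed)) {
  console.log(`stored as ${parsed.request_id}`)
}
```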

Make it fire-and-forget

Reporting must add zero latency to your response, and your site must never depend on FormShield being up. The rule is: send the report in the background and return your real response immediately.

On Cloudflare Workers, ctx.waitUntil(fetch(...)) keeps the worker invocation alive until the background fetch settles, while your response returns right away. Wrap the fetch in try/catch and swallow errors so a FormShield outage can never break your page.

ts
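// Bindings this Worker expects. Module Workers receive secrets on env,
// not as globals. Types such as ExecutionContext come from @cloudflare/workers-types.
interface Env {
  FORMSHIELD_KEY: string
}

// Your existing handler (or `fetch(request)` to pass the request through to your origin).
declare function handleRequest(request: Request, env: Env, ctx: ExecutionContext): Promise<Response>

async function reportToFormShield(request: Request, url: URL, env: Env): Promise<void> {
  try {
    await fetch("https://api.formshield.dev/v1/report", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${env.FORMSHIELD_KEY}`,
      },
      body: JSON.stringify({
        ua: request.headers.get("User-Agent") ?? undefined,
        ip: request.headers.get("CF-Connecting-IP") ?? undefined,
        hostname: url.hostname,
        path: url.pathname,
        action: "pageview",
      }),
    })
  } catch {
    // never let the page depend on FormShield being up
  }
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url)
    // The report starts now and runs in the background; waitUntil keeps the
    // invocation alive until it settles.
    ctx.waitUntil(reportToFormShield(request, url, env))
    // The response is not blocked by the report.
    return handleRequest(request, env, ctx)
  },
}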

Store FORMSHIELD_KEY as a Worker secret (wrangler secret put FORMSHIELD_KEY); module Workers receive it as a binding on env. It is a publishable key, but keeping it out of source is good hygiene.
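The timing of ctx.waitUntil can be seen without deploying anything. The sketch below simulates it in plain TypeScript: FakeContext is a hypothetical stand-in that only collects promises (it is not the Workers runtime), and the report is a 200 ms delay that then fails as if FormShield were down. The handler returns before the report settles, and the failure never reaches the handler.

```typescript
// Hypothetical stand-in for the Workers ExecutionContext: it only records
// the promises passed to waitUntil.
class FakeContext {
  readonly pending: Promise<unknown>[] = []
  waitUntil(promise: Promise<unknown>): void {
    this.pending.push(promise)
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Simulated report: slow, then failing, with the error swallowed.
async function slowFailingReport(log: string[]): Promise<void> {
  try {
    await sleep(200)
    throw new Error("FormShield unreachable") // simulated outage
  } catch {
    log.push("report settled (error swallowed)")
  }
}

async function handle(ctx: FakeContext, log: string[]): Promise<string> {
  ctx.waitUntil(slowFailingReport(log)) // starts now, runs in the background
  log.push("response returned")
  return "<html>ok</html>"
}

async function demo(): Promise<string[]> {
  const log: string[] = []
  const ctx = new FakeContext()
  await handle(ctx, log)
  // The real runtime keeps the invocation alive until these promises settle.
  await Promise.allSettled(ctx.pending)
  return log
}

demo().then((log) => console.log(log.join(" -> ")))
```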

Report from Node, Express, or Next.js

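The pattern carries over to any backend: build the same body from the incoming request and keep the fetch out of the request path. Fire it, attach a catch handler, and return your response without awaiting it. This relies on the process staying alive after the response is sent; on serverless runtimes that may freeze once the response goes out, hand the promise to the platform’s equivalent of waitUntil (for example, after() from next/server in recent Next.js versions) so the report is not dropped.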

ts
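// Relies on the global fetch in Node 18+. The parameter is a structural type
// that an Express Request satisfies; adapt the field access for other frameworks.
function reportToFormShield(req: {
  headers: Record<string, string | string[] | undefined>
  hostname: string
  path: string
}) {
  // Node's header type allows string[] (e.g. set-cookie); take the first value.
  const header = (name: string): string | undefined => {
    const value = req.headers[name]
    return Array.isArray(value) ? value[0] : value
  }

  // Do not await: let it run in the background.
  void fetch("https://api.formshield.dev/v1/report", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.FORMSHIELD_KEY}`,
    },
    body: JSON.stringify({
      ua: header("user-agent"),
      // Leftmost X-Forwarded-For entry. A client can send this header itself, so
      // behind proxies you control prefer a trusted-proxy-aware value, such as
      // Express req.ip with "trust proxy" configured.
      ip: header("x-forwarded-for")?.split(",")[0]?.trim(),
      hostname: req.hostname,
      path: req.path,
      action: "pageview",
    }),
  }).catch(() => {
    // swallow: reporting must never break the request
  })
}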
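The example above takes the leftmost X-Forwarded-For entry, which the client controls whenever it sends its own header. A forged value would poison the ip signal that drives IP reputation. One common alternative, sketched here under the assumption that you know how many proxies you run in front of the server, is to count back from the right: each proxy appends the address it received the connection from, so the entry trustedHops positions from the right was written by your outermost proxy. clientIpFromXff is a hypothetical helper, not part of FormShield.

```typescript
// Hypothetical helper. trustedHops = number of proxies you control in front
// of this server. Entries left of the returned one are client-controlled.
function clientIpFromXff(header: string | undefined, trustedHops: number): string | undefined {
  if (!header || trustedHops < 1) return undefined
  const hops = header
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
  const index = hops.length - trustedHops // n-th entry from the right
  return index >= 0 ? hops[index] : undefined
}

// Client forged "198.51.100.7"; your single load balancer appended the real address.
console.log(clientIpFromXff("198.51.100.7, 203.0.113.42", 1))
```

With Express, app.set("trust proxy", 1) followed by reading req.ip applies the same right-to-left rule for one trusted hop.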

How it behaves

  • Scoring skips the missing-client penalty. A server-side request legitimately has no browser token, so its absence is expected, not a bot tell. The score rests on user-agent classification and IP reputation.
  • Crawlers are named and verified. A GPTBot or ClaudeBot request is classified, scored, and labeled with a bot:ai_crawler reason and the agent name. For operators that publish IP ranges (Google, Microsoft, OpenAI, DuckDuckGo), the report is also IP-verified — a real Googlebot is confirmed, a forged one from the wrong IP is flagged as spoofed. See bot detection.
  • Humans who run JS are counted twice. A real visitor who runs the beacon produces both a server report and a beacon observation. That is fine for crawler-heavy traffic; deduplication is on the roadmap. If you run both, server reporting is most valuable on routes where crawlers dominate.
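The crawler naming and IP verification described above can be illustrated with a generic technique: match a declared agent token in the User-Agent, then check the IP against that operator’s published ranges. This is a sketch of the general idea, not FormShield’s implementation. The token list, the Verdict labels, and the range table are assumptions; 198.51.100.0/24 is an RFC 5737 documentation block, not any operator’s real range. An agent with no entry in the table shows the unverifiable path.

```typescript
// Placeholder data for illustration only (not real published ranges).
const PUBLISHED_RANGES: Record<string, string[]> = {
  GPTBot: ["198.51.100.0/24"],
}
const AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider"]

// Parse a dotted-quad IPv4 address into an unsigned integer; undefined if malformed.
function ipv4ToInt(ip: string): number | undefined {
  const parts = ip.split(".")
  if (parts.length !== 4) return undefined
  let n = 0
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return undefined
    const octet = Number(part)
    if (octet > 255) return undefined
    n = n * 256 + octet
  }
  return n
}

// True if ip lies inside an IPv4 CIDR block such as "198.51.100.0/24".
// Division avoids 32-bit signed overflow from bitwise operators.
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/")
  const addr = ipv4ToInt(ip)
  const net = ipv4ToInt(base)
  const bits = Number(bitsText)
  if (addr === undefined || net === undefined || !(bits >= 0 && bits <= 32)) return false
  const blockSize = 2 ** (32 - bits) // addresses per block
  return Math.floor(addr / blockSize) === Math.floor(net / blockSize)
}

type Verdict = "not_a_crawler" | "unverifiable" | "verified" | "spoofed"

function verifyCrawler(ua: string, ip: string): { agent?: string; verdict: Verdict } {
  const agent = AI_CRAWLER_TOKENS.find((token) => ua.includes(token))
  if (!agent) return { verdict: "not_a_crawler" }
  const ranges = PUBLISHED_RANGES[agent]
  if (!ranges) return { agent, verdict: "unverifiable" } // no published ranges to check
  return { agent, verdict: ranges.some((range) => inCidr(ip, range)) ? "verified" : "spoofed" }
}

const gptUa = "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"
console.log(verifyCrawler(gptUa, "198.51.100.9"), verifyCrawler(gptUa, "203.0.113.42"))
```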

