Firewall vs scorer

New users keep asking "do I use the firewall or the scorers?" The honest answer is both — they sit at different points in your request lifecycle, catch different attack classes, and have very different latency/cost profiles. This page is the latency-budget cheat sheet.

The firewall (sub-3ms, inline)

Where: in the request path, before the LLM call goes out, and on the LLM response before it returns to the caller.

How: regex + Bloom-filter + small ML classifier + 200-pattern DLP dictionary. No LLM call. Our p95 in production is 2.57ms (the firewall latency ratchet enforces this).

Catches: keyword-shaped attacks (PII patterns, known jailbreak prefixes, secret-shaped strings, profanity, prompt-injection signatures, DLP terms). High precision on these classes, low recall on nuanced attacks.

Returns: a verdict (allow / flag / block) plus a list of matched patterns. The verdict is categorical (one of three values); there is no per-dimension score.
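To make the verdict shape concrete, here is a minimal sketch of a pattern-only check. Everything in it is an assumption for illustration: the pattern names, the regexes, the FirewallResult type, and the check function are hypothetical, not the product's API. The real firewall also uses a Bloom filter, a small ML classifier, and the DLP dictionary, none of which are modeled here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; the production set is not shown on this page.
BLOCK_PATTERNS = {
    "secret.aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jailbreak.ignore_previous": re.compile(r"ignore (all )?previous instructions", re.I),
}
FLAG_PATTERNS = {
    "pii.email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class FirewallResult:
    verdict: str                                  # "allow" | "flag" | "block"
    matched: list = field(default_factory=list)   # names of the patterns that fired


def check(text: str) -> FirewallResult:
    """Return a three-valued verdict plus the matched pattern names. No LLM call."""
    blocked = [name for name, pat in BLOCK_PATTERNS.items() if pat.search(text)]
    flagged = [name for name, pat in FLAG_PATTERNS.items() if pat.search(text)]
    if blocked:
        return FirewallResult("block", blocked + flagged)
    if flagged:
        return FirewallResult("flag", flagged)
    return FirewallResult("allow")
```

In this sketch the same function would be called twice per request, once on the prompt and once on the LLM response, which matches the two inline positions described under "Where".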

The scorers (LLM-judge, 200–800ms)

Where: out of band from the live request, or inline if your latency budget allows. Typically called from POST /api/v1/evals/safe-regenerate, a CI pipeline, or a nightly batch eval job.

How: the deep-grader prompts an LLM with the response and a per-criterion rubric. It returns a 0–1 score per dimension, per-criterion verdicts, and natural-language reasoning. See evaluation modes for the cost breakdown.

Catches: semantic failures the firewall can't pattern-match — "female candidates often need extra support" is grammatical English, not on any keyword list, but it's textbook gender bias.
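The three parts of a scorer result (per-dimension scores, per-criterion verdicts, reasoning) can be sketched as a small data structure. This is a sketch under assumptions: the JSON field names "scores", "criteria", and "reasoning", the ScorerResult type, and parse_judge_output are hypothetical, and the actual response schema of the scorers is not shown on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class ScorerResult:
    scores: dict      # dimension name -> score in [0, 1]
    criteria: dict    # criterion id -> True (pass) / False (fail)
    reasoning: str    # natural-language explanation from the judge


def parse_judge_output(raw: str) -> ScorerResult:
    """Parse a judge reply (assumed to be JSON) and validate the 0-1 score range."""
    data = json.loads(raw)
    scores = {dim: float(s) for dim, s in data["scores"].items()}
    for dim, s in scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score for {dim!r} out of range: {s}")
    criteria = {cid: bool(v) for cid, v in data["criteria"].items()}
    return ScorerResult(scores, criteria, str(data.get("reasoning", "")))
```

For the gender-bias example above, a judge reply might carry a low fairness score and a failed criterion even though no keyword pattern would fire.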

Composition

The typical production wiring uses both:

request -> firewall (input)          // 2.57ms p95
        -> LLM call
        -> firewall (response)        // 2.57ms p95
        -> [if traffic class === "high-risk"] -> scorers (out-of-band or inline)
        -> client

Most traffic clears both firewall passes and returns to the client with about 5ms of guardrail overhead (two passes at 2.57ms p95 each). The high-risk slice (regulated, PII-adjacent, agent tool calls) gets the additional scorer pass, either inline (200–800ms added latency) or async (zero added latency, with a delayed verdict that can revoke the response client-side or roll the user to a remediation flow).
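The wiring above can be sketched as a single request handler. This is an illustrative sketch, not the product's SDK: handle_request, the BLOCKED and WITHHELD placeholder strings, the 0.5 threshold, and the in-memory deferred list standing in for an async queue are all assumptions. The firewall, LLM, and scorer are passed in as plain callables so the example stays self-contained.

```python
from typing import Callable, List, Optional, Tuple

BLOCKED = "[blocked by firewall]"    # hypothetical placeholder responses
WITHHELD = "[withheld by scorer]"


def handle_request(
    prompt: str,
    traffic_class: str,
    firewall: Callable[[str], str],                        # returns "allow" | "flag" | "block"
    call_llm: Callable[[str], str],
    scorer: Optional[Callable[[str, str], float]] = None,  # returns a 0-1 score
    inline: bool = False,
    deferred: Optional[List[Tuple[str, str]]] = None,      # stand-in for an async queue
    threshold: float = 0.5,                                # assumed cutoff, not a documented default
) -> str:
    if firewall(prompt) == "block":          # firewall (input)
        return BLOCKED
    response = call_llm(prompt)
    if firewall(response) == "block":        # firewall (response)
        return BLOCKED
    # "flag" verdicts pass through in this sketch; a real policy would decide what to do with them.
    if traffic_class == "high-risk":
        if inline and scorer is not None:
            # Inline: the judge latency is added to the live path.
            if scorer(prompt, response) < threshold:
                return WITHHELD
        elif deferred is not None:
            # Async: record the pair for later scoring; no latency added here.
            deferred.append((prompt, response))
    return response
```

Passing the components as callables keeps the ordering of the two firewall passes and the optional scorer pass visible in one place, which is the point of the diagram.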

When to put scorers inline

Healthcare / legal / financial — the LLM-judge latency is acceptable because the cost of a bad response is high.
Chargeback-able traffic — if the safe-regen loop will execute anyway (because the model might fail), the eval call is on the critical path either way.
Compliance-as-evidence — DPDP/GDPR auditors want a per-response attestation, not a sampled one.

When to put scorers async

High-volume customer support — eat the firewall latency for the live response, batch-eval responses overnight to refine prompts.
Anonymous / public traffic — keep the live path fast, sample-eval for trend monitoring.
Cost-sensitive — defer the LLM-judge spend to off-peak or skip on all but the riskiest 10% of traffic.
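The inline and async guidance in the two lists above can be condensed into a small routing function. The traffic-class names, the scorer_mode function, and the risk_percentile input are hypothetical; in particular, how a request's risk would be ranked is not described on this page, so the sketch simply assumes a percentile between 0 and 1 is available.

```python
# Hypothetical traffic-class labels, chosen to mirror the two lists above.
INLINE_CLASSES = {"healthcare", "legal", "financial", "chargeback", "compliance-evidence"}
ASYNC_CLASSES = {"customer-support", "anonymous"}


def scorer_mode(traffic_class: str, risk_percentile: float = 0.0, cutoff: float = 0.9) -> str:
    """Return "inline", "async", or "skip" for the scorer pass.

    risk_percentile is assumed to be a 0-1 rank of the request's risk;
    a cutoff of 0.9 scores only the riskiest 10% of otherwise unclassified traffic.
    """
    if traffic_class in INLINE_CLASSES:
        return "inline"
    if traffic_class in ASYNC_CLASSES:
        return "async"
    # Cost-sensitive default: defer scoring, and only for the riskiest slice.
    return "async" if risk_percentile >= cutoff else "skip"
```

The firewall runs in every case; this function only decides where, or whether, the LLM-judge spend happens.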

Detection vs eval — the same dims, different paths

Both paths report against the same 8 pillars (safety, fairness, accuracy, reliability, transparency, privacy, accountability, user-impact), but the firewall returns coarse verdicts and the scorers return 0–1 scores. A policy attached to either path can map both into the same downstream actions; see policy engine.
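As a rough illustration of mapping both outputs into one action vocabulary, the sketch below converts a firewall verdict and a set of scorer scores into the same three actions. The action names ("pass", "review", "reject"), the thresholds, and both function names are assumptions made for this example; the actual policy engine configuration is documented separately.

```python
PILLARS = (
    "safety", "fairness", "accuracy", "reliability",
    "transparency", "privacy", "accountability", "user-impact",
)

# Hypothetical action names shared by both paths.
VERDICT_ACTIONS = {"allow": "pass", "flag": "review", "block": "reject"}


def action_from_verdict(verdict: str) -> str:
    """Map a coarse firewall verdict to a downstream action."""
    return VERDICT_ACTIONS[verdict]


def action_from_scores(scores: dict, reject_below: float = 0.3, review_below: float = 0.7) -> str:
    """Map 0-1 pillar scores to the same actions, driven by the worst-scoring pillar.

    The thresholds are illustrative assumptions, not documented defaults.
    """
    unknown = set(scores) - set(PILLARS)
    if unknown:
        raise ValueError(f"unknown pillars: {sorted(unknown)}")
    worst = min(scores.values(), default=1.0)
    if worst < reject_below:
        return "reject"
    if worst < review_below:
        return "review"
    return "pass"
```

Using the worst pillar score is one simple design choice; a policy could equally weight pillars differently or set per-pillar thresholds.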

Related concepts

Evaluation modes — cost/latency of basic vs deep scorers.
Policy engine — mapping firewall verdicts + scorer scores to actions.
Agent checkpoints — three checkpoints, each can use the firewall, the scorers, or both.