Firewall vs scorer
New users keep asking "do I use the firewall or the scorers?" The honest answer is both — they sit at different points in your request lifecycle, catch different attack classes, and have very different latency/cost profiles. This page is the latency-budget cheat sheet.
The firewall (sub-3ms, inline)
Where: in the request path, before the LLM call goes out, and on the LLM response before it returns to the caller.
How: regex + Bloom-filter + small ML classifier + 200-pattern DLP dictionary. No LLM call. Our p95 in production is 2.57ms (the firewall latency ratchet enforces this).
Catches: keyword-shaped attacks (PII patterns, known jailbreak prefixes, secret-shaped strings, profanity, prompt-injection signatures, DLP terms). High precision on these classes, low recall on nuanced attacks.
Returns: a verdict (allow / flag / block) plus a list of matched patterns. The verdict is categorical, not a score; there is no per-dim score.
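To make the verdict shape concrete, here is a minimal sketch of the regex layer only (no Bloom filter, no classifier). The rule names, patterns, and the `checkText` function are hypothetical and chosen for illustration; they are not the shipped rule set or API.

```typescript
// Hypothetical types and rules for illustration; not the product's real rule set.
type Verdict = "allow" | "flag" | "block";

interface FirewallResult {
  verdict: Verdict;
  matched: string[]; // names of the rules that fired, in rule order
}

interface Rule {
  name: string;
  pattern: RegExp; // no "g" flag, so test() keeps no state between calls
  onMatch: Exclude<Verdict, "allow">;
}

const RULES: Rule[] = [
  { name: "email", pattern: /[\w.+-]+@[\w-]+\.[\w.]+/, onMatch: "flag" },
  { name: "aws-access-key", pattern: /\bAKIA[0-9A-Z]{16}\b/, onMatch: "block" },
  { name: "jailbreak-prefix", pattern: /ignore (all )?previous instructions/i, onMatch: "block" },
];

function checkText(text: string, rules: Rule[] = RULES): FirewallResult {
  const matched: string[] = [];
  let verdict: Verdict = "allow";
  for (const rule of rules) {
    if (rule.pattern.test(text)) {
      matched.push(rule.name);
      // "block" outranks "flag", which outranks "allow".
      if (rule.onMatch === "block" || verdict === "allow") {
        verdict = rule.onMatch;
      }
    }
  }
  return { verdict, matched };
}
```

The result carries one verdict and the names of the rules that fired, with no numeric score, which mirrors the coarse output described above.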
The scorers (LLM-judge, 200–800ms)
Where: out of band from the live request, or inline if your latency budget allows. Typically called from POST /api/v1/evals/safe-regenerate, a CI pipeline, or a nightly batch eval job.
How: deep-grader prompts an LLM with the response + a per-criterion rubric. Returns a 0–1 score per dim + per-criterion verdicts + natural-language reasoning. See evaluation modes for the cost breakdown.
Catches: semantic failures the firewall can't pattern-match — "female candidates often need extra support" is grammatical English, not on any keyword list, but it's textbook gender bias.
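As a rough picture of what a scorer result might look like on the consuming side, the sketch below models the three parts named above (per-dim scores, per-criterion verdicts, reasoning). The field names, the `failingDims` helper, and the convention that higher scores are better are all assumptions, not the documented response schema.

```typescript
// Hypothetical result shape; field names are assumptions, not the real schema.
interface CriterionVerdict {
  criterion: string;
  passed: boolean;
  reasoning: string; // natural-language explanation from the LLM judge
}

interface ScorerResult {
  scores: Record<string, number>; // dim -> score in [0, 1]; assumed higher is better
  criteria: CriterionVerdict[];
}

// Returns the dims whose score falls below the threshold, sorted by name.
function failingDims(result: ScorerResult, threshold: number): string[] {
  return Object.entries(result.scores)
    .filter(([, score]) => score < threshold)
    .map(([dim]) => dim)
    .sort();
}
```

For the gender-bias sentence quoted above, a result of this shape would be expected to carry a low fairness score alongside a criterion verdict explaining why, even though no firewall pattern fired.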
Composition
The typical production wiring uses both:
request -> firewall (input)      // 2.57ms p95
        -> LLM call
        -> firewall (response)   // 2.57ms p95
        -> [if traffic class === "high-risk"] -> scorers (out-of-band or inline)
        -> client
Most traffic clears the firewall and returns to the client in under 5ms of guardrail overhead. The high-risk slice (regulated, PII-adjacent, agent tool calls) gets the additional scorer pass: either inline (200–800ms added latency) or async (zero added latency, with a delayed verdict that can revoke the response client-side or roll the user to a remediation flow).
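The wiring above could be sketched roughly as follows. The `Guardrails` interface, the `handle` function, and the stage labels are hypothetical; the firewall, LLM call, and scorer are passed in as stubs so the sketch makes no claim about the real SDK surface.

```typescript
// Hypothetical wiring; the three callbacks stand in for the real firewall, LLM client, and scorer.
type Verdict = "allow" | "flag" | "block";
type TrafficClass = "standard" | "high-risk";
type ScorerMode = "inline" | "async";

interface Guardrails {
  firewall: (text: string) => Verdict;
  callLlm: (prompt: string) => Promise<string>;
  score: (response: string) => Promise<void>;
}

interface Outcome {
  status: "ok" | "blocked";
  response?: string;
  stages: string[]; // stages that ran, in order
}

async function handle(
  prompt: string,
  trafficClass: TrafficClass,
  mode: ScorerMode,
  g: Guardrails,
): Promise<Outcome> {
  const stages: string[] = ["firewall:input"];
  if (g.firewall(prompt) === "block") return { status: "blocked", stages };

  stages.push("llm");
  const response = await g.callLlm(prompt);

  stages.push("firewall:response");
  if (g.firewall(response) === "block") return { status: "blocked", stages };

  if (trafficClass === "high-risk") {
    if (mode === "inline") {
      stages.push("scorers:inline");
      await g.score(response); // adds scorer latency to the live request
    } else {
      stages.push("scorers:async");
      // Not awaited: the verdict arrives later, and failures are swallowed here.
      void g.score(response).catch(() => undefined);
    }
  }
  return { status: "ok", response, stages };
}
```

In this sketch a "flag" verdict lets the request through; what a flag should trigger is left to whatever policy is attached.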
When to put scorers inline
- Healthcare / legal / financial — the LLM-judge latency is acceptable because the cost of a bad response is high.
- Chargeback-able traffic — if the safe-regen loop will execute anyway (because the model might fail), the eval call is on the critical path either way.
- Compliance-as-evidence — DPDP/GDPR auditors want a per-response attestation, not a sampled one.
When to put scorers async
- High-volume customer support — eat the firewall latency for the live response, batch-eval responses overnight to refine prompts.
- Anonymous / public traffic — keep the live path fast, sample-eval for trend monitoring.
- Cost-sensitive — defer the LLM-judge spend to off-peak or skip on all but the riskiest 10% of traffic.
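The two lists above amount to a small decision rule, sketched below. The `TrafficProfile` fields and the `chooseScorerMode` function are hypothetical names, and the 90th-percentile cutoff is just one way to encode "the riskiest 10% of traffic"; treat it as an illustration of the trade-off rather than a recommended policy.

```typescript
// Hypothetical profile fields; how you classify traffic is up to your own system.
type ScorerMode = "inline" | "async" | "skip";

interface TrafficProfile {
  regulated: boolean;                   // healthcare / legal / financial
  safeRegenOnCriticalPath: boolean;     // the safe-regen loop will execute anyway
  needsPerResponseAttestation: boolean; // compliance-as-evidence
  costSensitive: boolean;
  riskPercentile: number;               // 0..100, higher = riskier (assumed metric)
}

function chooseScorerMode(p: TrafficProfile): ScorerMode {
  // Any inline criterion wins: a bad response costs more than the added latency.
  if (p.regulated || p.safeRegenOnCriticalPath || p.needsPerResponseAttestation) {
    return "inline";
  }
  // Cost-sensitive traffic skips the LLM judge outside the riskiest 10%.
  if (p.costSensitive && p.riskPercentile < 90) {
    return "skip";
  }
  // Everything else keeps the live path fast and scores out of band.
  return "async";
}
```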
Detection vs eval — the same dims, different paths
Both paths report against the same 8 pillars (safety, fairness, accuracy, reliability, transparency, privacy, accountability, user-impact), but the firewall returns coarse verdicts and the scorers return 0–1 scores. A policy attached to either path can map both into the same downstream actions; see policy engine.
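One way to picture "both into the same downstream actions" is a single function that accepts either kind of signal. The `Signal` union, the action names, and the 0.3 / 0.7 thresholds below are assumptions made for this sketch (including the convention that higher scores are better); the real mapping lives in the policy engine.

```typescript
// Hypothetical policy mapping; action names and thresholds are illustrative only.
type Pillar =
  | "safety" | "fairness" | "accuracy" | "reliability"
  | "transparency" | "privacy" | "accountability" | "user-impact";

type Action = "pass" | "review" | "reject";

type Signal =
  | { source: "firewall"; pillar: Pillar; verdict: "allow" | "flag" | "block" }
  | { source: "scorer"; pillar: Pillar; score: number }; // score in [0, 1], assumed higher is better

interface Thresholds {
  reject: number; // scores below this are rejected
  review: number; // scores below this (but not below reject) go to review
}

const DEFAULT_THRESHOLDS: Thresholds = { reject: 0.3, review: 0.7 };

function toAction(signal: Signal, t: Thresholds = DEFAULT_THRESHOLDS): Action {
  if (signal.source === "firewall") {
    if (signal.verdict === "block") return "reject";
    return signal.verdict === "flag" ? "review" : "pass";
  }
  if (signal.score < t.reject) return "reject";
  return signal.score < t.review ? "review" : "pass";
}
```

Because both branches return the same `Action` type, downstream handling does not need to know which path produced the signal.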
Related concepts
- Evaluation modes — cost/latency of basic vs deep scorers.
- Policy engine — mapping firewall verdicts + scorer scores to actions.
- Agent checkpoints — three checkpoints, each can use the firewall, the scorers, or both.