The regeneration loop
A failing evaluation is information, not a verdict. The regeneration loop turns "this response is biased" into "ask the model to fix the bias and try again." It's the difference between a guardrail that blocks (frustrating to the user) and a guardrail that repairs (invisible to the user, dramatic for the operator's eval pass-rate curve).
The loop, step by step
1. Evaluate the candidate response against the chosen scorer set. Each dimension returns a 0–1 score; the overall score is the minimum across dimensions.
2. If overall ≥ threshold → stop, return passed.
3. Otherwise → derive a structured improvement instruction from the failing dimensions (e.g. "the response is gender-biased; remove protected-class generalisations and re-write neutrally").
4. Call the regenerator LLM with the original prompt + the original response + the improvement instruction. Get a new candidate.
5. Re-evaluate and loop back to step 2. The loop is bounded by maxIterations (default 3, max 10).
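The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the route's actual implementation: the `evaluate` and `regenerate` callables, the wording of the instruction, and the convention that iteration 0 is the original response (so `max_iterations` bounds the number of regenerations) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]  # dimension name -> score in [0, 1]


def overall(scores: Scores) -> float:
    """Overall score is the minimum across all dimensions."""
    return min(scores.values())


def improvement_instruction(scores: Scores, threshold: float) -> str:
    """Build an instruction naming every dimension below the threshold.

    Comparing each dimension to the same threshold is an assumption.
    """
    failing = sorted(dim for dim, s in scores.items() if s < threshold)
    return "Fix these failing dimensions and rewrite: " + ", ".join(failing)


def regeneration_loop(
    prompt: str,
    candidate: str,
    evaluate: Callable[[str], Scores],           # hypothetical scorer
    regenerate: Callable[[str, str, str], str],  # hypothetical regenerator LLM call
    threshold: float,
    max_iterations: int = 3,
) -> Tuple[str, List[float], bool]:
    """Evaluate, then repair and re-evaluate until passing or out of budget."""
    scores = evaluate(candidate)
    iteration_scores = [overall(scores)]  # index 0 = original response
    regenerations = 0
    while iteration_scores[-1] < threshold and regenerations < max_iterations:
        instruction = improvement_instruction(scores, threshold)
        candidate = regenerate(prompt, candidate, instruction)
        scores = evaluate(candidate)
        iteration_scores.append(overall(scores))
        regenerations += 1
    return candidate, iteration_scores, iteration_scores[-1] >= threshold
```

Passing the scorer and regenerator in as callables keeps the control flow testable with stubs, without any LLM call.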
Stop conditions
- passed: overall score ≥ threshold. The "happy" exit.
- max_iterations: iteration budget exhausted without passing. The route returns the highest-scoring iteration so far, so you get the best attempt even if no iteration cleared the bar.
- no_improvement: consecutive iterations produced monotonically non-improving scores. We stop early rather than burn credits on a flat curve. Saves real money on inputs the model just can't fix.
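One way to express the three stop conditions as a single decision function is sketched below. The `patience` parameter (how many consecutive non-improving regenerations trigger the early stop) is hypothetical, since the text does not say how many are required, and the order in which the checks run is likewise an assumption.

```python
from typing import List, Optional, Tuple


def stop_reason(
    iteration_scores: List[float],
    threshold: float,
    max_iterations: int = 3,
    patience: int = 2,  # hypothetical: non-improving regenerations tolerated
) -> Optional[str]:
    """Return a stop reason, or None if the loop should keep going."""
    if iteration_scores[-1] >= threshold:
        return "passed"
    regenerations = len(iteration_scores) - 1  # index 0 is the original response
    if regenerations >= max_iterations:
        return "max_iterations"
    if regenerations >= patience:
        recent = iteration_scores[-(patience + 1):]
        # Each of the last `patience` scores failed to beat its predecessor.
        if all(later <= earlier for earlier, later in zip(recent, recent[1:])):
            return "no_improvement"
    return None


def best_iteration(iteration_scores: List[float]) -> Tuple[int, float]:
    """Index and score of the highest-scoring iteration (earliest on ties)."""
    best = max(range(len(iteration_scores)), key=iteration_scores.__getitem__)
    return best, iteration_scores[best]
```

`best_iteration` mirrors the max_iterations behaviour described above: when nothing clears the bar, the highest-scoring attempt is the one returned.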
Cost-budget gate
Every call accepts maxCostUsd. Before the loop starts, the route computes the worst-case cost (max iterations × (per-iteration eval + regen tokens) × per-token pricing for the chosen model) and refuses with HTTP 402 if it would exceed the cap. This protects against an operator misconfiguring the loop into runaway credit spend on a flood of failing inputs.
Default cap: $0.50 per call. Set to 0 to disable the gate for BYOK callers where you've already capped at the provider level.
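The gate's arithmetic can be illustrated as follows. This is a sketch under simplifying assumptions: a single blended per-token price (real pricing usually differs for input and output tokens), and hypothetical function and parameter names. Only the $0.50 default, the "0 disables" rule, and the HTTP 402 refusal come from the description above.

```python
def worst_case_cost_usd(
    max_iterations: int,
    eval_tokens_per_iteration: int,   # hypothetical estimate
    regen_tokens_per_iteration: int,  # hypothetical estimate
    usd_per_token: float,             # blended price for the chosen model
) -> float:
    """Upper bound on spend if every iteration fails and regenerates."""
    tokens_per_iteration = eval_tokens_per_iteration + regen_tokens_per_iteration
    return max_iterations * tokens_per_iteration * usd_per_token


def within_budget(worst_case_usd: float, max_cost_usd: float = 0.50) -> bool:
    """False means the route would refuse the call with HTTP 402.

    A cap of 0 disables the gate entirely.
    """
    if max_cost_usd == 0:
        return True
    return worst_case_usd <= max_cost_usd
```

Because the check runs on the worst case before any LLM call is made, a refused request costs nothing.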
The audit row
Every call writes one row to safe_regenerate_runs with:
- The first 4,000 chars of the prompt and the original content (truncation, not censorship; full text lives in audit_logs for forensic recall).
- The threshold, max-iterations, dimension subset, scorer set used.
- The full iteration_scores[] array (one number per iteration), the chosen final_content_excerpt, the final_score, the stop_reason.
- iteration_history JSONB: opt-in (set recordHistory: true on the request). Captures the per-iteration content + improvement instructions for "show me how the model improved this response" UIs.
- The real cost: total_credits_consumed + estimated_cost_usd reconciled against the cost ledger.
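As an illustration of the row's shape, the sketch below assembles a dictionary with the columns named above. The names `prompt_excerpt` and `original_content_excerpt` are assumed (the text does not give them), and applying the same 4,000-character limit to `final_content_excerpt` is also an assumption. Cost columns are omitted because they depend on the ledger.

```python
from typing import Any, Dict, List, Optional

EXCERPT_CHARS = 4000  # "first 4,000 chars" of the prompt and original content


def build_audit_row(
    prompt: str,
    original_content: str,
    final_content: str,
    iteration_scores: List[float],
    final_score: float,
    stop_reason: str,
    history: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Assemble one audit row; excerpts are truncated, never redacted."""
    return {
        "prompt_excerpt": prompt[:EXCERPT_CHARS],                      # assumed column name
        "original_content_excerpt": original_content[:EXCERPT_CHARS],  # assumed column name
        "iteration_scores": list(iteration_scores),
        "final_content_excerpt": final_content[:EXCERPT_CHARS],        # limit is assumed
        "final_score": final_score,
        "stop_reason": stop_reason,
        # Stays None unless the caller opted in with recordHistory: true.
        "iteration_history": history,
    }
```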
Operators can answer "did the loop actually improve outcomes?" without running an offline replay. That's the audit-grade bar this table was designed for.
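A question like "did the loop actually improve outcomes?" reduces to simple arithmetic over iteration_scores, final_score, and stop_reason. The sketch below assumes the rows have already been fetched as dictionaries; the function name and the summary keys are illustrative, not part of the product.

```python
from typing import Any, Dict, List


def summarize_runs(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Summarise how often regeneration ran and how much it helped."""
    # A run regenerated if it has more than the initial (iteration 0) score.
    regenerated = [r for r in rows if len(r["iteration_scores"]) > 1]
    gains = [r["final_score"] - r["iteration_scores"][0] for r in regenerated]
    return {
        "runs": len(rows),
        "regenerated": len(regenerated),
        # Runs that failed at first but ended with stop_reason=passed.
        "rescued": sum(1 for r in regenerated if r["stop_reason"] == "passed"),
        "mean_gain": sum(gains) / len(gains) if gains else 0.0,
    }
```

Runs that passed at iteration 0 are counted in `runs` but excluded from `mean_gain`, so the average reflects only responses the loop actually touched.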
Real example
A live-fire test on 2026-05-24 sent a textbook gender-biased prompt to POST /api/v1/evals/safe-regenerate. The deep grader returned fairness=0.80, exactly at the threshold and therefore barely passing, and the loop terminated at iteration 0 with stop_reason=passed. Result: 3 LLM calls, 3,937 input tokens, 709 output tokens, $0.001 real spend recorded.
Tighter calibration (see scoring thresholds) would have failed that response and triggered a regenerate; the loop is calibration-sensitive, which is the right tradeoff.
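A request to this endpoint might be assembled as in the sketch below. The endpoint path and the threshold, maxIterations, maxCostUsd, and recordHistory options come from this page; the host, the `prompt` and `content` field names, and the absence of authentication headers are assumptions for illustration only. The request is built but deliberately not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host


def build_safe_regenerate_request(
    prompt: str,
    content: str,
    threshold: float = 0.80,
    max_iterations: int = 3,
    max_cost_usd: float = 0.50,
    record_history: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the safe-regenerate route."""
    body = {
        "prompt": prompt,    # assumed field name
        "content": content,  # assumed field name
        "threshold": threshold,
        "maxIterations": max_iterations,
        "maxCostUsd": max_cost_usd,
        "recordHistory": record_history,
    }
    return urllib.request.Request(
        BASE_URL + "/api/v1/evals/safe-regenerate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Raising `threshold` above 0.80 in such a request is the "tighter calibration" case: the same 0.80 fairness score would then fail and trigger a regeneration.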
Related concepts
- Evaluation modes — what runs before the loop decides to retry.
- Scoring thresholds — what the threshold means.
- Policy engine — declarative alternative to the inline threshold + maxIterations flags.