
The regeneration loop

A failing evaluation is information, not a verdict. The regeneration loop turns "this response is biased" into "ask the model to fix the bias and try again." It's the difference between a guardrail that blocks (frustrating to the user) and a guardrail that repairs (invisible to the user, dramatic for the operator's eval pass-rate curve).

The loop, step by step

  1. Evaluate the candidate response against the chosen scorer set. Each dimension returns a 0–1 score; the overall score is the minimum across dimensions, so one failing dimension fails the whole response.
  2. If overall ≥ threshold → stop, return passed.
  3. Otherwise → derive a structured improvement instruction from the failing dimensions (e.g. "the response is gender-biased; remove protected-class generalisations and re-write neutrally").
  4. Call the regenerator LLM with the original prompt + the original response + the improvement instruction. Get a new candidate.
  5. Re-evaluate. Loop back to step 2. Bounded by maxIterations (default 3, max 10).
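
The steps above can be sketched as a small driver function. This is a minimal illustration, not the route's actual implementation: `evaluate`, `regenerate`, `build_instruction`, and `LoopResult` are hypothetical names, the 0.8 default threshold is borrowed from the example later on this page, and the sketch assumes that maxIterations counts regeneration attempts with iteration 0 being the original response. The no_improvement early exit is left out to keep the core loop readable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scores = Dict[str, float]  # dimension name -> score in [0, 1]


@dataclass
class LoopResult:
    content: str
    score: float
    stop_reason: str
    iteration_scores: List[float]


def build_instruction(scores: Scores, threshold: float) -> str:
    """Step 3: turn the failing dimensions into one improvement instruction."""
    failing = sorted(d for d, s in scores.items() if s < threshold)
    return "Rewrite the response to fix these dimensions: " + ", ".join(failing)


def regeneration_loop(
    prompt: str,
    response: str,
    evaluate: Callable[[str, str], Scores],         # hypothetical scorer set
    regenerate: Callable[[str, str, str], str],     # hypothetical regenerator LLM
    threshold: float = 0.8,                         # assumed default
    max_iterations: int = 3,
) -> LoopResult:
    candidate = response
    history: List[float] = []
    best_content, best_score = response, float("-inf")
    iteration = 0
    while True:
        scores = evaluate(prompt, candidate)        # steps 1 and 5
        overall = min(scores.values())              # overall = MIN
        history.append(overall)
        if overall > best_score:
            best_content, best_score = candidate, overall
        if overall >= threshold:                    # step 2
            return LoopResult(candidate, overall, "passed", history)
        if iteration >= max_iterations:             # budget exhausted
            return LoopResult(best_content, best_score, "max_iterations", history)
        instruction = build_instruction(scores, threshold)  # step 3
        # Step 4. On the first pass `candidate` is the original response;
        # feeding the latest candidate on later passes is an assumption.
        candidate = regenerate(prompt, candidate, instruction)
        iteration += 1
```

Injecting the scorer and regenerator as plain callables keeps the control flow testable without any network calls.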

Stop conditions

  • passed — overall score ≥ threshold. The "happy" exit.
  • max_iterations — iteration budget exhausted without passing. The route returns the highest-scoring iteration so far, so you get the best attempt even if no iteration cleared the bar.
  • no_improvement — consecutive iterations produced scores that did not improve on the ones before them. We stop early rather than burn credits on a flat curve, which saves real money on inputs the model just can't fix.
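
One way to picture the three exits is a function that inspects the scores so far and names a stop reason. This is a sketch under stated assumptions: the `patience` parameter (how many consecutive flat or falling steps trigger no_improvement) is hypothetical, since this page does not state the exact window, and the order in which the checks run is likewise a guess. `stop_reason` and `best_iteration` are illustrative names.

```python
from typing import List, Optional


def stop_reason(
    iteration_scores: List[float],
    threshold: float,
    max_iterations: int,
    patience: int = 2,  # hypothetical window; not specified by the source
) -> Optional[str]:
    """Return a stop reason, or None if the loop should keep going."""
    if iteration_scores[-1] >= threshold:
        return "passed"
    # no_improvement: each of the last `patience` steps failed to beat
    # the score immediately before it.
    recent = iteration_scores[-(patience + 1):]
    if len(recent) == patience + 1 and all(
        later <= earlier for earlier, later in zip(recent, recent[1:])
    ):
        return "no_improvement"
    # Iteration 0 is the original response, so N regenerations give N + 1 scores.
    if len(iteration_scores) - 1 >= max_iterations:
        return "max_iterations"
    return None


def best_iteration(iteration_scores: List[float]) -> int:
    """Index of the highest-scoring iteration; the earliest wins ties."""
    return max(range(len(iteration_scores)), key=lambda i: iteration_scores[i])
```

On a max_iterations exit, `best_iteration` picks which attempt to return, matching the "best attempt so far" behaviour described above.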

Cost-budget gate

Every call accepts maxCostUsd. Before the loop starts, the route computes the worst-case cost (max iterations × (per-iteration eval + regen tokens) × per-token pricing for the chosen model) and refuses with HTTP 402 if it would exceed the cap. This protects against an operator misconfiguring the loop into runaway credit spend on a flood of failing inputs.

Default cap: $0.50 per call. Set to 0 to disable the gate for BYOK callers where you've already capped at the provider level.
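
The gate's arithmetic can be sketched as follows. This is an illustration only: the `Pricing` type, the split into input and output token prices, and the token estimates are assumptions, and the real route may estimate token counts differently. The sketch treats "would exceed" as strictly greater than the cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical per-token prices for the chosen model.
    input_usd_per_token: float
    output_usd_per_token: float


def worst_case_cost_usd(
    max_iterations: int,
    eval_input_tokens: int,
    eval_output_tokens: int,
    regen_input_tokens: int,
    regen_output_tokens: int,
    pricing: Pricing,
) -> float:
    """max iterations x (eval + regen tokens per iteration) x per-token price."""
    per_iteration = (
        (eval_input_tokens + regen_input_tokens) * pricing.input_usd_per_token
        + (eval_output_tokens + regen_output_tokens) * pricing.output_usd_per_token
    )
    return max_iterations * per_iteration


def within_budget(worst_case_usd: float, max_cost_usd: float = 0.50) -> bool:
    """False means the route would refuse the call with HTTP 402."""
    if max_cost_usd == 0:  # 0 disables the gate
        return True
    return worst_case_usd <= max_cost_usd
```

With made-up prices of $0.000001 per input token and $0.000002 per output token, three iterations of 2,000 input and 500 output tokens each come to about $0.009, well under the default cap.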

The audit row

Every call writes one row to safe_regenerate_runs with:

  • The first 4,000 chars of the prompt and the original content (truncation, not censorship — full text lives in audit_logs for forensic recall).
  • The threshold, max-iterations, dimension subset, scorer set used.
  • The full iteration_scores[] array (one number per iteration), the chosen final_content_excerpt, the final_score, the stop_reason.
  • iteration_history JSONB, opt-in (set recordHistory: true on the request): captures the per-iteration content + improvement instructions for "show me how the model improved this response" UIs.
  • The real cost: total_credits_consumed + estimated_cost_usd reconciled against the cost ledger.

Operators can answer "did the loop actually improve outcomes?" without running an offline replay. That's the audit-grade bar this table was designed for.
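
As a rough illustration of that kind of question, the sketch below summarises rows that have already been fetched from safe_regenerate_runs as dictionaries. It relies only on the iteration_scores, final_score, and stop_reason columns named above, and it assumes that iteration_scores[0] is the score of the original response. The function name and the shape of the summary are hypothetical.

```python
from typing import Any, Dict, List


def summarize_runs(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Summarise how often regeneration ran and how much it moved the score."""
    # A run with a single score passed at iteration 0 and never regenerated.
    regenerated = [r for r in rows if len(r["iteration_scores"]) > 1]
    # Gain = final score minus the original response's score (assumed index 0).
    gains = [r["final_score"] - r["iteration_scores"][0] for r in regenerated]
    return {
        "runs": len(rows),
        "regenerated": len(regenerated),
        "improved": sum(1 for g in gains if g > 0),
        "passed": sum(1 for r in rows if r["stop_reason"] == "passed"),
        "mean_gain": sum(gains) / len(gains) if gains else 0.0,
    }
```

Comparing `improved` against `regenerated` gives a quick read on whether the loop is earning its cost.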

Real example

A live-fire test on 2026-05-24 sent a textbook gender-biased prompt to POST /api/v1/evals/safe-regenerate. The deep grader returned fairness=0.80, exactly at the threshold and so barely passing, and the loop terminated at iteration 0 with stop_reason=passed. Result: 3 LLM calls, 3,937 input tokens, 709 output tokens, $0.001 real spend recorded.

Tighter calibration (see scoring thresholds) would have failed that response and triggered a regenerate; the loop is calibration-sensitive, which is the right tradeoff.
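
The sensitivity is easy to see with the numbers from that run. In this small sketch only the fairness score of 0.80 and the 0.80 threshold come from the test above; the 0.85 threshold is a hypothetical tighter setting, and `passes` is an illustrative helper.

```python
from typing import Dict


def passes(scores: Dict[str, float], threshold: float) -> bool:
    """A response passes when its lowest dimension score meets the threshold."""
    return min(scores.values()) >= threshold


scores = {"fairness": 0.80}

at_current = passes(scores, 0.80)   # meets the bar exactly, so the loop stops
at_tighter = passes(scores, 0.85)   # hypothetical stricter bar triggers a regenerate
```

Because the comparison is inclusive, a score sitting exactly on the threshold passes; any stricter threshold flips the same response into the regenerate path.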

Related concepts