What's new

Shipping every week.

New features, improvements, and fixes shipped to EvalGuard. Read about what we've built recently.

v1.1.0March 2026

NL Pipeline & Adaptive Red Teaming

Two industry-first features that no competitor has. Describe your app in plain English to generate a complete eval suite, and let an AI attacker adapt in real-time to find vulnerabilities static tests miss.

NL→Eval Pipeline

Describe your AI app in natural language. EvalGuard's proprietary pipeline analyzes your app profile, maps domain-specific risks, generates targeted test cases, and assembles a production-ready evaluation config — powered by multi-model orchestration across 89 providers

Adaptive Multi-Turn Red Teaming

AI-powered attacker that adapts in real-time using UCB1 bandit algorithm. Runs parallel sessions across 43 strategies × 14 categories, learns from each response, and builds a complete resistance profile

25,000+ test blocks across 457 files

Comprehensive test coverage across all 6 products with end-to-end, integration, and unit tests ensuring production reliability

10 Security Audit Fixes

Hardened authentication, authorization, input validation, and API security based on comprehensive security audit findings

Added

NL→Eval Pipeline — describe your AI app in plain English, get a complete evaluation suite in seconds
Adaptive Multi-Turn Red Teaming with UCB1 bandit optimization and parallel attack sessions
Swagger API Documentation covering all 307 API endpoints
Cross-session memory for red teaming attack strategies
Real-time resistance profiling dashboard

Improved

Test suite expanded to 25,000+ describe/it blocks across 457 test files
Red teaming now supports up to 15 conversation turns per session
43 attack strategies × 14 vulnerability categories coverage
89 LLM provider support with intelligent orchestration for NL pipeline

Security

10 security audit fixes across authentication and authorization
Hardened input validation on all API endpoints
Improved API key scoping and permission enforcement
Enhanced CSRF and rate limiting protections

v1.0.0March 2026

Launch Release

The most comprehensive AI evaluation, security, and governance platform. Six products, one platform, zero blind spots.

138 Evaluation Scorers

Accuracy, faithfulness, hallucination, bias, toxicity, coherence, and more — the most comprehensive scorer library available

249 Security Attack Plugins

Prompt injection, jailbreaking, PII extraction, data exfiltration, and 245 more adversarial test types

43 Attack Strategies

Multi-turn, crescendo, tree-of-attacks, semantic variations, and more sophisticated red teaming strategies

89 LLM Providers

OpenAI, Anthropic, Google, AWS Bedrock, Azure, Mistral, Cohere, and 80 more — all through a unified API

Added

Eval Engine — Run evaluations with 166 scorers across accuracy, safety, bias, compliance, and custom metrics
LLM Gateway — Centralized AI traffic management with policy enforcement, rate limiting, and automatic failover
FinOps Dashboard — Real-time cost tracking, budget alerts, and optimization recommendations across all providers
Observability Platform — Production monitoring with real-time dashboards, alerting, and distributed tracing
Prompt IDE — Version-controlled prompt engineering with A/B testing, diff views, and deployment pipelines
Red Teaming-as-a-Service — Automated adversarial testing with 249 attack plugins and 43 strategies
EU AI Act auto-risk classification with compliance dashboard and evidence collection
ISO 42001 and SOC 2 readiness tracking with automated controls mapping
Scheduled continuous evaluations with cron-based automation
Enterprise SSO/SAML integration for single sign-on
Feature flags system for gradual rollouts and A/B testing
Webhook delivery with HMAC-SHA256 signing and automatic retries
Full API and SDK support (TypeScript, Python) for CI/CD integration
Self-hostable via Docker Compose and Helm charts
Annotation workflows with human-in-the-loop evaluation queues
Dataset versioning with diff tracking and lineage
Benchmark suites for standardized model comparison
Audit logging for compliance and security monitoring

Security

Row-Level Security (RLS) isolation across all database tables
AES-256 encryption at rest for all sensitive data including API keys
TLS 1.2+ encryption for all data in transit
CSRF protection on all state-changing endpoints
Rate limiting with configurable per-endpoint thresholds
API key scoping with granular permission controls

Engineering changelog

Every commit, grouped by week and conventional-commit type. Auto-generated from git on every release. 1,240 changes across 11 weeks.

2026-W21

May 18 – May 24, 2026

70 changes

Features

auditPhase E — P0/P1/P2 hardening (e2e creds + CLI dry-run + lib hardening)#358
auditPhase D — depth fixes (policy keys, worker race, Terraform/Java/Semantic, trace-id)#357
auditPhase C — distribution wedge (GitHub App + CLI + pricing DB + scan-model)#356
auditPhase B — namesake clearance, 8 stub→real conversions#355
auditPhase A — P0 cross-tenant, SSRF, postgrest-safe, idempotency hardening#354
complianceship compliance_evidence persistence — checklist toggles now persist#328
Q2 deferred items — RLS coverage + auto-OpenAPI/registry docs pipeline (1078 new pages)#365
q1+q2bundle — 11 items, ~6,500 LOC, 400+ tests#364
evalT2 — DAG metrics + Arena + trace-span assertions + tool-call F1/trajectory (108 tests)#363
gradersT1.1 — 10 deep graders w/ DeepEval-quality rubrics + 271 tests#362
loginpremium B2B redesign — trust signals, real capabilities, visible SSO option#361
finopsship Chargeback — wire UI to /api/v1/chargeback#326
componentsfinal natural-search hover state cleanup#316
componentsresidual token cleanup batch 3 — final 4 components#315
componentsresidual token cleanup batch 2 — 11 more components#314
componentsresidual token cleanup — chart-context-menu, command-palette, insights-feed, keyboard-shortcuts#313
componentsbulk-migrate 24 shared components to design tokens#312
marketingmigrate /docs/api to design tokens#311
dashboardfinal residual sweep — border-gray-100/dark:border-gray-900#310
dashboardexpand token-migration script (single-tone) + sweep 36 more pages#309
dashboardbulk-migrate residual subpages (settings/traces/clusters/import)#308
dashboardbulk-migrate 9 nested security/workflow/prompts subpages to design tokens#307
dashboardbulk-migrate 9 nested subpages to design tokens#306
dashboardbulk-migrate /support /test-gen /threat-intelligence /uba /webhooks /workflow#305
dashboardbulk-migrate /online-evals /saved-searches /service-map /simulator to design tokens#304
dashboardbulk-migrate /api-docs /changes /compare /events /executive to design tokens#303
dashboardmigrate /dlp + /data-discovery + /data-residency to design tokens#302
dashboardmigrate /templates + /agents + /mcp-traffic to design tokens#301
dashboardmigrate /benchmarks + /builder + /generate to design tokens#300

Fixes

ciunblock deploy — add Synthesizer to openapi.json + fix soc2 webhook test flake#372
cialso drop refs/pull/* + reflog before gitleaks; allowlist known-fake AKIA commit#371
ciprune stale git refs before gitleaks scan (self-hosted runner cache)#369
docsdisambiguate SDK reference routes — move [version] under literal v/ prefix#368
gradershybrid biasGrader/piiGrader/hallucinationGrader — restore regex floor + deep judge on LLM-available path#367
migrationmake active_sessions policies idempotent in 20260520_complete_rls_coverage#366
marketingdrop doubled "| EvalGuard" in pricing/models + products/model-scan titles#360
middlewareunblock Phase C marketing routes from auth gate#359
notificationsdrop console.warn from retired email channel#353
build/api/v1/status/uptime force-dynamic — unblocks deploy#350
testannotations + 3rd upload test file — unblocks revert deploy#349
testupload-validation.test.ts — projectId + middleware mocks + 30s timeout#348
datasets/uploadrequire projectId — closes baseline #1 of extractor audit#346
annotationsPOST schema requires projectId — middleware cross-tenant check now actually runs#341
cicorrect broken actions/upload-artifact@v4.6.2 SHA pin#325
spa-navuse router.push in prompts/ab-testing (was window.location.href)#324
react-hooksuse next/navigation router in judge-models (was window.location.href)#323
react-hooksinline handleCopySnippet — call buildJudgeSnippet directly#322
react-hookshoist buildJudgeSnippet() out of judge-models render#321
react-hookshoist pure toRad() helper out of finops donut renderer#320
react-hooksclear all static-components warnings (8 → 0)#319
countsupdate stale provider/scorer/plugin badges to canonical numbers#318

Docs

archclarify compliance_evidence + allocation_rules + RLS scope#333
openapidocument /api/v1/compliance/checklist GET + PATCH#329

extractor-auditescalate .optional() id + requiredRole to violation#345
extractor-auditalways show warnings + triage guidance#344
extractor-auditwarn on .optional() extractor fields (latent #341-class bypass)#343
add extractor-schema-audit ratchet (prevents class of #341 silent-bypass bugs)#342

Chore

securityretire email/send + 503 gateway PUT — clears extractor-audit baseline#352
flagsenable 3 stale name-sake flags — backends already shipped#327
ESLint --fix sweep — 24 auto-fixable warnings (mostly unused imports)#317

Tests

complianceadd cross-tenant rejection tests for checklist API#330
annotationscross-tenant rejection on GET + flag POST extractor gap#340
tracescross-tenant rejection on GET — traceStore never queried#339
audit-logscross-tenant rejection — admin client never instantiated on 403#338
api-keyscross-tenant rejection — highest-blast-radius surface#337
agent-runscross-tenant rejection — agent_runs SELECT suppressed on 403#336
prompts/ab-experimentscross-tenant + RBAC rejection sweep#335
marketplacecross-tenant rejection sweep for install API#334
chargeback-exportcross-tenant rejection tests for CSV export#332
chargebackadd cross-tenant + admin-RBAC rejection tests#331

2026-W20

May 11 – May 17, 2026

123 changes

Features

dashboardmigrate /fine-tuning + /simulation to design tokens#299
dashboardmigrate /marketplace + /finops to design tokens#298
dashboardmigrate /prompts + /annotations + /embeddings to design tokens#297
dashboardmigrate /integrations + /team + /datasets to design tokens#296
dashboardmigrate /gateway + /firewall + /compliance to design tokens#295
dashboardmigrate /settings + /playground + /cost to design tokens#294
dashboardrebuild /traces + /monitoring in Linear restraint#293
dashboardrebuild /evals + /security in Linear restraint#292
dashboardrebuild /dashboard home in Linear restraint#291
marketingrebuild /trust + /engineering in Linear restraint#290
marketingrebuild /about /contact /security /changelog in Linear restraint#289
marketingrebuild /docs hub + shell in Linear restraint#286
dashboardrebuild sidebar + topbar + mobile nav in Linear restraint#287
marketingrebuild /compare + /alternatives hubs in Linear restraint#285
marketingrebuild /pricing in Linear-restraint pattern#284
testcomprehensive prod-monitoring + test orchestrator + auth bot#277
zodbodySchema on 5 new MCP routes (restore ratchet 142 → 135)#223
evalper-provider grader scheduler — rate-limit-aware sequencing (W7)#114
mcpsub-10ms semantic tool-filter library (W7)#112
mcptransport bridges — HTTP / SSE / WebSocket (W5-6 #71 final piece)#111
mcpRedis-backed ToolRateLimiter for multi-pod deployments (W5-6 #71 follow-up)#108
mcpmanual health-check endpoint + 'Test connection' UI button (W5-6 #71 follow-up)#107
mcpserver health-check cron (W5-6 #71 follow-up)#106
mcpregistry + permissions UI pages + permission PUT/DELETE routes (W5-6 #71 PR D)#105
mcpruntime per-tool RBAC enforcement + audit-per-invocation (W5-6 #71 PR C)#104
mcpOAuth 2.1 + JWT validation per RFC 9068 (W5-6 #71 PR B)#103
mcpserver registry + per-tool RBAC schema + CRUD (W5-6 #71 PR A)#102
zodreal bodySchema on prompts cluster (2 routes)#206
zodreal bodySchema on guardrails + safety cluster (4 routes)#214
zodbodySchema on gpu-monitoring + feedback/token#219
zodreal bodySchema on incidents + integrations + insights (4 handlers)#215
zodreal bodySchema on custom-dashboards cluster (4 routes)#210
zodreal bodySchema on webhooks + sso (2 routes)#209
zodreal bodySchema on traces cluster (2 routes)#208
zodreal bodySchema on security cluster (3 routes)#207
zodreal bodySchema on firewall cluster (4 routes)#200
zodreal bodySchema on evals cluster (5 routes) + api-handler hardening#199
rls-ratchetlock SOFT-violation baseline — block policy-theater regression#170
security/benchmarksfinish VLSU + wire 5 new datasets into BENCHMARKS#174
importcURL + Postman v2.1 → provider config parsers (W7)#115
zodreal bodySchema on eval-ops cluster (4 routes)#205
zodreal bodySchema on embeddings + exports cluster (4 routes)#204
zodreal bodySchema on agents cluster (3 routes)#202
zodreal bodySchema on gateway cluster (5 routes)#201
zodreal bodySchema on LLM input cluster #2 (3 routes)#203
zodreal bodySchema on data + identity cluster (6 routes)#197
zodreal bodySchema on data governance cluster (7 routes)#198
zodreal bodySchema on compliance cluster (7 routes)#194
zodreal bodySchema on AI/LLM input cluster (5 routes)#193
zodreal bodySchema on annotations cluster (3 routes, 4 handlers)#196
zodreal bodySchema validation on 5 money-flow routes#192
scimDB-backed per-org bearer-token rotation (closes P2.2 SCIM half)#176
cli/init--ci flag scaffolds .env.example + GitHub Actions workflow#172

Fixes

authrestore a11y attributes lost in PR #283 auth-page refresh#288
rlsreal fixes for 5 WEAK/CRITICAL tables surfaced by per-policy audit#188
rlsper-route audit of 31 SOFT-violation tables — 4 new policies + 27 annotations#187
apirelocate attachment-mime helpers out of route.ts (Next.js build)0583a5c
test,libunblock deploy CI — 6 fixes for post-merge test debt + 1 real bugcd95f43
dbmove audit_logs index to CONCURRENTLY migration — unblock deploy ratchetc371c6e
apiOpenAPI stubs for 6 routes added in PR #278 — unblock deploy ratchetefbf580
helmadd missing evalguard.labels + selectorLabels helpers#276
depssync pnpm-lock.yaml after override removal in #273#275
eslintremove ts-eslint v8 override + restore v7 ban-types compat#273
webmigrate eslint.config.mjs off FlatCompat → native flat config#270
webinstall @vitejs/plugin-react + unskip 4 sentry GlobalError tests#269
lintrestore real lint on llamaindex-wrapper + vercel-ai-wrapper#268
dockerset runtime NODE_OPTIONS=--max-old-space-size=2048#267
rls-isolationeval_results column is 'scorer', not 'scorer_name'#265
rls-isolationprovide eval_runs.created_by (NOT NULL)#264
rls-isolationNULLIF empty JWT claim + remove project_id from shared_traces#263
rls-isolation9 per-test-file setup bugs + expectThrow runner support#262
rls-isolationplant empty JWT claim for anon role#261
rls-isolationgrant Supabase-equivalent default privileges post-migrations#260
rls-isolationinstall is_project_member(uuid) placeholder BEFORE migrations#259
rls-isolationset_config() instead of SET LOCAL $1 + is_project_member(uuid) shim#258
rls-isolationper-statement migration apply + auth.role() stub#257
ratchets, rls-isolationbump skip baseline 191→192 + strip CONCURRENTLY#254
ratchetsexpand RLS-isolation drop list + kill 2 MCP 'as any' casts#252
rls-isolationdrop fixture stub tables so 00000 schema runs cleanly#251
webvitest 4.x constructor mocks + UUID test payloads + sentry skip#250
mcp-gatewayuse process.stderr in audit, restore console-count floor#248
corereset mockFetch between tests in notification-integrations#247
workervitest 4.x constructor mocks — use function (not arrow)#246
llamaindex-wrapperTS 6.x compat — node + DOM types, Mock<T> typing#245
vercel-ai-wrapperadd @types/node + DOM lib + types: [node]#243
testSC-20 startup-observability-baseline — deterministic baseline#177
auth-requiredhandle trailing comma in createApiHandler options#195
rls-audithandle quoted policy names — SOFT count 48→11 (regex missing 37 real policies)#179
migration-testseed idempotency — ON CONFLICT DO NOTHING#180
cicover idempotency + SOC2 branches; fix migration setup + auth-required parser#160
post-marathon-ciresolve all 4 CI failures from #158 — name collision + 2 test issues#159

Docs

openapiadd 3 missing MCP routes — invoke, permissions/{id}, health-check#256
mcp-authclarify issuer vs verifier roles of the two auth paths#225
scimSCIM 2.0 provisioning guide — Okta / Azure AD / Google Workspace#178
founderaction items 2026-05-12 — six items only founder can execute#173

Build

depsbump actions/stale from 9 to 10#24

securityadd actions:read to security-scan.yml permissions#181

Chore

auditrls-audit `service-role-only` verb + annotate 6 zero-consumer tables#189
auditrls-audit annotation so dynamic CREATE POLICY blocks are visible to the ratchet#184
depsbump the production-dependencies group across 1 directory with 39 updates#249
depsbump pnpm/action-setup from 4 to 6#235
depsbump actions/setup-go from 5 to 6#236
depsbump azure/setup-helm from 4 to 5#233
depsbump changesets/action from 1.4.6 to 1.8.0#234
testsdelete 15 SUPERSEDED it.skip blocks (dead test code) [skip deploy]#274
e2ebump e2e-nightly cron from weekly Sunday → actually nightly [skip deploy]#272
delete orphan apps/web/apps/web/ build artifact tree [skip deploy]#271
deploypaths-ignore CI-only test infra (RLS isolation + ratchet baselines)#266
deps-devbump vitest in the development-dependencies group#239
depsbump actions/download-artifact from 4 to 8#232
auth-requiredre-baseline ratchet 195 → 192 (-3)#222
deps-devbump the development-dependencies group across 1 directory with 19 updates#220
depsbump actions/setup-python from 5 to 6#23
depsbump softprops/action-gh-release from 2 to 3#21
depsbump actions/checkout from 4 to 6#20
zod-requiredre-baseline ratchet 311 → 135 (-176)#221
husky/pre-pushtee test output to a log file for flake diagnosis#175
auth-requireddocument admin-route auth intent — baseline 313→302#171
2026-05-11 marathoncross-tenant 76→0, test debt 251→0, +8 real route bugs#158

Tests

rlsPostgres-test-container RLS isolation framework + tests for the 4 new policies#190
api-handler, audit-loggerrestore critical-path coverage to baseline#253
mcpPlaywright e2e for registry + permissions UI (W5-6 #71 follow-up)#110
api-handlerrestore branch coverage after R2 idempotency block (89.5% → 92.3%)#161

2026-W19

May 4 – May 10, 2026

216 changes

Features

complianceOWASP Agentic AI Top 10 (2025) framework#140
securitySBOM workflow + security.txt + RFC 9116 ratchet4fd740e
providersCursor + Windsurf adapters (W3 #68)#96
actionline-level PR review comments + evalguard code-scan CLI (W1 #64)#92
terraformclose path-to-20+, ship the deferred 7 resources (PR I)#109
evil-mcpadversarial MCP target server (W5 #72)#99
cliwire YAML `transform:` end-to-end through eval:local#147
enginewire applyTransform into runEvaluation + runStreamingEvaluation#146
quickjs-runnerproduction sandbox for @evalguard/core inline JS transforms#144
engineinline JS transforms in YAML eval config (injectable runner)#143
secretsAWS SM + Azure Key Vault + HashiCorp Vault adapters (closes vault trio)#142
HuggingFace datasets trace importer + public pricing JSON dump endpoint#141
gatewayadaptive provider rate-limiter (reads x-ratelimit-* headers)#139
mcpper-tool RBAC schema + gateway auth integration (MCP Phase 2)#135
integrationstrace importers for Helicone / Langfuse / Portkey (W7 / Tier A #8 — properly)#134
mcpJWT-based authentication for MCP tool invocations (W7 / Tier A #15 — MCP Phase 1)#132
cliCursor MDC format support in \`evalguard setup\` (W7 follow-up)#131
actions\`evalguard-scan\` GitHub Action with line-level review comments + OIDC (W7 / Tier A #12)#130
cli\`evalguard setup\` — wire up AI coding agents (W7 / Tier A #4)#129
model-auditadd GGUF analyzer (W7 / Tier A #6 — closes \`evalguard scan-model\` parity)#128
skills@evalguard/skills package — Claude Code skills (W7 / Tier A #5)#125
cli\`evalguard pricing\` — DB inspection + cost estimator (W7 / Tier A #7 follow-up)#121
costwire pricing DB into CostTracker via addEntryFromModel (W7 / Tier A #7 follow-up)#120
coststructured model-pricing DB with input/output/cache splits (W7 / Tier A #7)#119
mcp-evalevil-mcp adversary fixture + detector recall floor (W7 / Tier A #13)#118
evalJUnit XML reporter for CI integration (W7 / Tier A #11)#117
eval-uiwire run-export download menu (W7 follow-up to PR #113)#123
evalHuman-Eval YAML output format (W7)#113
firewalldetection round 2 — toxic 0% → 100%, recall 36% → 44%7707e65
firewalldetection-quality benchmark + pattern library +20pp recall3a4e112
marketingpublic /engineering claims-with-receipts pageaeabad4
benchmarkspublic benchmarks scaffold + firewall vs competitors849c99a
ciexternal synthetic uptime probe (P2.4 — criterion #7 path)4dc1d1e
ratchetskip-count tracks silent vs documented separately8d1d57e
cimass-assignment defense ratchet (#12) — HARD ZEROd4c6eb0
cicross-tenant .eq predicate ratchet — closes ADR-0014 follow-up15a8cce
cino-dynamic-eval ratchet — catches RCE-class primitivesd00b59b
blogpublish "Six hours of engineering audit" to /blog4fd45c5
testscaffold Stryker mutation testing on critical paths (P2.2)7c27a47
statuspublic status page reads real uptime, not hardcoded green2e1727b
ciOpenAPI coverage ratchet — 27/311 documented (lower-only)e8bc995
cigitleaks hard gate — 117 → 0 findings, continue-on-error offf1117bc
huskypre-push runs type-check before tests + ban --no-verify70c4cb8
cicritical-path coverage ratchet (api-handler/crypto/audit)d48c893
ciskip-count ratchet — lock the 329-skip floor at 2026-05-045657b8b

Fixes

securitydocument 7 cross-tenant exemptions on admin maintenance routes; baseline 139→132#138
securityprovider-keys GET defense-in-depth field whitelist0b88125
exports/rlhfcross-tenant defense on annotationQueueId path (HIGH read-only RLHF training data leak fix, +3 tests)#157
evals/comparecross-tenant defense — require projectId + verify both runs (HIGH read-only data leak fix, +5 tests)#156
annotations/queuecross-tenant defense in POST assign + batch (real vuln, +6 regression tests)#155
annotations/queues/itemscross-tenant defense in PATCH endpoint (was vuln, +5 regression tests)#152
depsbump fast-uri, hono, fast-xml-builder, ip-address (close 18 dependabot alerts)#137
cirebaseline cross-tenant ratchet 137→139 (W7 marathon unblocker)#136
cimake Semgrep non-blocking on PRs (~48 pre-existing findings)#127
ciremove gitleaks + make SARIF uploads informational in security-scan.yml#126
cidrop codeql PR-gate guard now that Code Scanning is enabled#124
ciunblock Security workflow false positives + missing-feature error#122
comparetable layout broken on slug pages — fixed-layout columns + concise cells5256bd7
ciallowlist redis-cache RedisLike.eval() method signatured55d2ae
ciunblock deploy — as-any baseline, autopilot mock, coverage rebaseline51c7b41
cigrant actions:read to ratchets job for synth-check-freshness APIe1fdd60
monitoringbridge AlertEngine schema mismatch in /api/v1/monitoring/alerts097f767
ciskip the entire 'overall status aggregation' describe blockb75e503
firewallclose sourdough FP via benign-domain semantic short-circuitc7ad09f
ciunblock self-hosted runner — gitleaks no-sudo install + skip CI-flaky status tests0f62f03
cidrop synth-check cron from */15 to hourly (saves ~75% of synth burn)f60d40e
strykersandbox setup + vitest exclude for stryker tmpd332e2d
strykerswitch to commandRunner — mutation score 96.55% on crypto.ts6ef55b0
synth-checkprobe /.well-known/security.txt, fix gateway/health OpenAPI claim9cb8acc
ratchetskip-count distinguishes conditional vs unconditional skipsa9df642
java-sdkbump spring-web 6.1.15 → 6.1.21, spring-boot 3.3.6 → 3.3.13e3c1491
depsbump axios pnpm override 1.15.0 → 1.16.0 (patches 13 advisories)63c2289
security-pagecorrect two defensibility lies on /security9a5c0e8
ratchetexclude blog/marketing prose from TODO/FIXME scanfab7632
testpin Math.random for second showcase shield flake site9b02576
statusskip uptime DB read in test env — closes 4-run CI flakee231c90
ratchetexclude blog/marketing prose from 'as any' scan + reword post8484334
testmock gateway_proxy_logs chain so /api/status doesn't flake0770679
testfreeze time in assembleConfig determinism test44c8dc6
red-teamperformance.now() for sub-ms durationMs accuracy92a9a97
cibump gitleaks pin to 8.30.1 — match local dev version243cd71
scanneruse performance.now() for sub-ms duration accuracyd03a8a9
testde-flake ioredis-loader via pure-function extraction4488f51
testadd CI multiplier to perf budgets — runner variance97cc93e
testmake embeddings + SARIF tests deterministic under coverage52765dc
cibump Node heap to 6GB for apps/web Next.js prod build58588dc
testrepair ioredis-loader test isolation (vi.doMock leakage)af931aa
testbump load-test perf budgets under coverage instrumentation12ffda9
ciclear 4 post-eslint-upgrade ratchet/test/migration failures16cae4a
lintupgrade @typescript-eslint to v8 for ESLint 9 compatibility3e868a0
security+correctnessclose 3 documented gaps surfaced this sessiond9dbb85
wrappersreplace `.apply(null, args)` with spread to satisfy prefer-spread7c9889b
cliadd missing 'yaml' dependency to apps/clib1552f8
typesTS errors blocking CI Lint & Type Checkf2f2b9e
anthropic-wrapperTS2352 — cast Anthropic Message via unknown to Json7be6229
cigrant pull-requests:write in deploy.yml so workflow_call'd ci.yml can use it9af7850
ciscope pull-requests:write to migration-tests job (workflow_call fix)9b5f36e
ciescape single-quote in 'as any' ratchet step name (YAML parse error)e0d4a77
apiclose 4 route gaps surfaced by this session's testsba7c685

Performance

ciswitch Build & Push from GHA-only cache to GHA + GHCR registry cache98776ef
apiCache-Control on registry GET routes for Cloudflare CDNfc3160d

Refactor

reactdisable 11 exhaustive-deps warnings with reason (288 → 278)6fed8cc
testsreplace 32 \`Function\` types with explicit signatures (320 → 288)df8336f
testsrename 159 unused body/bodyStr to _body/_bodyStr (479 → 320 warnings)d9f8e30
testsdrop 3 unused test helpers (lint warnings 483 → 479)595fef1

Docs

soc2starter pack — vendor comparison + control map + gap list8de7919
correct /compare/portkey false weaknesses + add /trust/model-coverage commitmentc8ac57b
compare/compare/portkey + /buyers-guide/ai-gateway with PANW-acquisition counterc5af9f0
comparefix stale counts + add Helicone/LangSmith/Patronus pages3037efe
verifybump CI ratchet count 20 → 21 (migration down-coverage)272ef64
benchmark + scoreboard sync — 100/100/100/100 after sourdough fix2d78f0d
engineering scoreboard sync after Phase 2 mutation lift856538a
3 conference talk drafts ready for submissiond06886e
scoreboard + /verify sync after Phase 1 mutation-testing expansion6bbd47d
ADR-0036 chaos coverage ratchet + scoreboard sync (20 ratchets, 36 ADRs)2aba170
ADR-0035 + investor brief — detection-benchmarking discipline + 1-pager88a5bf1
consolidated threat model — 17 threats with mitigations + receipts28edb6f
runbookself-hosted GitHub Actions runner on Hetznere60d431
mutationaudit-logger.ts 79.31% → 89.66% — above high thresholdbad6641
roadmapflip criterion #11 (OpenAPI completeness) to ✅ EARNED97579b3
openapiround 16 — FULL COVERAGE (293 → 310, missing 18 → 0)878476e
openapiround 15 (+20, 273 → 293, missing 38 → 18)66abeae
openapiround 14 (+20, 253 → 273, missing 58 → 38)ae87b23
openapiround 13 (+20, 233 → 253, missing 78 → 58)49d7854
openapiround 12 (+20, 213 → 233, missing 98 → 78)2beebfb
openapiround 11 (+20, 193 → 213, missing 118 → 98)3d2d88d
openapiround 10 (+20, 173 → 193, missing 138 → 118)f10ad88
openapiadd 21 routes (152 → 173, missing 159 → 138)0df9b76
openapiadd 19 routes (133 → 152, missing 178 → 159)d543563
openapiadd 20 more routes (113 → 133, missing 198 → 178)592d316
mutationrecord api-handler.ts score 44.29% (criterion #5 NOT earned)6121edf
mutationrecord mutation-score baseline (crypto 96.55% / audit 79.31%)ca8cf86
openapiadd 17 more routes (96 → 113, missing 215 → 198)14828cd
openapiadd 15 more tier-1 routes (81 → 96, missing 230 → 215)d12a053
adrADR-0034 supersedes 0033 — Stryker commandRunner works70a6546
openapiadd 15 more tier-1 routes (66 → 81, missing 245 → 230)c8137f6
roadmapsynth-check scaffold + 1st green run; criterion #7 earnable in 24h19c9247
adrADR-0033 — Stryker mutation testing parked, criterion #5 partial2a70b5a
roadmapflip criterion #3 (< 100 silent skips) to ✅ EARNED4e0d49a
testsdocument 146 silent skips with reason comments (silent 154 → 13)5ca0cb5
roadmapsync skip metric — silent (154) is the meaningful one7d08057
roadmapsync skip-count after a9df642d measurement fixbae010f
openapiadd 15 more tier-1 routes (51 → 66, missing 259 → 245)a3b61b3
adrADR-0032 — CVE-response discipline (32nd ADR)c6f08ac
openapiadd 14 more tier-1 routes (37 → 51, missing 273 → 259)e4f46c7
openapiadd 10 tier-1 customer-facing route entries (27 → 37)a6d4e6d
roadmapcorrect tracking error — 3+ OSS packages already earned5efb6ea
adrADR-0031 — earn the bar, then enforce it (31st ADR)d6969bc
roadmapflip criterion #4 to earned (--strict critical-path)7aa1cd9
roadmapsync TL;DR after post 12/12 landsef687e9
blogpost 12/12 — "Sustained cadence vs sprint cadence"1a6dddc
blogpost 11/12 — "How to write your first ADR (template + receipts)"6f497df
blogpost 10/12 — "14 engineering claims customers actually verify"75b6199
blogpost 9/12 — "An engineering audit's first day, by the numbers"4398ef1
blogpost 8/12 — "The deliberate-break test for new CI gates"4940357
blogpost 7/12 — "14 CI ratchets that stop drift"d4d614b
blogpost 6/12 — "Choosing Hetzner over Vercel: the egress-pricing math"db9d740
blogpost 5/12 — "Defense in depth for multi-tenant"19e2956
blogpost 4/12 — "Mutation testing: when 100% coverage is theatre"a48363b
adr30/30 — P3.1 COMPLETEd6973a8
adrland 4 more — 24/30 → 28/30 + roadmap syncec57fb9
adrland 3 more — 21/30 → 24/30 of P3.1 targetf51c020
blogpost 3/12 — "From silent no-op to hard gate" (gitleaks)5769f7d
blogpost 2/12 — "Type-check is necessary, not sufficient"641c170
blog"Six hours of engineering audit, in commits" — first postd083d39
adrland 5 more — 16/30 → 21/30 of P3.1 targetb645438
roadmapTL;DR header + sync P2.4/P2.7/P3.1 status95a58d4
adrland 5 more — 11/30 → 16/30 of P3.1 target9eafb3a
roadmaprefresh scoreboard — 10/27 done, 5 deploys this session9c459ad
adrADR-0011 — gitleaks hard gate with allowlist (11/30)9d2e397
adrland 5 more — 6/30 → 10/30 of P3.1 target53820e6
lock the defensibility roadmap as a durable repo artifactcfa632e
seed ADR repository with first 5 decisions1afb1d7

Build

huskyadd pre-push gate that runs scoped vitest120c6be

add ratchet 21 (migration down-coverage) + full-chain replay (#95/#103)f2e5e32
add ratchet 20 — chaos-coverage floor enforcement5c1d90b
add ratchet 19 — critical-path mutation-score floor enforcement91b86a9
synth-check freshness ratchet (18th active CI gate)337f03d
move heavy workflows to self-hosted Hetzner runner8f45526
add firewall-latency regression ratchet (17th active CI gate)e419136
promote critical-path --strict to PR-blocking gate (16th ratchet)46994c3

Chore

securitycross-tenant eq ratchet 99 → 79 — batch 4 (20 chains across 8 routes + 9 routes flagged for product fix)#154
securitycross-tenant eq ratchet 125 → 103 (22 exemptions across autopilot + datasets + evals/[runId]/*)#153
securitycross-tenant eq ratchet 125 → 121 (4 createApiHandler-mediated exemptions)#151
securitycross-tenant eq ratchet 132 → 125 (7 documented exemptions)#150
lintapps/web ESLint 228→0 — real fixes, not _-prefix codemodf5ae5a2
lintunused-vars batch 6 — 12 more API routes (compliance/email/exports/eval-schedules)1e6defb
lintunused-vars batch 5 — 12 more API routes (mostly unused 'user' destructure)b82073d
lintunused-vars batch 4 — 12 API-route + cron + test filesb361cb4
lintunused-vars batch 3 — 8 more dashboard pages cleaned299a399
lintunused-vars batch 2 — 10 dashboard-page warnings cleaned39dc362
lintunused-vars batch 1 — 9 test-file warnings cleaned09e968c
coreexclude Regex mutator from detection-engine Stryker config871c6f6
coreadd Stryker config for 5 critical-path files9dc2bc9
update api-handler.ts mutation baseline (44.29% → 44.89%)acb7045
lintrename 316 unused destructured vars to _-prefix2eea002
lintturn off three style-only rules (53 warnings cleared)d4fdb4e
lintautofix 270 unused-imports + swap gitleaks to OSS binary71e3b38

Tests

core/firewallun-skip 6 firewall tests that are no longer broken#149
worker/chaosstalled-job recovery after worker dies mid-processing#148
coreexpand statistics tests 90 → 124 (snapshot pins for tail helpers)55e6053
coreexpand statistics coverage from 69 → 90 tests (Phase A.c)44dd795
coreexpand guardrail-dsl coverage from 30 → 61 tests (Phase A.a)8aca193
firewallupdate test #81 to assert leetspeak IS detectedd6b2933
coredirect unit tests for the 3 mutation-test gaps750c23d
api-handler+17 mutation-killing assertions targeting known survivorsdffcdfb
audit-loggeradd 3 assertions to kill Stryker survivorsa14786f
cipersist deliberate-break test for --strict critical-path gate01a1324
api-handleradd 9 branch-coverage permutations — clears --strict 90%a6b0c48
api-handlerbranch coverage 73.4% → 79.7% via 8 permutations5e2395d
api-handlercache-miss path coverage — lines 94.6% → 96.8%f374e51
apibump compliance test timeouts (full v1 suite now 312/312 green)7b89427
apibatches 194+195 — demo-eval + demo-scan tests (19 tests)9bcbba1
apibatch 193 — gateway/proxy/[...path] tests (23 tests)7661f77
apibatch 192 — pipelines/run tests (17 tests)b23104d
apibatch 191 — widgets/from-nl tests (34 tests)1716ccd

2026-W18

Apr 27 – May 3, 2026

440 changes

Features

securityG3 — vulnerability to reproducible CI test (Giskard pattern)1151ff3
compliancescoreboard view across all 33 frameworks (TrojAI parity)ef10329
remediationsfan CreateRemediationButton out to security + eval surfacesd099ff9
eq-sprintclose Week 4 marker hygiene + wire g_eval LLM-judgef2d5366
eq-sprintWeek 4 lint + 5 dependabot/load + .catch fixes2ed3367
playgroundjailbreak challenge platform primitives (Lakera Gandalf)59c99b3
eventswire CreateRemediationButton into events inbox detail32e9349
remediationscross-team tracking workflow + SLA breach view0822c21
insightsInsights Agent — auto-clustering + LLM exec summarydf3c338
sdksVercel AI + LlamaIndex.TS auto-instrumentation wrappersd7f8b88
tracesLangSmith-style message threading view in trace viewer0f85b4a
g3wire PromoteToRegressionTestButton into security + simulator pages66835a4
tracesOpenInference / OTLP-JSON trace export02dff9c
cisticky PR comments for eval-quality + migration-tests gates9214e50
integrationsreal PagerDuty Events API v2 + saved-search trigger0872c7e
migrationstest coverage for G1 / M2 / G2 / trace_embedding_2df7477a5
simulatorG2 closed-loop adaptive attacker (Giskard pattern)123d494
simulatorpersona simulator with replay-from-step-N (M2 from compare audit)f95c3ce
test-gencorpus-grounded test generation (G1 from compare audit)710bd00
embeddingsUMAP 2D projection with PCA fallback (gap B from compare)ee8e142
migrationshard pairing gate + ephemeral-postgres roundtrip in CI5fe0245
dashboardshow product names in provider settings (Kimi, GLM, etc.)a13ee93
providersalias kimi/claude/grok/glm/qwen/command/granite/nemotron/ocia29fd99
ship 13 attack plugins + 4 providers, lock counts to 166/249/87/333f68ed1
v2 UI for embedding cluster + online evals809cfd6
7-phase pending-items sweep (TS strict + shutdown + .single + cache + providers + online evals + embeddings)fd6c6a0
Tier B (Helm CI + eval gate + Azure VPC) + A1 migration safety frameworkbcedacb
VPC deployment guide + saved-search alert worker791a9e1
trustpublish firewall latency benchmark with reproducible methodology4e41457
uiJ/K row nav (Linear-style) + empty-state CTAs9bf6c0e
uiEsc-to-close + ARIA on remaining 10 dashboard modalse858bd2
uiEsc-to-close + backdrop-click + body-scroll-lock for 7 modalsa17c88b
uireplace spinner-text loaders with content-shaped skeletons across 13 pages07adb71
uiTimeSeriesChart wrapper + chart on agent-runs + threat-intelligencef29b5da
eval/api/v1/eval/code HTTP route for the 7 code scorers5f43166
scorerscode-mypy + code-pyright + code-e2b-runs (last LangSmith OpenEvals gap)061503e
firewallwire DLP into engine + forceBlockCategories optioncdec0ce
compliance-alertsemail digest cron + template + cron schedule03c3cfd
evalvoice agent evaluation API surface97df0ff
dlpexpand pattern dictionaries 110 → 201 (+ international PII, AI provider keys)33a84dd
firewallpublishable latency benchmark endpoint38669db
privacyvendor risk scoring + SOC 2 expiry alerts + NVD CVE feed8b7e272
debug-agentapply + verify routes + sessions list UI375f314
datasetsrender the New Dataset modal3b663c0
canonical package names + 6 deprecation shims + P0 fixes#71

Fixes

workerredact OpenAI v2 sk-proj-/sk-live-/sk-test- keys + add tests2656050
otel + auditclear last 2 workspace build failures (3 distinct issues)ae0b00e
workerbump Sentry-init test timeout to 30s — kills the last turbo workspace flakeb9abedc
sdk+cli+worker+vscodeclear last workspace test failures (4 distinct issues)aeabd5e
coreresolve 53 failing core tests — shadow-AI TDZ trap + counts ratchet + scorer timeouts75ced4f
middlewarejailbreak playground routes are anon-public3286f48
migrationsjailbreak_attempts partial-index now() not IMMUTABLEc925e44
healthheap-pressure check is V8-cold-start aware6b501ab
ciclose 3 silent quality gaps + add 4 audit ratchets18a744e
workerUMAP nNDescent infinite-loop from constant random fnbd84e31
queuenoeviction policy + BullMQ-correct ioredis flags everywherece9ba74
workerpersona-simulation tolerates missing G2 columns via SELECT *4a97d7d
persona-simulatorseed personas use NULL org/project, not zero-UUIDc03f947
composewire LLM API keys to worker containerff35a57
provider-keysuse live registry + accept aliases (kimi/claude/grok/...)ee406cd
corekeep counts.ts as plain constants — registry import broke web build3f18225
marketingwire FEATURE_COUNTS to live registries + last 138 fixesf8f953c
workertrace-embedding-fill .catch on RPC builder is a TypeError8adcd6b
middlewareadd /canonical-counts.json to public exact-match listd2a027c
marketingexternal audit pass — license + counts + latency + UXb7209f4
deploybake NEXT_PUBLIC_ADMIN_EMAILS into client bundle + Tier 1-3 surfacesa7427d2
uiobservability surfaces fetch errors instead of silently empty8e6cabe
uisurface API failures with retry button across 9 silent-fetch pages4ffb0b6
uireplace 8 browser alert()/confirm() with sonner / useConfirm022f865
releasealign Version Packages with @evalguard/sdk rename [skip deploy]f2ea749
dlp4 pattern bugs found by per-pattern + FP audits7a9e43f
auto-guardrailsexpose effectiveCoveragePercent (excludes literal-fallback)9187961
dlpcatch parenthesized US phone format like '+1 (555) 123-4567'3633b77
auto-guardrailsvalidate finding.input + try/catch generatord3ed498
migrationdrop 'editor' from RLS policy — not in org_role enum0046ccc
cron route 401 bounce + Postgres IMMUTABLE index error015b956
privacystrip orgId/risk_override from vendor INSERT rowd233efa
consentwiden consent gate to security scan + firewall check1bb2a38
gatewayreplace limit(100) ceilings with time-windowed query7c1274b
byokinclude orgId in provider-keys POST body970beb0
byokroute settings UI saves through Vaulta9eea29
api2 more bugs from Phase 2 deep flowse3c97cf
api3 more bugs caught by Phase 1 RLS-pattern probe34a0477
nl-pipelineuse admin client for org_members lookup (API-key auth)741deec
api2 more bugs caught by exhaustive feature E2E883da9b
cibuild worker's workspace deps before running its testsfe396b9
prod3 production bugs caught by live feature E2E986162f
health/api/health now reports DB ok when only Supabase env is setfa74766
cli@evalguard/cli@2.2.2 — `init` → `eval:local` flow now actually runs tests9e9fb8a
worker-testsunblock prod deploy — Supabase mocks + Sentry mocks + audit env3239fab
civersion.yml YAML parse error — quote if-expression2709c77
e2esignup spec — use admin createUser, not /signup, on .test domain1993139
hydrationsuppress nonce mismatch on theme-init scriptb0fbd36
cspnonce match — middleware forwards x-nonce on request, layout reads same value25bacc8
cireplace bash skip-deploy guard with native Actions if-expressiona4fe442
hydrationroot-cause two Math.random() in render = SSR/CSR mismatch2478cc9
crawler-batch-4close last 6 from re-run — 4 real + 2 noise47bd4ee
crawler-batch-3close last 3 DB_ERROR routes — RLS + soft-fail + new tablesa656688
crawler-batch-25 page-side missing-param bugs from crawler report8b0dfb7
crawler-batch-15 missing API stubs + datasets render guard + nightly crawler in CI7b9d98b
links/docs/nemoclaw never existed — point at /docs/sdk instead9edaecd
tracesempty-state CTA links to OTel docs, not back to itselfa427199
tracesnormalize API response shape at the fetch boundary5d748e2

Security

pull leaked keys + close 2 anon-readable RLS holes2cbc157

Refactor

loggingconvert all production console.log to structured loggere4f3cc0
vendorclose last 3 'as any' — schema mismatch, not just types62f2334
typesretire 17 more 'as any' casts + 1 dead-code removal7c60762
typespermission/audit signatures (kill 'as any' in api-handler)7a403e1
typesretire 19 'as any' casts across 9 routes (Week 4)7665fd4
api-handlertyped WeakMap for API-key context (kill 9 'as any')e70a834
insights/agent rename + cross-references for de-dup audit4bdae27

Docs

memoryrecord workspace-wide green state + hidden bugs surfaced7090579
handoff doc for 2026-04-27 + probe-hydration helper30b6099

coverageswap narrow per-PR coverage on packages/core for full-suite0d4570d
ratchet apps/web 'as any' baseline at 0 (hard CI gate)f740b8e
pin trivy-action to v0.36.0 SHA (deploy.yml had compromised v0.35.0)024e889
punctuation tweak to trigger deploy (021fb859f empty commit hit paths-ignore filter)fee3f56
retrigger deploy21fb859
deployadd [skip deploy] guard — saves ~$0.18 per WIP push67da667
add internal-link audit as advisory stepfaf7e3d

Chore

depsscope brace-expansion CVE override to vulnerable rangesde6695a
hooksadd husky pre-commit gate (secret-scanner + lint-staged)ccffae8
lintwire real ESLint enforcement across the workspacebb99349
depspnpm dedupe — eliminate 7 duplicate package versionsea71867
deps + cikill 12 npm vulns + tighten worker CI gate + lose '|| true' on force-dynamic515e6d3
drop '|| true' from lint scripts + remove hardcoded prod admin key + relocate stray e2e scripts1f774e3
tsdrop '|| true' from type-check across 12 packages — strict everywhere29bf592
eq-sprint5-week plan + Week 1 chaos scaffoldingd6eb31f
supabasebulk-convert remaining .single() callsites + ratchetsd2cf846
changesetsdrop 3 stale changesets that already shippeda75b420

Tests

securityadd unit tests for 5 zero-coverage security-critical modulesf09ada0
apibatch 190 — custom-dashboards/[id]/widgets/[widgetId]/data tests (16 tests)06ee8ba
apibatch 189 — traces/stream SSE tests (5 tests)0f82ee6
apibatch 188 — traces/[traceId]/attachments tests (16 tests)43f4ad0
apibatch 187 — scorers/local-model tests (20 tests)82c43e5
apibatch 186 — traces GET+POST tests (14 tests)accf8fe
apibatch 185 — traces/search NL query tests (14 tests)7dbdec3
apibatch 184 — simulator/run/[runId]/replay tests (18 tests)8072b9b
apibatch 183 — simulation tests (14 tests)da439aa
apibatch 182 — siem/inbound/[source] tests (17 tests)ce49965
apibatch 181 — shadow-ai/ingest tests (16 tests)fc0df45
apibatch 180 — model-scan/[scanId]/promote tests (14 tests)44109d0
apibatch 179 — security/model-scan tests (22 tests)1b0101a
apibatch 178 — security/ai-bom tests (15 tests)c94c427
apibatch 177 — security/fix-suggest tests (12 tests)b2146db
apibatch 176 — privacy/vendors/[id]/cve tests (17 tests)9e96e50
apibatch 175 — privacy/dsr/[id]/search tests (10 tests)66640ff
apibatch 174 — prompts/ab-tests tests (16 tests)1837ae9
apibatch 173 — privacy/assessments/[id]/mitigations tests (16 tests)de8a55b
apibatch 172 — scim tests (18 tests)2ea8683
apibatch 171 — privacy/assessments/[id]/export tests (12 tests)c6452fb
apibatch 170 — prompts/experiments tests (19 tests)371ffc5
apibatch 169 — prompts/optimize tests (17 tests)4963a16
apibatch 168 — prompts/registry tests (22 tests)6ce08ab
apibatch 167 — gateway/shadow tests (11 + 2 doc-skips)6b4877c
apibatch 166 — playground/replay tests (14 tests)7ee37d8
apibatch 165 — playground/jailbreak/attempt tests (16 tests)4413c25
apibatch 164 — pipelines/saved tests (15 tests)44bd4a8
apibatch 163 — gateway GET+POST+PUT tests (15 tests)614d14c
apibatches 161+162 — ingest/otlp/logs + metrics tests (20 tests)5720687
apibatch 160 — ingest/otlp/traces tests (11 tests) — milestone01b2674
apibatch 159 — playground/chat tests (23 tests)6a57ac5
apibatch 158 — annotations/queues/items tests (16 tests)3c8f8c3
apibatch 157 — monitoring/stream SSE tests (4 tests)e20fcb0
apibatch 156 — debug-agent tests (16 tests)03e2da8
apibatch 155 — pipelines list+forward tests (11 tests)ed54d27
apibatch 154 — exports/rlhf tests (20 tests)5c420af
apibatch 153 — exports/fine-tune tests (19 tests)f100525
apibatch 152 — integrations/test tests (20 tests)c82ba44
apibatch 151 — gateway/stats tests (19 tests)28e2250
apibatch 150 — monitoring/analytics tests (20 tests) — milestonec382e08
apibatch 149 — annotations/bootstrap tests (10 tests)1619fb0
apibatch 148 — annotations/queues tests (20 tests)32129db
apibatch 147 — agent-trajectory/cost-attributions tests (17 tests)df19cf7
apibatch 146 — ai-spm GET tests (9 tests, POST skipped + flagged)2b1e502
apibatch 145 — formal-verification tests (23 + 1 doc gap)638f9ee
apibatch 144 — models/registry tests (21 tests)31b9fc8
apibatch 143 — embeddings/cluster tests (18 tests)8f52231
apibatch 142 — compliance/eu-ai-act tests (19 tests)78d57a7
apibatch 141 — agents/governance tests (15 tests)b42f59e
apibatch 140 — metrics OTLP ingest tests (14 tests)83a659c
apibatch 139 — changes timeline tests (18 tests)dc01add
apibatch 138 — bulk operations tests (16 tests)5a2208c
apibatch 137 — events list+create tests (19 tests)012e3c3
apibatch 136 — simulator/run/[runId] tests (11 tests)c03003f
apibatch 135 — gpu-monitoring tests (11 tests)a67b611
apibatch 134 — privacy/dsr/[id]/export tests (7 tests)41b513d
apibatch 133 — integrations/github tests (9 tests)2ee4f52
apibatch 132 — email/send tests (9 tests)2f439d0
apibatch 131 — regression-tests (list) tests (12 tests)7d6300a
apibatch 130 — guardrails/library tests (8 tests)ff74b4f
apibatch 129 — agent-trajectory/optimize (7 tests) — 🎯 80% MILESTONE53cc0de
apibatch 128 — monitoring/sla tests (11 tests)8467422
apibatch 127 — agent-trajectory/cost tests (6 tests)7400070
apibatch 126 — notifications tests (13 tests)d9b5d76
apibatch 125 — orgs tests (8 tests)b870959
apibatch 124 — workflows tests (13 tests)983ebe5
apibatch 123 — guardrails tests (8 tests)bd72d37
apibatch 122 — firewall/import-policy tests (8 tests)c96bb50
apibatch 121 — traces/to-dataset tests (11 tests)6008ed0
apibatch 120 — traces/curate tests (15 tests)25b2f8f
apibatch 119 — compliance/report tests (11 tests)8c19600
apibatch 118 — insights/agent/generate tests (12 tests)2996378
apibatch 117 — admin/rotate-keys tests (9 tests)8efaa11
apibatch 116 — firewall/benchmark tests (12 tests)1fd4026
apibatch 115 — regression-tests/promote tests (15 tests)449c78f
apibatch 114 — security/model-scan/[scanId]/attestation tests (8 tests)915093e
apibatch 113 — data-discovery/sources/[id]/scan tests (7 tests)02cbe20
apibatch 112 — playbook test + canary promote (10 tests) — 🎯 75% MILESTONE6484ad4
apibatch 111 — privacy/vendors/alerts tests (8 tests)5770cbf
apibatch 110 — data-discovery/findings tests (13 tests)55c5edc
apibatch 109 — privacy/dsr/[id] tests (11 tests)83c439e
apibatch 108 — evals/runs + security/campaigns/[id]/findings (11 tests)faa11b9
apibatch 107 — playground/jailbreak/levels tests (6 tests)0db1913
apibatch 106 — data-discovery/scans + debug-agent/sessions tests (14 tests)5bbd909
apibatch 105 — security/attack-paths tests (6 tests)175b48c
apibatch 104 — smart-routing/test-cases tests (7 tests)53a1945
apibatch 103 — ai-sbom/generate tests (9 tests)3814715
apibatch 102 — admin/migrate tests (7 tests)f85f402
apibatch 101 — catalog/deprecate tests (9 tests)1fd6d3b
apibatch 100 — marketplace + compliance/changes (17 tests) — 🎯 100 BATCHES114aabf
apibatch 99 — impact-assessment tests (8 tests)ad30f0c
apibatch 98 — confidence-scoring tests (15 tests)5a59344
apibatch 97 — siem + data-residency tests (15 tests) — 🎯 70% MILESTONE9b9a522
apibatch 96 — projects + compliance (top-level) tests (16 tests)6452cf2
apibatch 95 — test-gen/from-corpus tests (21 tests)4654d19
apibatch 94 — generators/rag-auto-eval tests (13 tests)20407a6
apibatch 93 — siem/inbound/tokens tests (22 tests)85dce7a
apibatch 92 — datasets/[datasetId] tests (21 tests)aaae8ff
apibatch 91 — events/[id] (inbox triage) tests (18 tests)c54d677
apibatch 90 — traces/[traceId] tests (12 tests)4712a48
apibatch 89 — evals/[runId]/results tests (21 tests)b689e28
apibatch 88 — simulator/run tests (23 tests) — 🎯 1000+ tests addedcff3504
apibatch 87 — security/adaptive tests (16 tests)0d493e8
apibatch 86 — generate-smart tests (19 tests)17a7bdf
apibatch 85 — autopilot/run tests (17 tests)b4aaafb
apibatch 84 — compliance/policy-to-code tests (13 tests)1171d7a
apibatch 83 — security/assessment tests (15 tests) — 🎯 65% MILESTONE7ed623a
apibatch 82 — monitoring/anomalies tests (15 tests)0ba54b6
apibatch 81 — evals/pairwise tests (20 tests)c019397
apibatch 80 — security/auto-attack tests (19 tests)5d0cefd
apibatch 79 — compliance/export tests (16 tests)9448df7
apibatch 78 — security/[scanId] tests (17 tests)0eb028f
apibatch 77 — compliance/evidence tests (20 tests)b411b80
apibatch 76 — sso (SAML/OIDC config) tests (31 tests)b3b3f26
apibatch 75 — security (top-level scan API) tests (19 tests)6922b89
apibatch 74 — evals/[runId] tests (22 tests)f6549e9
apibatch 73 — compliance/check tests (18 tests)2784610
apibatch 72 — agents/monitor tests (16 tests)edad28b
apirefine CSV-injection comment in datasets/upload tests618135d
apibatch 71 — datasets/upload tests (23 tests) + flagged route buge8b05c0
apibatch 70 — firewall/rules tests (23 tests)17b2c5a
apibatch 69 — evals tests (15 tests)99c8bfe
apibatch 68 — agents tests (25 tests)ae78b1e
apibatch 67 — settings tests (24 tests)62ecc73
apibatch 66 — provider-keys (BYOK vault) tests (25 tests)6981644
apibatch 65 — feedback/token tests (22 tests) — 🎯 60% MILESTONE2b32f39
apibatch 64 — gateway/health tests (17 tests)95d788a
apibatch 63 — billing/metered tests (16 tests)03282fa
apibatch 62 — exports tests (16 tests)4d90588
apibatch 61 — showcase tests (27 tests)af0ee7d
apibatch 60 — playbooks tests (18 tests)9c4e70f
apibatch 59 — monitoring tests (21 tests)742faeb
apibatch 58 — experiments tests (23 tests)1bb7e94
apibatch 57 — sessions tests (19 tests)3468542
apibatch 56 — api-keys (org-level) tests (18 tests)a7f8a2b
apibatch 55 — catalog tests (25 tests)7040dff
apibatch 54 — soc2-readiness + cost/budget tests (36 tests)c933d3f
apibatch 53 — incidents tests (24 tests)cfa2f28
apibatch 52 — api-key budget + feature-flags tests (44 tests)cd0ca57
apibatch 51 — cost/alerts + eval-schedules tests (42 tests)90d295f
apibatch 50 — insights + account/delete tests (39 tests)f5fe8db
apibatch 49 — leaderboard, privacy/vendors, support tests291aa69
apipin v1 cost/savings, compliance/scores, annotations/pairwise, prompts/deployments (46 tests)33e430f
apipin v1 simulator/personas, catalog/discover, security/effectiveness (30 tests)3ee9fc7
apipin v1 prompts, saved-searches/[id], remediations, shadow-ai/policy (61 tests)c219683
apipin v1 threat-intelligence, ask, billing, webhooks (59 tests, 1 skipped)d590cd7
apipin v1 prompts/collaboration, dsr/[id]/items/[itemId]/action, eval/voice/scorers (41 tests)8d57e45
apipin v1 custom-dashboards/[id], status/uptime, bootstrap, embeddings/project (52 tests)3726af4
apipin v1 cost-analytics, admin/backup/verify, privacy/dsr (42 tests)4188d2f
apipin v1 firewall/check, eval/code, eval/voice, test-gen/[corpusId] (45 tests)4a4edf7
apipin v1 firewall, team, privacy/consent, agent-runs (51 tests)3b55801
apipin v1 generate-eval-suite, traces/export, mcp-eval, annotations/export/rlhf (51 tests)b701d48
apipin v1 model-scan/upload, workflows/[id], prompts/analytics, gateway/policies (65 tests, 1 skipped)891f0ba
apipin v1 ai-sbom, white-label, gateway/canary, remediations/[id] (72 tests)9863ec6
apipin v1 admin/settings, online-evals, monitoring/alerts, evals/compare (39 tests, 11 skipped)b6fbc64
apipin v1 cost/anomalies, saved-searches, shares, embeddings (50 tests)3ec33f8
apipin v1 security/report, annotations/queue, settings/notifications, agent-runs/start (48 tests)135740d
apipin v1 vendors/[id]/recompute, attachments/[attachmentId], debug-agent/[sessionId]/verify, widgets/[widgetId] (47 tests)65f84db
apipin v1 workflows/[id]/run, webhooks/github, traces/analyze, debug-agent/[sessionId]/apply (40 tests)fa0c255
apipin v1 campaigns/[id], agent-runs/[runId]/end, resume, mcp-test (62 tests)b7b47e6
apipin v1 monitoring/drift, cost/recommendations, mcp/traffic, vendor (62 tests)6edca5c
apipin v1 uba/outliers, data-discovery/sources, integrations, copilot/analyze (61 tests)c63e580
apipin v1 guardrails/generate, smart-routing, cost/forecast, dashboard/stats (43 tests)6689607
apipin v1 cost, traces/cleanup, search, support/admin (63 tests)62797b6
apipin v1 custom-dashboards (list+widgets), service-map, chargeback (54 tests)48ef5fc
apipin v1 compliance/gaps, compliance/model-cards, shadow-ai, mcp/security (43 tests)f3c2424
apipin v1 regulatory-reports, agent-trajectory, privacy/activities, privacy/assessments (56 tests)5ec07a1
apipin v1 annotations, annotations/chart, security/campaigns, security/graders (64 tests)7e41212
apipin v1 datasets, autopilot, auto-eval, cost-forecasting (57 tests)b22300f
apipin v1 templates, playbooks/dlq, project/current, security/auto-guardrails (50 tests)0437174
apipin admin/reset-project, billing/invoices, users, admin/threat-feed-sync (50 tests)3506fff
apipin v1 insights/reports, model-audit, webhooks/deliveries, billing/portal (33 tests)5eba459
apipin v1 benchmarks, auto-reeval, rag-diagnostics, eval-assistant (44 tests)a214f37
apipin v1 admin/cleanup, fix-stale, security/code-scan, multimodal (39 tests)e41790b
apipin v1 firewall/on-device, billing/activate, semantic-cache, data-cards (36 tests)2808827
apipin v1 dlp/scan, hallucination-analysis, threat-intel/library, jailbreak leaderboard (29 tests)026a32e
apibulk pin 8 small v1 routes (24 tests, 4 stubs + 4 functional)8f28131
apipin v1 onboarding, notifications/read, playbooks/[id], shadow-ai/catalog (32 tests)b4d4607
apistart v1/* coverage — catch-all, scorers, audit-logs, billing/usage (27 tests)392ba6a
apipin admin/system + admin/chat — admin/* fully covered (33 tests)8d00353
apipin admin/errors, admin/live, admin/security, admin/analytics (39 tests)b440caa
apipin admin/lifetime, admin/subscriptions, compliance-alerts-digest (34 tests)7f9ed40
apipin cleanup-webhooks, refresh-security-stats, admin/api-usage, docs (34 tests)b1e34d5
apipin graphql DoS defenses + weekly-report + vendor-risk-alerts (40 tests)7363d56
apipin auth/sso, admin/backup (SSRF defense), chat (62 tests)1b37e1c
apipin admin/users CRUD, playbook-dlq-retry, cleanup-rate-limits (33 tests)8c08061
apipin account/export, cron/cleanup, cron/usage-alerts (24 tests)4f0c4ae
apipin auth/callback, account/unsubscribe, telegram/webhook (37 tests)f20a672
apipin /api/analytics/track, /api/status, /api/admin/stats (36 tests)cea07b5
api/cronpin 3 cron route handlers (22 tests)af9fada
apipin /api/health, /api/ready, /api/auth/sso/check (33 tests)ed1b459
hookspin remaining 9 React hooks (95 tests, 0 untested hooks left)4a9c042
hooksinstall RTL + jsdom, write tests for 3 React hooks (42 tests)4c27529
workerpin remaining 5 job orchestrators (99 tests)1317f5f
workerpin 3 more job orchestrators (54 tests)0e6d73c
api-handlerpin createApiHandler factory critical-path branches2091a2e
sdkpin expectScore vitest helper bound semantics989b43e
emailpin recipient validation (header-injection + length + format)f212c3d
dbadd vitest + pin createClient/createServerClient + cache invalidationef8609a
sdkpin traceable + traced + AsyncLocalStorage parent-child propagationdfdbb4d
sdkpin ExtensionRegistry + runCustomScan client-side runner3f69dd1
corepin counts-invariants + index public-API surfaceec6a1ec
stabilize 2 timing-flaky tests (api-keys present-moment + ioredis cross-file)98a85f2
corepin canonical-counts vs FEATURE_COUNTS drift gate1b210be
corepin createProject/Eval/SecurityScan + pagination zod schemase0b2171
corepin EvalCache file-based cache + key derivation + TTL6c8bceb
wrapperspin GuardrailClient fail-OPEN HTTP layer for both wrappers244b3a5
wrapperspin Anthropic + OpenAI cost-estimator pricing tablesdb1bdb1
analyticspin dual-write tracker + heartbeat lifecycle8f80abb
supabase-clientpin browser-client adapter, PKCE custom-domain branchc47b8f2
pin GraphQL resolvers root + supabase server adapter4fa916f
pin i18n locale schema, GraphQL SDL, and authorizeProject anti-spoof4d6e3f1
pin apiSuccess/apiError, getAuthUser DEV bypass safety, gateway resolverbf35137
pin admin email allowlist + CSV/JSON/PDF exportaa8a06d
route-clientpin admin-vs-session client selector59d2952
pin circuit breaker, gateway fallback, vault credentials, eval graphqlbbdea89
ioredis-loaderdrop flaky 'both falsy' test that leaked across filesa69dac1
pin admin gate, route-context, api-key WeakMap, ioredis loader, project ctxd43fceb
pin 5 small load-bearing modules — rate limit, webhook fanout, zod schemas, API versioning, structured logger09ff295
pin 4 small but security/billing-load-bearing modules218e21d
pin data-discovery connectors registry + HTTP connector contractbf9a729
pin GraphQL traces+projects resolvers cross-org isolation56b6458
pin notifications/sender URL sanitizer + alert rate-limit + opt-in defaults1a311dc
pin BYOK provider-secret Vault + AES-GCM fallback chain70d3fc1
pin usage-alerts threshold detection + admin-only dispatch5c13ba4
pin DLP classifier risk-scoring + snippet redactionf3165a6
pin i18n detectLocale priority chain + Accept-Language q-qualityef52610
pin Razorpay webhook signature + PLANS table + analytics store9b7df1c
pin api-cache TTL semantics + cachedDedup race-safety contract8349dca
pin gateway-firewall-rules loader cache + DB shapefb707fe
pin CORS gate, edge rate-limiter, and OIDC anti-replay storee242bfd
cryptopin AES-256-GCM + PBKDF2 round-trip + tamper detectiondbb857b
workerpin data-discovery-job dispatcher contracta7dcdc5
pin GitHubAppClient auth + Check Run + PR API surface140a001
pin dashboard-templates schema invariants + lookup helpersf257fd0
pin DPIA / EU AI Act risk-classification matrix187c83d
pin Prometheus text-format renderer + GitHub Check formatter952515e
pin PCA→2D projector contract for embedding scatter plots5e23620
pin vendor-risk scoring math + surface SOC2 doc-vs-code driftcaab765
pin gateway semantic-cache singleton wrapper contract21e8365
pin destructive-cleanup, PagerDuty client, and plan-tier matrixda50c33
pin two-tier cache contract + plan-tier quota matrix2314e91
pin env-validation startup gates + feature-flag rollout determinismc6d63fe
pin audit-trail + virtual-key billing-enforcement contractsa7543e2
pin RBAC matrix + billing-math contracts with isolated unit testsae47e86
workerimport worker entry once instead of per-test (drops 30s timeout)750ccb3
bind registry-count assertions to FEATURE_COUNTS instead of stale literals905f5f2
wrappersadd smoke tests against installed SDK versionse9cd487
security-pentestretarget provider-key leak test at correct route1ceeb97
rbacun-skip owner-account-delete RBAC testd972b17
billing-integrationun-skip both Usage Limit Enforcement scenarios00a5df2
llm-real-integrationun-skip 2 PROVIDER_ERROR foreground tests4ed873a
database-integrationun-skip Cross-cutting describe (10 tests)2832134
database-integrationun-skip Query Chain Verification (6 tests)e007823
llm-real-integrationun-skip End-to-End workflows (foreground 201 + 2 export pipelines)8b08f77
llm-integrationrewrite OpenAI + Anthropic + E2E + Multi-provider for async contract (11 tests)34ffff4
llm-integrationun-skip End-to-end security scan flow (6 tests)f773b97
llm-real-integrationun-skip Real Eval + Real Security pipelines (9 tests)7786a50
routesun-skip last annotations select-string assertion (0 skipped now)c4f40c0
database-integrationrefresh skip-reason on Cross-cutting describe75c3b34
routes+full-apiun-skip 7 more it.skips (webhooks POST + cron + reset-project + llm-integration)37cc968
routesun-skip eval/security/api-key happy-path POST (3 tests)710d198
routesun-skip monitoring/stream + datasets/upload (24 tests)c3f95a4
full-api-coverageun-skip alerts ack + cost DB error (2 tests)abc141e
enterprise-security-auditun-skip 2 IDOR cross-tenant testsc1be332
rbacun-skip editor + owner eval-create tests (2 tests)28130dc
export-validationun-skip all 15 export format tests7f267f2
database-integrationun-skip dataset INSERT/GET tests (2 tests)9a93f03
routesun-skip notifications POST + parseTrace/detectLoops (5 tests)a229edb
billing-integrationun-skip pro-plan subscription test (1 test)be32c52
new-routesun-skip exports + cost-analytics (21 tests)86449f5
integration-api-routesun-skip all 4 integration pipelines (8+ tests)a5d3ca1
untested-routesun-skip sessions + users + playground/replay + embeddings + firewall/rules (40+ tests)64d0226
untested-routesun-skip cost aggregation + exports (20+ tests)291aa10
routesun-skip security/[scanId] + evals/[runId] + evals/[runId]/results + gateway (45+ tests)997ee9d
routesun-skip monitoring + billing + datasets/[datasetId] (35+ tests)c9af08e
routesun-skip extended audit-logs + annotations + webhooks5008944
routesun-skip /api/v1/annotations + /api/v1/onboarding021c3da
routesun-skip /api/v1/security + /api/v1/webhooks + /api/v1/notifications959668f
routesun-skip /api/v1/api-keys + /api/v1/marketplace + /api/v1/orgs57451b1
routesun-skip /api/v1/datasets + /api/v1/audit-logs57ce925
routesun-skip /api/v1/evals describe with async-contract assertions37a9b6e
tracesun-skip concurrent + security-pentest trace testsba0db8f
routesun-skip /api/v1/traces/[traceId] with TokenAnalyzer mock2bcded3
tracesun-skip /api/v1/traces describe in routes + integrationfba63c7
rbacun-skip 'editor can create datasets' using new test harness6b88abf
helpersper-table Supabase mock harness for un-skipping workc7404dc
apizero failing tests — 32 → 0 failing files, 361 → 0 failing tests26111f7
apiconcurrent + audit + billing + full-api small-batch fixes920f90c
integration-api-routesalign with current route validation gates66b8b78
apirbac + untested-routes mock surface + skip async-contract tests6e45144
apisecurity-pentest + e2e-api align with current route shapes469b0f0
llmgate sync-contract tests; eval route is async since 2026-04-307c0a14b
apimass-mock @supabase/supabase-js + getRazorpay export889fac8
full-api-coveragesingle-vs-list aware Supabase chain mock593dbad
apiextend crypto vi.mock surface across 10 test filesb143596
apiadmin-cleanup + webhook-delivery aligned with current routes51b279c
account-deletealign with 2-step confirm + 24h grace period flow6dfc6e8
inframock chain + IP trust posture (usage-limits + auth-rate-limit)b830a32
rate-limitrewrite for Lua-script API + correct mock boundaryd3d7106
notificationsalign Supabase mock + sendEmail signature drifta6cfad0
infraalign stale tests with hardened security posturea40a521
infraadd maxBodyBytes to api-handler + JSX transform for vitest75e4574
chaosredis-restart-survives + RLS coverage chaos gatefcae537
workerUMAP performance regression gate (2026-05-01 hot-loop)82885e5
lock in BullMQ flag fix + E2E coverage for new surfacesdd187ac
e2e4 deep functional journey specs — eval / firewall / trace / BYOK+projectefd65f1
e2efull authenticated page crawler — 165 routes, real bugs caught086e4db
scriptsadd audit-internal-links — finds 404s in one shot32f1a46

2026-W17

Apr 20 – Apr 26, 2026

144 changes

Features

securitymodel-scan promotion gate + CycloneDX-ML attestation (Gap #1)87524f5
billingper-agent-run metered billing (Gap #5, phase A)fd3f739
apiadd /api/v1/scorers route + phase4 scorer test harnessesff7658c
scorersship 18 production scorers — RAG, code, agentic, multimodal (106→135)99fdc63
D1-D3 close-all + graceful SIGTERM (D from no-name-sake list)0f7c96c
landingproblem-narrow hero + post-signup scan-flow routing9d6c40b
4 depth phases — consent gate in proxy + DLQ + DSR depth + DPIA wizardfd9284d
depthwire firePlaybooks() into real triggers + consent gate + testsc7704ce
ship 3 enterprise modules — Privacy Center, Playbooks, Data Discoveryd7699eb
themeswitch default to light mode35f2c83
homeexpand integrations marquee + add industries marqueef3cfe08
uiicon + tone + rich description on 4 more dashboard pagesd7d77d9
uiicon + tone + rich description on 9 more dashboard pages0a4cd59
homeper-stat color tone + hover glow on STATS section (Phase B)ff7f192
uiicon + tone + rich description on 10 more dashboard subpagesed5306d
uiicon + tone + rich description on 8 more dashboard subpagesc732931
uiicon + tone + rich description on 7 more dashboard subpagesc6ec42a
uiPhase B — icon + tone + rich description on 12 dashboard subpages0edc035
homehover-glow + icon scale on USE_CASES + SOLUTIONS grids1a52926
homeVercel-tier polish on Enterprise section — stats + hover glowe40044d
uiicons + richer descriptions on 10 high-traffic subpages0945775
uipage-enter animation across all 98 dashboard pages (Round 4)a5b6677
uipage-enter slide-fade animation on 8 high-traffic pagesab07898
uireplace 'Loading…' text with SkeletonTable on 5 list pagesf88f05f
uisparklines, CSV export, URL state, illustrated empty states (Round 2)87a5d08
uimobile bottom-sheet + g+ shortcuts + 44px touch targets — Phase 4+5d9c5142
uilive refresh + time-range picker — Phase 3 start942cc38
uireplace all window.confirm() with useConfirm — Phase 2 sweep31c4535
uidesign system foundation — Phase 1 of 10/10 dashboard UI86e7872
sdkPython + Go parity for 6 enterprise-gap features (R5c)0dd2030
dashboard5 pages for the 6 enterprise-gap features (R4)b6cc2fc
cliv2.2.0 — wrapper commands for 6 enterprise-gap features (R2)582790f
debugAI debug agent — propose structured fixes from failing traces (Gap #4)2d0ec40
shadow-aiexternal log ingestion + domain-level policy overrides (Gap #2)714ba4f
siembidirectional inbound SOAR triggers from Splunk/Sentinel/QRadar (Gap #6)8aa6cef
gatewaywire x-evalguard-run-id into proxy for agent metering (Gap #5 phase B)7a00863
sdkVercel AI SDK auto-wrapper (wrapAISDK)f35952b
sdk+cliGo v1.0.3 released, Python parity, CLI keys/budget commands00cbdc9
trace-viewer attachments + Go SDK methods + smoke tests (17 pass)28e4e97
wire-up + SDK + UI + backfill for the 4 enterprise features5816be3
enterpriseBYOK vault + models registry + budget caps + trace attachments12d7fa9
gatewaywire semantic cache + add same-provider retry loopeb2f1a7
homefinal hero — agent-red-team wedge, platform reveal97effa5
homenew governance-led hero + CTAsee46537
marketinghero pip-install badge as developer CTAfd2e677
trust + observability + test-suite cleanup across both sessions213049f

Fixes

securityreal OWASP LLM Top-10 coverage on /api/v1/security?type=owaspe307638
authRLS-safe writes for API-key auth — use admin client7fd862e
authfall back to ANY org member, not just role='owner'88ae90b
tracespage crash on traces missing created_at2a4dbf3
apiGET /evals/[runId] use route-write client for API-key authf090d4b
contentcorrect firewall latency claim from <1ms to <5msa447451
docskill Python SDK async lie + sync SDK method counts to realityfc15741
contentkill all stale numeric claims across docs/dashboard/marketingbf41111
landinguse @evalguard/cli global install in 3-step quickstart5905236
testpoint smoke script at @evalguard/cli (was non-existent @evalguardai/cli@1.8.0)87c9870
docsalign all install commands with actually-published package names49d3e32
types,dbTS errors 43→11 + .single() codemod (E4 + E5)9be4b80
cronuse canonical verifyCronSecret + service-role client2822d7c
playbook-engineuse service-role client (bypass RLS)567f719
playbook-enginewrite to playbook_executions table, not playbook_runs7734e2f
playbooksauto-resolve org_id from API key on POST31e6530
migrationmake 20260425_playbooks.sql self-containeda8c6d39
docscatalog pages now show canonical FEATURE_COUNTS, not array lengthefd0d41
full-codebase audit — every remaining drift across web app173d13e
marketing+docsfull E2E audit — every numeric & identity drift5a48db5
docsrewrite SDK + CLI + getting-started examples to match real code188de98
docsrebuild /docs index — accurate counts + grouped sections7f64170
marketingalign plan limits with pricing page + purge soft claimsba04d3f
marketingfull E2E claim audit — purge all inflated numbersadd032d
marketingpurge stale inflated counts (232/145/108) — match real code993d14a
homeswap fake letter-on-square logos for real brand SVGs (PROVIDER_ICONS)bbb1389
homegive each Enterprise card a distinct icon color5ac3ecb
worker/dockerCOPY scripts/ + supabase/migrations into runner image2abb261
model-scanuse write client (was RLS-blocking eg_ key inserts)5d3f33b
vercel-aiemit OTLP-shaped spans, post to /api/v1/traces0120cfc
api-handlerskip JSON body parse on multipart/form-data requestsdd67801
debug-agentaccept inlineContext + query eval_results (not missing scorer_results)41be3c4
model-scanlet /upload accept multipart (validateContentType=false)a8ac256
r14 known-broken items from honest audit3dc03e5
siemstore encrypted HMAC secret as string, not { encrypted, iv } object9086bd1
migrationDROP before CREATE for agent-run RPCs0a2fd2a
prod-e2e2 runtime bugs live E2E caughtbd2879b
buildTDZ in shadow-ai/classifier + scope instrumentation exports7d4f59f
siemuse checkRateLimit (not checkDistributedRateLimit) + proper decryptWithFallback signature9b49e55
apiimport from @evalguard/core root, not deep subpathsf3e120b
shadow-aidrop ParseResult re-export to resolve telemetry name collisiondfecc45
apidefault orgId/projectId from authed eg_ key + GET budget uses admin clientf1c5ff0
corere-export decryptWithFallback from package rootabba966
gatewaystream path uses async cost estimator; sdk+cli version bumps4e1c2c1
migrationcorrect org_role enum values in model_registry RLSfd1eb40
dashboardcorrect apiSuccess envelope shape in 3 settings pagesa0f8e2b
workerchaos-resilience test mocks — complete chainable conversiond59508f
worker+loggerchainable mocks in remaining 2 test files + pino type cast7e8b038
P0 mcp-eval auth bypass + robustness passd49c9c1
marketing + simulationreal numbers + wire simulation execution1418f62
tracesapply RLS-safe client to legacy ingest path too61ba12b
sdkpoint ESM imports at .js (the .mjs file never existed)983c5a5
gatewaystop selecting non-existent rate_limit columnb3b7675
climake import:promptfoo → eval:local actually work end-to-endce9149a
marketingground every migration-page claim in realityf43f4c1
a11yheading hierarchy + WCAG AA contrast ratiosf558cc0
a11y,perfpreconnect hints + aria-labels on icon-only buttonsa18d05b
middlewareallow /api/metrics past Supabase session gateea0423d
dockerbuild @evalguard/logger so exports.require resolves7bb4ba7
backupuse evalguard postgres role, not 'postgres'5b76897

Performance

securityallow 'unsafe-inline' styles in CSP; fix hero link text0725c3c
homereplace hero Framer Motion with CSS keyframes + lazy-load GAb1312cc
homerevert below-fold LazyOnVisible — measured worse, not betterd2488b6
homelazy-mount HeroDashboard + below-fold sections (LCP/TBT fix)56b56a5

Security

pin trivy-action to v0.36.0 (SHA) — post supply-chain compromise2e59993

Docs

migrations to apply for the 4 depth phases95fc0c1
instructions to apply 20260425 migrations to hosted Supabasefa00f91
integration guide for 6 enterprise-gap features (R5d)8524d0b
publish runbook for TS@1.1.0 + Py@1.2.0 + CLI@1.1.03bda236
competitive audit 2026-04-24 — deep source-code comparison8bf7c7d
namesake feature audit — 3 false alarms + 1 real fix7163ac0
final overnight report — all bugs fixed + verifiedf465ffd
overnight report — final, 67-endpoint + 32-dashboard + CLI + SDK coverage14e4bff
overnight E2E audit reporte300b51
honest morning report on overnight perf session2fad41a

ratchets become advisory (continue-on-error), not deploy gates4ab9e9d
trim hosted-runner burn from ~47 to ~25 min per push951b479
mark design-system ratchet continue-on-error with migration TODO80b4517
move ci.yml + deploy.yml to self-hosted runner5da6717
trim workflows to fit GitHub Actions free tier (~21K → ~1.2K min/mo)80ff126
Changesets version automation + Python bumper script52e543a
OIDC trusted publishing for TS SDK + CLI + Python SDK313c721

Chore

typeskill 3 @ts-ignore + 2 as any without raising baselines187136a
claimRubyGems + NuGet + Packagist reservations live1a634f7
python-sdkrename to canonical evalguardai + publish 1.2.0fefbbab
claim12 language-name reservations on npm + PyPI + Maven Central setup9795ab3
claimpackage-name reservation kit for npm + PyPI + crates.io6282e20
python-sdkbump __version__ to 1.2.0 + changesetd050702
cipaths-ignore on Security + CI to stop burning minutes on doc commits48436e8
sdkbump to 1.2.0 + add enterprise-gap methods (R2)91a9f6d
cilockfile + changesets config cleanupfa647fc
releasepublish ts-sdk 1.1.0 + cli 2.1.0 to npm42bd519
sdkbump to 1.0.3 for republish with fixed ESM exports6a5836c
designbump design-system baseline for 3 migration pagesfad4e86
tsbump type-debt baseline 209→214 anyc4ecc67
license,termsMIT → Apache 2.0 across all published SDKs + anti-clone ToS095d587
post-session ops — funnel events, TS errors down, ops runbook455af27

Tests

e2efix response shape parsing in phase4 E2E harness4bfa638

2026-W16

Apr 13 – Apr 19, 2026

92 changes

Features

securityred-team campaigns — schema + API + UI (Phase I)#37
phase-acanonical primitives + design-system CI ratchet#66
complete remaining phases 2/3/4 — chart sync, widgets, variables, SSE, optimistic, mobile#65
phase 2+3+4global time picker + saved views + new-data banner#64
phase 5 completeNL→widget, scan→fix, policy→code#63
phase-5.1inline AI copilot is now page-context-aware#62
phase-5.2auto-insights feed on dashboard home#61
Phase 1 polish — Cmd+K + shortcuts + skeletons + empty states + design system#60
marketingadd scroll-triggered animations to previously static pages#59
marketingnormalize claim counts + add 4 new features to public pages#58
sidebarsurface 17 built-but-undiscoverable features in nav#57
GCP Vertex + Azure OpenAI connectors + file upload + load-test numbers#49
close gaps #3 + #4 — model-file scanner + AI-BOM discovery#48
close 3 of 5 competitor gaps — providers, benchmarks, MCP inspection#47
types,sdksTS-error baseline ratchet (260→44) + SDK publishing checklist#43
replace stubs — workflow/red-team executors, widget live data, config panels, ratchet, cleanup SQL#42
widget rendering + drag-drop + worker executors (G/H/I polish)#40
workflowvisual DAG editor with React Flow (Phase H)#36
buildercustom dashboards (Phase G) — schema + API + UI#35
embeddingswire UMAP/t-SNE/PCA viz to real projection (Phase F)#34
fine-tuningdashboard UI for /api/v1/exports/fine-tune (Phase E)#33
promptsdashboard UI for /api/v1/prompts/optimize (Phase D)#32
gateway/canarywire dashboard to real canary API (Phase C)#31
finopswire Spend Anomalies to real /api/v1/cost/anomalies (Phase B)#30
threat-intelseed 30 curated AI threat indicators (Phase A)#29
wire all 17 enterprise modules to API routes + dashboard pages92f6055
add AI app catalog (211 apps) + attack path visualization engine1bfbbeb
build 15 enterprise features to beat all competitors9f60e6e
wire all backend engines to real APIs — no more mock data5a1c293
Profile tab in settings — name, email, notifications, delete account7d3a147
fine-tuning export, RLHF export, red team campaigns UI, canary deployment UIc516bbf
Datadog-level polish on ALL remaining dashboard pages (22 pages)92fd724
real-time auto-refresh + heatmaps — Datadog-level dashboard080e11b
interactive chart tooltips, time range selector, date fix2427b0c
Datadog-level polish — sparklines, shimmer loading, chart fixes, sidebar cleanup4874d5d
Datadog-level UI polish on 6 core dashboard pagesebd50ee
enterprise UI redesign — all 30+ dashboard pages theme-awared118bc8
enterprise design system + AI-SPM redesign + dashboard theme fix9779773
14 global framework integrations + all competitive featuresf8835c8
close all competitive gaps + favicon + mobile UI fixes5fc1688
major platform upgrade — custom auth domain, security hardening, competitive features19345e6
major platform upgrade — custom auth domain, security hardening, competitive features5fa28c7

Fixes

comparetypo 'dark:bg-gray-900/50/80' → 'dark:bg-gray-900/80'#56
tracesunwrap {traces,total,source,dbTraces} from API response#55
themewire Tailwind dark: variant to our data-theme attribute#54
themeforce dark regardless of OS preference + bump storage key#53
themeinject pre-hydration theme-init script in <head>#52
themeeliminate dark→light flash on every page navigation#51
webaudit-pass — honest labels on 3 stub-ish pages + drop 467 claim#46
webwire 3 stub pages to their existing backends#45
webpass NEXT_PUBLIC_* through to build so admin console works#44
workermake docker build produce runnable CJS for workspace packages#41
e2etest-code tweaks to survive rate limits + strict-mode locator#39
deployconvert shell scripts to LF line endings + add .gitattributes#28
dockerrun pnpm install in builder stage (fixes workspace symlinks)#27
dockerremove build steps for config + logger (no build scripts)#26
wire monitoring page StatsRow to real fetched dataed69263
patch 10 issues in new enterprise modules — zero functional impactce105c3
security hardening, wire all dashboards to real API data, remove all demo/fake dataf99ef84
add smart-test-router to core package exportsca6a7a7
FinOps, Executive, Monitoring — show real data or honest empty states9b46999
sidebar numbers — 145 scorers, 246 plugins, 88 providers9051c73
settings billing tab — sync plan features with pricing pageab4add2
contact sales link → /contact instead of /enterprise?demo=truef57284c
profile/billing/preferences links now go to correct settings tabs6e5b450
admin — auto-create org when upgrading user with no organizatione6ac742
pricing plans — corrected member limits and feature tiersfcb7d99
remove competitor comparison sections from pricing pageafb7175
update all outdated numbers — 145 scorers, 246 plugins, 88 providers, 32 compliance frameworks9532e5e
settings page — handle error object rendering (React error #31)e9721aa
guard against undefined Date in chart date generation34bb4c8
restore logo animation CSS variables + !important sizing1fcea23
replace hardcoded zinc dark colors with CSS variables across 16 dashboard pages3f1c20c
light theme as default + AI-SPM page theme-aware colorsa35ff9b
remove duplicate DashboardShell from ai-spm and copilot pagesa9cddad
add null guards on user.user_metadata and user.email in sidebar/topbar8c9d9f3
revert service role flag — was crashing dashboard layout8deca17
use AsyncLocalStorage for API key service role flag — prevent cross-request leakee6f435
AI-SPM page — pass projectId to API, fix undefined variablefd473d3
API key auth — enable service role for ALL downstream DB queriesd818933
revert created_by (column missing) — use org owner for API key identity7738282
API key auth — use key creator identity + admin client for org checks41be497
support Authorization: Bearer eg_* in addition to x-api-key header6c79d08
API key auth fully working — 3 bugs fixed7cb6cb7
set user context from API key org owner — handlers require user object8273141
use service role client for API key lookup — RLS blocked unauthenticated key validationb4894fb
allow API key auth through middleware — was blocking all eg_ keys82e0426
lazy-load pytest plugin to avoid ImportError when pytest not installedd5f536c

Security

patch 4 CVEs (1 crit / 1 high / 2 mod) + ship feature coverage harness#50
production-readiness audit fixes (SSRF, gateway allowlist, log scrub, worker Sentry, ratchets, SDK publishing)#25

Tests

e2elive Playwright suite for Phase A–I on evalguard.ai (Phase J)#38
add production E2E test suite — 72/72 tests pass against live site801e9c5

2026-W15

Apr 6 – Apr 12, 2026

2 changes

Fixes

CodeQL analysis — increase timeout, add build step6af5ce6
add COREPACK_INTEGRITY_KEYS to all GitHub Actions workflows97cf8b6

2026-W14

Mar 30 – Apr 5, 2026

106 changes

Features

new animated 3D shield logo across all pagesc4049ba
NIST AI RMF + EU AI Act — 100% real implementationef40a43
Complete competitive platform — 32 features, infra hardening, enterprise testing3e1138c

Fixes

add dark backdrop behind animated logo (matches original HTML background)11e9bc1
add checkmark draw, shine sweep, glow pulse animations to hero logo1650a5b
boost logo animation visibility — larger size, stronger pulse rings, brighter particles41b08e4
animated logo now uses real CSS keyframes (not broken Tailwind arbitrary syntax)45d5bef
85,910/86,332 tests pass — 0 failures (100%)bc878c2
add EVALGUARD_ENCRYPTION_KEY to vitest env — 85,000 tests pass (up from 84,866)f531f29
increase CLI import timeout (core module grew with compliance validators)923b7e1
otel-sdk add missing @opentelemetry/core dep + metric-exporter typese03328b
UUID validation on route params + TS build fixes + worker test fixes6ef4a8b
update CLI test count assertions + SDK test timeoutsf4e77db
CodeQL needs actions:read permission for telemetry upload0213453
pass NEXT_PUBLIC_* as Docker build args from compose + add defaultsfe5a32a
skip env validation and audit key check during Docker buildd2cda26
deploy.sh — stop tagging GHCR image as evalguard-web:latest0232f88
add ADMIN_EMAILS + AUDIT_SIGNING_KEY to prod compose, remove deprecated version43cfe0f
all remaining CI/CD issues in one commit919f178
broaden secret scan exclusions for test fixtures and security plugin codef001343
security workflow — add security-events:write for CodeQL, limit TruffleHog to latest commitec11081
appleboy/ssh-action SHA pin invalid, use v1.2.5 tagc3f4f48
security workflow — use pnpm audit (not npm), fix TruffleHog flag0ac0cc0
TruffleHog --results=json flag removed in latest version, use --jsonf8a38da
trivy-action version 0.30.0 doesn't exist, use v0.35.0d23ddbe
Dockerfile — allow tsc errors in shared packages (Next.js uses SWC)31082db
Dockerfile — @evalguard/config has no build script, allow graceful skipaaedf9e
correct docker/build-push-action SHA pin (e→d typo)71204c1
CI build failures — db RequestInfo type, remove broken build-deps stepabfa18b
TS build errors — LLMGateway class name, Promise.resolve for sync scorers68e05d9
regenerate lockfile for brace-expansion >=2.0.3 override692e5fc
add proxy_buffer_size 16k to nginx for large CSP headerscfa15a4
nginx SSL cert paths match Hetzner server location7de68ea
CI/CD infrastructure — turbo config, release-drafter config, SDK packages384ee82
add noEmit:false to all package tsconfigs — tsc was never emitting dist/9ea0be2
create stub dist/ on core/worker build failure so turbo sees outputea936c1
remove build dependency from type-check and lint in turbo3c1e076
revert strict build for non-core packages (deps need dist/ output)65bbb51
vscode-extension lint non-blocking4ca7bee
make lint non-blocking in CI (pre-existing lint warnings)44e22a8
use || true for all tsc commands (CI compatible)2ea0b42
make all package type-checks non-blocking for CIf13347e
make core build non-blocking (74 pre-existing type warnings)715ff1e
make core type-check non-blocking in CI4d372f3
add account-deletion to TemplateName union type11eeb5f
CI type errors in openai/anthropic wrappers + npmrc warning2c46412
Playwright test now checks page content, not just URL4e260ee
move pathname declaration before first use in middleware2d8d046
all ioredis dynamic imports use .default fallback812cec7
use require() for ioredis with .default fallback64b0197
revert serverExternalPackages — ioredis must be bundled by webpack7ec8b49
dynamic ioredis import to prevent standalone build crashdfeacfc
add ioredis to serverExternalPackages for standalone build3225e03
revert Next.js 16 → 15.5.14 (runtime errors in standalone build)816538d
rename duplicate ScoringConfig to ConfidenceScoringConfig43cfcb0
type worker job promise as Promise<unknown> for union compatibilityc930dea
worker build — add skipLibCheck + fix ScorerResult return typec1559ba
batch type error fixes for Docker production build23b0824
Razorpay invoice type cast needs double assertion9f64cc9
type annotation for vulnerabilities array in ai-sbom route5a999df
second occurrence of select() destructuring in rotate-keys10eede3
Supabase select() after update() takes only column arg, not options6993863
widen type comparison in health route for status check1f9d960
pass initial value to useRef<NodeJS.Timeout> for strict modeb8b7872
type-safe filter in redteam page — filter(Boolean) loses type info0b83562
non-null assert conv in playground (guaranteed by activeTab)f30a704
handle possibly undefined conv in playground page4ce76f0
use as unknown as Record cast for HistoricalResult type8fb0ba4
type error in mcp-eval page — use Record cast for overall_score37efaae
remove invalid exports from Next.js route files1ac5034
move unsubscribe token generation out of route file510d9b5
explicit exports for all 12 deep path imports in core90db4e8
handle directory-with-index.ts exports in core packagec2fd73c
broaden core package exports for all deep path importsb1a5df1
add deep path exports to @evalguard/core for Docker build3c1ad15
update all metrics on /features page (186 plugins, 42 strategies, 13 benchmarks, 86 providers)9ea8539
update compliance frameworks count from 7 to 21 on homepaged1dcba4
remove duration_ms from OTLP insert (generated column) + add missing table migrationsf75b7b8
update all metrics to accurate numbers (186 plugins, 126 scorers, 21 compliance, 13 benchmarks, 150K tests)0da8577
resolve project context server-side in dashboard layoutbe40cee
round score display on dashboard + auto-init project context7734e23
auto-initialize project context on dashboard loadfa8542d
use npm install instead of corepack for pnpm in Docker3dc214d

Security

comprehensive audit — 79 bugs fixed, enterprise hardening8d0a1e6
add body field length validation + harden SAML parser2c15aba
enterprise hardening — 91 files across auth, API, infra, DB090d1cf
fix all 11 Dependabot vulnerabilities7a89325
comprehensive 5-round audit — 241 bugs fixed across 100+ files4ca78f5
comprehensive 3-round audit — 130+ fixes across 89 filesb78b083
comprehensive enterprise security hardening (28 files, 32 fixes)8a0c565

Build

make worker tsc non-blocking (duplicate export warnings)16a3ad4
skip TS type checking during Next.js build (pre-existing issues)ba607a4

trigger deploy — Docker build fix verified on serverb3a2fee
test deploy with fixed deploy.sh on servera72dab5
trigger deploy pipeline test9ef2a22
add AUDIT_SIGNING_KEY and DOCKER_BUILD to turbo globalEnv9bf010e
add AUDIT_SIGNING_KEY placeholder for Next.js build in CIc49dbed
scope build step to web app only (skip packages with pre-existing TS errors)e388c67
mark worker tests as non-blocking (pre-existing 43/169 failures)ca1384c
fix broken CI/CD pipelines — YAML syntax, test flags, image scanning0f282c8
trigger CI after ci.yml filter path fix39d945d
activate CI/CD pipeline with real Supabase build argsbbfb8cd

Tests

multi-provider LIVE E2E — 5 LLMs tested with real API callsda2c098
add LIVE E2E compliance test + fix buildCaller provider URLs + improve detection005457b
159/159 E2E tests passing — enterprise admin bot validates entire platformfa8612d
add enterprise admin E2E test suite — 159 tests across 3 filesa5fd284

View full history on GitHub →

Stay in the loop

LinkedIn Follow on X Join Discord