~15 minutes · 9 verifiable claims

Verify our engineering claims yourself. 

Every claim on /engineering is reproducible from the public source. This step-by-step walkthrough lets you clone, run, and see the same numbers. It is designed for a due-diligence (DD) engineer doing a 15-minute technical screen.

SOC 2 Type II (evidence live; attestation not yet obtained) · ISO 42001 (evidence live) · EU AI Act · GDPR

Prerequisites

  • git
  • node 20+ & pnpm
  • ~150 MB free disk for the clone + npm cache
  • No accounts, no API keys, no signups required
1

Clone the repo

~1 min
Claim

All claims on this site are verifiable from the public source. There is no closed-source branch.

Why this is the right test

If the source isn't public, every other claim on this page is unverifiable. The claims that follow live in the code at HEAD, not in a marketing deck.

Command (paste into your terminal)
git clone https://github.com/EvalGuardAi/evalguard.git
cd evalguard
Expected: ~80 MB clone and an active history (130+ commits in the two days before this page was last updated)
2

Verify CI ratchets exist + count them

~1 min
Claim

We run 21 active CI ratchets that fail PRs on regressions: RLS coverage, cross-tenant `.eq`, mass-assignment, force-dynamic, gitleaks, OpenAPI completeness, mutation-score floors, firewall latency, synth-check freshness, chaos-test coverage, migration down-coverage, +10 more.

Why this is the right test

Most pre-seed companies run 0 or 1 hard CI gates. 21 ratchets, each with a deliberate-break test (ADR-0028), is a genuine engineering signal that survives PR review pressure.

Command (paste into your terminal)
grep -E "ratchet|cross-tenant|gitleaks|force-dynamic|mass-assignment|chaos-coverage|migration-down" .github/workflows/ci.yml | grep -Ev '^[[:space:]]*#' | wc -l
Expected: 21+ matching lines
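To make the term concrete: a CI ratchet, in the general sense, compares a metric against a committed baseline, fails the build when the metric regresses, and only ever moves the baseline in the improving direction. The TypeScript sketch below illustrates that general pattern. `checkRatchet`, its `tolerance` parameter, and the sample numbers are hypothetical; they are not taken from the repo's ratchet scripts, and we have not assumed anything about how ADR-0028 defines its deliberate-break tests beyond the name.

```typescript
// Hypothetical ratchet check, for illustration only; not the repo's scripts.
type Direction = "higher-is-better" | "lower-is-better";

interface RatchetResult {
  pass: boolean;       // false means CI should fail the PR
  newBaseline: number; // baseline to commit after this run; it never loosens
}

function checkRatchet(
  current: number,
  baseline: number,
  direction: Direction,
  tolerance = 0, // hypothetical slack for noisy metrics such as latency
): RatchetResult {
  const regressed =
    direction === "higher-is-better"
      ? current < baseline - tolerance
      : current > baseline + tolerance;
  if (regressed) return { pass: false, newBaseline: baseline };

  // On success, move the baseline only in the improving direction.
  const newBaseline =
    direction === "higher-is-better"
      ? Math.max(baseline, current)
      : Math.min(baseline, current);
  return { pass: true, newBaseline };
}

// Deliberate-break style check: feed a known regression and confirm it fails.
console.log(checkRatchet(79, 80, "higher-is-better")); // score floor of 80: fails
// A violation count (lower is better) dropping from 4 to 3 passes and tightens to 3.
console.log(checkRatchet(3, 4, "lower-is-better"));
```

Committing `newBaseline` back to the repo is what makes this a ratchet rather than a fixed threshold: once a violation is fixed, reintroducing it fails CI.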
3

Count ADRs

~2 min
Claim

35+ Architecture Decision Records cover every load-bearing decision: encryption, RLS, audit-log signing, BullMQ DLQ, mutation testing, detection-benchmarking discipline.

Why this is the right test

ADRs are the audit trail for *why* decisions were made. Without them, a security audit gets answered with 'I think Bob in 2023 chose this' — not defensible.

Command (paste into your terminal)
ls docs/adr/*.md | wc -l
Expected: 36+ files (35 ADRs + README.md)
4

Run the firewall latency benchmark

~3 min
Claim

Detection-layer p95 latency < 5 ms, measured in CI and regression-gated by `scripts/firewall-latency-ratchet.cjs`.

Why this is the right test

Inline firewalls go in the request hot-path. Latency is a deal-breaker for adoption. We publish the number, the script that produced it, and the CI gate that prevents regression.

Command (paste into your terminal)
pnpm install
npx tsx scripts/benchmark-firewall-latency.mjs
Expected: p95 < 5 ms, p50 ~1 ms (absolute numbers vary with your hardware)
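If you want to sanity-check a p95 from raw samples, the nearest-rank method below is one common definition: sort the samples and take the value at 1-based rank ⌈0.95·n⌉. This page does not say whether `scripts/benchmark-firewall-latency.mjs` uses nearest-rank or an interpolating variant, so expect small differences between definitions; the latencies in the sketch are made-up sample data, not benchmark output.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. One common definition; other methods interpolate.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, not lexicographic
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Illustrative latencies in milliseconds (made-up data, not benchmark output).
const latenciesMs = [0.8, 1.1, 0.9, 1.0, 4.2, 1.2, 0.7, 1.0, 1.3, 2.5];
console.log(`p50 = ${percentile(latenciesMs, 50)} ms`);
console.log(`p95 = ${percentile(latenciesMs, 95)} ms`);
```

With only ten samples, p95 is simply the slowest request, which is why latency benchmarks generally collect many samples before reporting tail percentiles.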
5

Run the firewall detection-quality benchmark

~2 min
Claim

100% recall, 100% precision, 100% F1 on a 100-prompt corpus (50 attacks across 7 categories, 50 benign queries). Reproducible via committed script.

Why this is the right test

Latency without detection-quality is 'I block nothing, fast.' This benchmark answers 'does the firewall actually catch attacks?' against an OWASP/AdvBench-derived corpus.

Command (paste into your terminal)
npx tsx scripts/benchmark-firewall-detection.mjs
Expected: Recall 100.00%, Precision 100.00%, F1 100.00%
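The three reported metrics follow directly from confusion-matrix counts on the labelled corpus. The sketch below computes them for the stated split (50 attacks, 50 benign). The counts are inputs you would read off the benchmark output; the second case is a hypothetical single miss, included to show how coarse a 100-prompt corpus is.

```typescript
// Confusion-matrix counts for a labelled corpus of attack and benign prompts.
interface Confusion {
  tp: number; // attacks the firewall blocked
  fp: number; // benign prompts it wrongly blocked
  fn: number; // attacks it let through
}

function scores({ tp, fp, fn }: Confusion) {
  // Convention here: a metric is 0 when its denominator is 0.
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  // F1 is the harmonic mean of precision and recall.
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

const pct = (x: number) => `${(x * 100).toFixed(2)}%`;

// Claimed result: all 50 attacks blocked, none of the 50 benign prompts blocked.
const claimed = scores({ tp: 50, fp: 0, fn: 0 });
// Hypothetical: a single attack slips through.
const oneMiss = scores({ tp: 49, fp: 0, fn: 1 });

console.log(`claimed:  recall ${pct(claimed.recall)}, precision ${pct(claimed.precision)}, F1 ${pct(claimed.f1)}`);
console.log(`one miss: recall ${pct(oneMiss.recall)}, precision ${pct(oneMiss.precision)}, F1 ${pct(oneMiss.f1)}`);
```

One missed attack moves recall to 98.00% and F1 to 98.99%. Each attack in a 50-attack set is worth 2 percentage points of recall, which bounds how finely a corpus of this size can distinguish detectors.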
6

Verify OSS package downloads

~1 min
Claim

4 OSS packages on npm with weekly downloads: `evalguardai-openai`, `evalguardai-anthropic`, `evalguardai-otel`, `@evalguard/sdk`.

Why this is the right test

OSS adoption is independent third-party validation. Anyone can run `npm install` and see the package; download counts are publicly auditable.

Command (paste into your terminal)
for pkg in evalguardai-openai evalguardai-anthropic evalguardai-otel @evalguard/sdk; do
  curl -s "https://api.npmjs.org/downloads/point/last-week/$pkg"; echo
done
Expected: Weekly download counts for each package
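Download counts are not part of npm's package metadata; they are served by npm's public download-counts API at `https://api.npmjs.org/downloads/point/last-week/<package>`, which returns JSON with a `downloads` field and needs no account. The sketch below queries it for the four package names listed in the claim, using the global `fetch` available in Node 18+. The helper names (`parsePoint`, `weeklyDownloads`, `main`) are our own.

```typescript
// Shape of a single-package response from npm's download-counts API.
interface PointResponse {
  downloads: number;
  start: string;
  end: string;
  package: string;
}

const PACKAGES = [
  "evalguardai-openai",
  "evalguardai-anthropic",
  "evalguardai-otel",
  "@evalguard/sdk",
];

// Pure parser, kept separate from the network call so it is easy to test.
function parsePoint(body: unknown): PointResponse {
  const b = (body ?? {}) as Partial<PointResponse> & { error?: string };
  if (b.error) throw new Error(`npm API error: ${b.error}`);
  if (typeof b.downloads !== "number") throw new Error("missing downloads field");
  return b as PointResponse;
}

async function weeklyDownloads(pkg: string): Promise<PointResponse> {
  const res = await fetch(`https://api.npmjs.org/downloads/point/last-week/${pkg}`);
  // Unknown packages typically come back with an {"error": ...} body.
  return parsePoint(await res.json());
}

// Queries each package in turn and prints one line per package.
async function main() {
  for (const pkg of PACKAGES) {
    try {
      const r = await weeklyDownloads(pkg);
      console.log(`${r.package}: ${r.downloads} downloads (${r.start} to ${r.end})`);
    } catch (err) {
      console.log(`${pkg}: ${(err as Error).message}`);
    }
  }
}
// Not invoked here so the file loads offline; call main() to query the live API.
```

To run it against the live API, append `main();` and execute the file with `npx tsx`.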
7

Inspect the public synth-check history

~2 min
Claim

External synthetic uptime checks run hourly. Public Actions history. Catches outages independently of our internal monitoring.

Why this is the right test

A status page that depends on the system it's monitoring isn't a status page. GitHub Actions runs from outside our infra; the history is publicly auditable.

Command (paste into your terminal; on Linux, use xdg-open instead of open)
open https://github.com/EvalGuardAi/evalguard/actions/workflows/synth-check.yml
Expected: Continuous successful runs, hourly cadence
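To check the hourly cadence without paging through the Actions UI, you can pull run timestamps from GitHub's REST endpoint for a workflow's runs (`GET /repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs`, where `workflow_id` may be the workflow file name) and measure the gaps between consecutive scheduled runs. Unauthenticated requests work for public repositories but are rate-limited. This is a sketch: the 90-minute threshold is our own assumption, chosen because GitHub can start scheduled workflows late, and the sample runs are made up.

```typescript
// Subset of the fields GitHub returns for each entry in `workflow_runs`.
interface WorkflowRun {
  created_at: string;        // ISO 8601 timestamp
  conclusion: string | null; // "success", "failure", "cancelled", ...; null while in progress
  event: string;             // "schedule", "push", "workflow_dispatch", ...
}

// Largest gap between consecutive scheduled runs, plus a count of non-successful ones.
function cadenceReport(runs: WorkflowRun[], maxGapMinutes = 90) {
  const scheduled = runs.filter((r) => r.event === "schedule");
  const times = scheduled.map((r) => Date.parse(r.created_at)).sort((a, b) => a - b);
  let largestGapMin = 0;
  for (let i = 1; i < times.length; i++) {
    largestGapMin = Math.max(largestGapMin, (times[i] - times[i - 1]) / 60_000);
  }
  const failures = scheduled.filter(
    (r) => r.conclusion !== null && r.conclusion !== "success",
  ).length;
  return {
    scheduledRuns: scheduled.length,
    failures,
    largestGapMin,
    withinCadence: largestGapMin <= maxGapMinutes,
  };
}

// Fetches the most recent runs of the synth-check workflow.
async function fetchRuns(): Promise<WorkflowRun[]> {
  const url =
    "https://api.github.com/repos/EvalGuardAi/evalguard/actions/workflows/synth-check.yml/runs?per_page=50";
  const res = await fetch(url, { headers: { Accept: "application/vnd.github+json" } });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  const body = (await res.json()) as { workflow_runs: WorkflowRun[] };
  return body.workflow_runs;
}

// Made-up sample data so the report can be tried offline.
const sampleRuns: WorkflowRun[] = [
  { created_at: "2025-01-01T00:05:00Z", conclusion: "success", event: "schedule" },
  { created_at: "2025-01-01T01:07:00Z", conclusion: "success", event: "schedule" },
  { created_at: "2025-01-01T01:30:00Z", conclusion: "success", event: "workflow_dispatch" },
  { created_at: "2025-01-01T02:04:00Z", conclusion: "failure", event: "schedule" },
];
console.log(cadenceReport(sampleRuns));
```

For the live history, run `fetchRuns().then((runs) => console.log(cadenceReport(runs)))`. Manual and push-triggered runs are filtered out so they cannot mask a missed schedule.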
8

Read the threat model

~2 min
Claim

17 threat classes documented in one place with mitigations + verifying artifacts + honest gaps.

Why this is the right test

A customer security audit asks 'how do you defend against X?' The threat model document is the rolled-up answer. Not having one means re-deriving the answer every time.

Command (paste into your terminal)
head -n 100 docs/threat-model.md
Expected: 17 threats listed, each with mitigations + receipts + gaps
9

Verify the SBOM is fresh

~1 min
Claim

Daily CycloneDX SBOM generated by syft and scanned for known vulnerabilities by grype, with public Actions history, plus an RFC 9116 security.txt.

Why this is the right test

Customer security questionnaires ask for SBOM. Generating one daily means the answer is 'here's today's, ask for any historical day' — not a 6-week project.

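Command (paste into your terminal; on Linux, use xdg-open instead of open)
open https://evalguard.ai/.well-known/security.txt
open https://github.com/EvalGuardAi/evalguard/actions
Expected: RFC 9116 security.txt with disclosure policy + PGP key, and a daily SBOM workflow in the public Actions history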
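RFC 9116 requires at least one `Contact` field and exactly one `Expires` field, and a file whose `Expires` date has passed should be treated as stale; `Encryption` (for example, a link to a PGP key) and `Policy` are optional. The sketch below checks those basic rules. It is a quick screen, not a full validator: it does not verify PGP signatures, canonical URLs, or other RFC details. The function names and the `example.com` sample values are our own.

```typescript
interface SecurityTxtCheck {
  contacts: string[];
  encryption: string[];
  expires?: Date;
  problems: string[]; // empty means the basic RFC 9116 rules checked here are met
}

function checkSecurityTxt(text: string, now: Date = new Date()): SecurityTxtCheck {
  const fields = new Map<string, string[]>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank lines and comments
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // not a "Field: value" line (e.g. PGP armor)
    const name = line.slice(0, idx).trim().toLowerCase(); // field names are case-insensitive
    const values = fields.get(name) ?? [];
    values.push(line.slice(idx + 1).trim()); // keeps colons inside values, e.g. "mailto:"
    fields.set(name, values);
  }

  const problems: string[] = [];
  const contacts = fields.get("contact") ?? [];
  if (contacts.length === 0) problems.push("missing Contact field");

  let expires: Date | undefined;
  const expiresValues = fields.get("expires") ?? [];
  if (expiresValues.length !== 1) {
    problems.push(`Expires must appear exactly once (found ${expiresValues.length})`);
  } else {
    const d = new Date(expiresValues[0]);
    if (Number.isNaN(d.getTime())) problems.push("Expires value is not a valid date");
    else if (d.getTime() <= now.getTime()) problems.push("Expires is in the past");
    else expires = d;
  }

  return { contacts, encryption: fields.get("encryption") ?? [], expires, problems };
}

// Fetches the published file and checks it against the current time.
async function fetchAndCheck(url = "https://evalguard.ai/.well-known/security.txt") {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} from ${url}`);
  return checkSecurityTxt(await res.text());
}

// Made-up sample file for an offline try-out.
const sampleTxt = [
  "# Example security.txt (illustrative values)",
  "Contact: mailto:security@example.com",
  "Expires: 2030-01-01T00:00:00.000Z",
  "Encryption: https://example.com/pgp-key.txt",
  "Policy: https://example.com/disclosure-policy",
].join("\n");
console.log(checkSecurityTxt(sampleTxt, new Date("2026-01-01T00:00:00Z")));
```

To check the published file, call `fetchAndCheck().then(console.log)`; an empty `problems` array means the required fields are present and the file has not expired.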

If every step passed

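You have personally verified the claims that the nine steps above cover. None of this required trust: every number came out of code you ran. The scorecard below also lists claims outside those steps, and items marked Partial or Calendar / post-funding that are not yet fully earned.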

21 active CI ratchets · Earned
35 ADRs in repo · Earned
Firewall p95 < 5 ms (measured in CI) · Earned
Firewall detection 100% on 100-prompt corpus · Earned
Head-to-head vs NeMo Guardrails (independent) · Earned
OpenAPI coverage: 310/311 routes (99.7%) · Earned
Mutation testing on 8 critical-path files (5 ratcheted) · Partial
Daily SBOM + vulnerability disclosure · Earned
External hourly synthetic uptime checks · Earned
Documented threat model (17 classes) · Earned
OSS packages with weekly downloads (4 published) · Earned
SOC 2 Type 1 attestation (target Q4 2026, gated on funding) · Calendar / post-funding
External pentest (post-funding) · Calendar / post-funding
Detection corpus expansion to 500+ prompts · Calendar / post-funding

Honest about what we don't yet have

We list our gaps publicly because hiding them makes the positive claims less credible. Each item below has a roadmap committed to the repo:

  • SOC 2 Type 1: gap analysis + control-to-TSC mapping done, evidence engine live; auditor engagement gated on funding, attestation target Q4 2026.
  • External pentest: not done; planned post-Series-Seed funding ($10-25k).
  • Bug bounty program: not started; vulnerability reports are welcome at security@evalguard.ai, with a hall of fame for reporters.
  • api-handler.ts mutation score: 44.89% against an 85% target. The earn-then-enforce path is documented in ADR-0034.
  • Detection corpus expansion: 100 prompts now, 500+ next via AdvBench / HarmBench / AISafetyLab.

Found a claim you can't verify?

That's a bug — file an issue and we'll fix the page (or the code). Diligence questions also go to barathzath@gmail.com.