Prerequisites
- git
- node 20+ & pnpm
- ~150 MB free disk for the clone + npm cache
- No accounts, no API keys, no signups required
Clone the repo
All claims on this site are verifiable from the public source. There is no closed-source branch.
If the source isn't public, every other claim on this page is unverifiable. The claims that follow live in the code at HEAD, not in a marketing deck.
git clone https://github.com/EvalGuardAi/evalguard.git && cd evalguard
Verify CI ratchets exist + count them
We run 21 active CI ratchets that fail PRs on regressions: RLS coverage, cross-tenant `.eq`, mass-assignment, force-dynamic, gitleaks, OpenAPI completeness, mutation-score floors, firewall latency, synth-check freshness, chaos-test coverage, migration down-coverage, +10 more.
Most pre-seed companies run 0 or 1 hard CI gates. 21 ratchets, each with a deliberate-break test (ADR-0028), is a genuine engineering signal that survives PR review pressure.
grep -E "ratchet|cross-tenant|gitleaks|force-dynamic|mass-assignment|chaos-coverage|migration-down" .github/workflows/ci.yml | grep -vE "^[[:space:]]*#" | wc -l
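To make the term concrete, here is a minimal sketch of what a ratchet check does: compare a freshly measured value against a committed baseline and fail on any regression. The function name, metric names, and numbers are hypothetical illustrations, not taken from the repo's scripts.

```typescript
// Hypothetical ratchet sketch: names, shapes, and values are illustrative only.
type Direction = "higher-is-better" | "lower-is-better";

interface RatchetResult {
  ok: boolean;
  message: string;
}

// Compare a freshly measured value against the committed baseline.
// Any regression fails; equal or better values pass.
function checkRatchet(
  name: string,
  baseline: number,
  current: number,
  direction: Direction,
): RatchetResult {
  const regressed =
    direction === "higher-is-better" ? current < baseline : current > baseline;
  return regressed
    ? { ok: false, message: `${name}: regressed from ${baseline} to ${current}` }
    : { ok: true, message: `${name}: ${current} (baseline ${baseline})` };
}

// A coverage-style metric may only go up; a violation count may only go down.
const results: RatchetResult[] = [
  checkRatchet("rls-coverage", 0.97, 0.98, "higher-is-better"),
  checkRatchet("cross-tenant-eq-violations", 3, 4, "lower-is-better"),
];

for (const r of results) console.log(r.ok ? "PASS" : "FAIL", r.message);
// In CI, a non-zero exit code is what blocks the PR, for example:
// if (results.some((r) => !r.ok)) process.exit(1);
```

In this sketch the first check passes and the second fails, because the violation count rose from 3 to 4.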
Count ADRs
35+ Architecture Decision Records cover every load-bearing decision: encryption, RLS, audit-log signing, BullMQ DLQ, mutation testing, detection-benchmarking discipline.
ADRs are the audit trail for *why* decisions were made. Without them, a security audit gets answered with 'I think Bob in 2023 chose this' — not defensible.
ls docs/adr/*.md | wc -l
Run the firewall latency benchmark
Detection layer p95 < 5 ms, real CI-measured, regression-gated by `scripts/firewall-latency-ratchet.cjs`.
Inline firewalls go in the request hot-path. Latency is a deal-breaker for adoption. We publish the number, the script that produced it, and the CI gate that prevents regression.
pnpm install && npx tsx scripts/benchmark-firewall-latency.mjs
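For readers who want to see what a p95 gate amounts to, the sketch below computes a nearest-rank 95th percentile over latency samples and compares it to the 5 ms budget. It is an illustration under assumptions: the `percentile` helper and the sample values are made up, and the repo's benchmark script may use a different percentile method.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

const BUDGET_MS = 5; // the p95 budget stated above

// Made-up samples in milliseconds, standing in for real benchmark output.
const samplesMs = [
  0.9, 1.1, 1.3, 0.8, 1.0, 1.2, 4.2, 1.1, 0.9, 1.0,
  1.4, 1.0, 0.7, 1.1, 1.2, 0.9, 1.0, 1.3, 6.5, 1.1,
];

const p95 = percentile(samplesMs, 95);
console.log(`p95 = ${p95} ms, budget = ${BUDGET_MS} ms`, p95 < BUDGET_MS ? "PASS" : "FAIL");
```

With these 20 samples the 19th-ranked value is 4.2 ms, so the gate passes even though one outlier (6.5 ms) exceeds the budget. That is the point of gating on a percentile and not on the maximum.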
Run the firewall detection-quality benchmark
100% recall, 100% precision, 100% F1 on a 100-prompt corpus (50 attacks across 7 categories, 50 benign queries). Reproducible via committed script.
Latency without detection-quality is 'I block nothing, fast.' This benchmark answers 'does the firewall actually catch attacks?' against an OWASP/AdvBench-derived corpus.
npx tsx scripts/benchmark-firewall-detection.mjs
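As a reference for reading the benchmark output, the sketch below shows how precision, recall, and F1 follow from confusion-matrix counts. The `score` helper is hypothetical and the second data set is invented for contrast; only the 50/50 split comes from the corpus description above.

```typescript
// Counts from running a detector over a labeled corpus.
interface Confusion {
  tp: number; // attacks blocked
  fp: number; // benign prompts blocked
  fn: number; // attacks missed
  tn: number; // benign prompts allowed
}

function score(c: Confusion): { precision: number; recall: number; f1: number } {
  const precision = c.tp + c.fp === 0 ? 0 : c.tp / (c.tp + c.fp);
  const recall = c.tp + c.fn === 0 ? 0 : c.tp / (c.tp + c.fn);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// The claimed result: all 50 attacks blocked, all 50 benign queries allowed.
const perfect = score({ tp: 50, fp: 0, fn: 0, tn: 50 });

// Invented run for contrast: 2 attacks missed, 1 benign query blocked.
const imperfect = score({ tp: 48, fp: 1, fn: 2, tn: 49 });

console.log(perfect, imperfect);
```

The perfect run yields 1.0 for all three metrics. In the invented run, recall drops to 0.96 and precision to 48/49, which shows how few misclassifications a 100-prompt corpus needs before the headline numbers move.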
Verify OSS package downloads
4 OSS packages on npm with weekly downloads: `evalguardai-openai`, `evalguardai-anthropic`, `evalguardai-otel`, `@evalguard/sdk`.
OSS adoption is independent third-party validation. Anyone can run `npm install` and see the package; download counts are publicly auditable.
for p in evalguardai-openai evalguardai-anthropic evalguardai-otel @evalguard/sdk; do curl -s "https://api.npmjs.org/downloads/point/last-week/$p"; echo; done
Inspect the public synth-check history
External synthetic uptime checks run hourly. Public Actions history. Catches outages independently of our internal monitoring.
A status page that depends on the system it's monitoring isn't a status page. GitHub Actions runs from outside our infra; the history is publicly auditable.
open https://github.com/EvalGuardAi/evalguard/actions/workflows/synth-check.yml
Read the threat model
17 threat classes documented in one place with mitigations + verifying artifacts + honest gaps.
A customer security audit asks 'how do you defend against X?' The threat model document is the rolled-up answer. Not having one means re-deriving the answer every time.
head -n 100 docs/threat-model.md
Verify the SBOM is fresh
Daily CycloneDX SBOM generated by syft and scanned with grype, public Actions history, RFC 9116 security.txt.
Customer security questionnaires ask for SBOM. Generating one daily means the answer is 'here's today's, ask for any historical day' — not a 6-week project.
open https://evalguard.ai/.well-known/security.txt
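If you obtain a copy of the SBOM, freshness can be checked mechanically. The sketch below assumes a CycloneDX JSON document that carries the optional `metadata.timestamp` field; the `sbomAgeHours` helper and the sample document are hypothetical, and this page does not state where the SBOM artifact is published.

```typescript
// Minimal slice of a CycloneDX JSON document; only the fields used here.
interface CycloneDxBom {
  bomFormat: string;
  metadata?: { timestamp?: string };
}

// Age of the SBOM in hours relative to `now`, based on metadata.timestamp.
function sbomAgeHours(bom: CycloneDxBom, now: Date): number {
  if (bom.bomFormat !== "CycloneDX") throw new Error("not a CycloneDX document");
  const ts = bom.metadata?.timestamp;
  if (!ts) throw new Error("SBOM has no metadata.timestamp");
  const generated = Date.parse(ts);
  if (Number.isNaN(generated)) throw new Error(`unparseable timestamp: ${ts}`);
  return (now.getTime() - generated) / 3_600_000; // milliseconds per hour
}

// Hypothetical sample: generated 12 hours before the reference time.
const sample: CycloneDxBom = {
  bomFormat: "CycloneDX",
  metadata: { timestamp: "2025-01-02T03:00:00Z" },
};
const ageHours = sbomAgeHours(sample, new Date("2025-01-02T15:00:00Z"));
console.log(ageHours <= 24 ? "fresh" : "stale", `${ageHours}h old`);
```

A 24-hour threshold matches the daily cadence claimed above; pass the real current time as `now` when checking an actual file.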
If every step passed
You have personally verified each of the claims above. None of this required trust: every number came out of code you ran.
Honest about what we don't yet have
We list our gaps publicly because hiding them makes the positive claims less credible. Each item below has a roadmap committed to the repo:
- SOC 2 Type 1: gap analysis + control-to-TSC mapping done, evidence engine live; auditor engagement gated on funding, attestation target Q4 2026.
- External pentest: not done; planned post-Series-Seed funding ($10-25k).
- Bug bounty program: not started; security@evalguard.ai is open, with a hall-of-fame.
- api-handler.ts mutation score: 44.89% (against an 85% target). Earn-then-enforce path documented in ADR-0034.
- Detection corpus expansion: 100 prompts now, 500+ next via AdvBench / HarmBench / AISafetyLab.
Found a claim you can't verify?
That's a bug — file an issue and we'll fix the page (or the code). Diligence questions also go to barathzath@gmail.com.