Anthropic
anthropicAliases: claude
YAML config
providers:
- id: anthropic:claude-3-5-sonnet-latest
config:
apiKey: ${ANTHROPIC_API_KEY}TypeScript usage
import { createProvider } from "@evalguard/core";
const provider = createProvider("anthropic", process.env.ANTHROPIC_API_KEY);
const response = await provider.complete({
model: "claude-3-5-sonnet-latest",
messages: [{ role: "user", content: "Hello" }],
});Authentication
Set ANTHROPIC_API_KEY in your environment. EvalGuard validates the key on first call and surfaces typed errors for 401 / 403 / rate-limit responses (with Retry-After parsing).
Setup walkthrough
- 1. Generate an API key at https://console.anthropic.com/settings/keys.
- 2. Set in env: `export ANTHROPIC_API_KEY=sk-ant-...`.
- 3. (Optional) Store via EvalGuard's BYOK vault for prod.
- 4. Configure: `providers: [{id: 'anthropic:claude-3-5-sonnet-latest', config: {apiKey: '${ANTHROPIC_API_KEY}'}}]`.
- 5. Smoke test: `evalguard run --provider anthropic:claude-3-5-sonnet-latest --prompt 'Hello'`.
Gotchas
- Anthropic uses a top-level `system` field instead of system messages in the array — EvalGuard's provider handles this automatically, but if you're hand-rolling the request, don't put system in messages[].
- The `max_tokens` field is REQUIRED on every request (unlike OpenAI's default). EvalGuard defaults to 1024 if not specified — raise for long-form generation.
- Prompt caching uses the `anthropic-beta: prompt-caching-2024-07-31` header (EvalGuard sends it automatically). Add `cache_control: {type: 'ephemeral'}` to specific content blocks to opt them into caching.
- 529 (overloaded) is a distinct error from 429 (rate limit) — EvalGuard treats both as retry-eligible but Anthropic's 529 means the whole service is busy, not just your tier.
Cost note
claude-3-5-sonnet: $3/M input, $15/M output. claude-haiku-4.5: $0.80/M input, $4/M output. Prompt caching gives 90% discount on cached prefix (highest of any provider) — design your prompt structure accordingly.
Recommended models
- Eval / judge
- claude-haiku-4-5-latest — fast + cheap for high-volume scoring
- Agent / tool-use
- claude-3-5-sonnet-latest — strongest tool-use + reasoning
- Code
- claude-3-5-sonnet-latest — best at large-codebase reasoning
- Vision
- claude-3-5-sonnet-latest — strong document understanding
Hand-written · 2026-05-21