Mar 19, 2026 · 10 min read · audio
Deterministic LLM gating (Part 2): a harness for contract-bound outputs
How the initial kestrel-evals implementation works: YAML suites, provider calls, deterministic checks, baseline reports, and CI gating.
Posts tagged “evaluation”.
Mar 19, 2026 · 4 min read · audio
If an LLM feature depends on structured output, validate the contract deterministically before you argue about quality.