Eval (LLM evaluation)
Automated test suite measuring the quality of an LLM application's outputs.
Like unit tests, but for LLM outputs: a dataset of cases (input → expected output) plus grading criteria (exact match, LLM-as-judge, embedding similarity). Common tools include Promptfoo, LangSmith, Braintrust, and Lilypad. Essential before any production deployment.
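
To make the structure concrete, here is a minimal sketch of the pattern in Python, not any particular framework's API: a dataset of cases is run through a model and each output is scored against two of the criteria above, exact match and embedding similarity. The `model` and `embed` callables are hypothetical stand-ins for your real LLM and embedding API; the stubs in the usage block are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    input: str     # prompt sent to the model
    expected: str  # reference answer

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def grade(output: str, expected: str,
          embed: Callable[[str], list[float]],
          threshold: float = 0.85) -> bool:
    # Criterion 1: exact match (strict, cheap).
    if output.strip().lower() == expected.strip().lower():
        return True
    # Criterion 2: embedding similarity (tolerant of rephrasing).
    return cosine(embed(output), embed(expected)) >= threshold

def run_eval(dataset: list[Case],
             model: Callable[[str], str],
             embed: Callable[[str], list[float]]) -> float:
    """Run every case through the model and return the pass rate."""
    passed = sum(grade(model(c.input), c.expected, embed) for c in dataset)
    return passed / len(dataset)

# Toy usage with stubbed model/embedder; swap in real API calls.
if __name__ == "__main__":
    dataset = [Case("Capital of France?", "Paris")]
    model = lambda prompt: "Paris"           # stub LLM
    embed = lambda text: [float(len(text))]  # stub embedder
    print(f"pass rate: {run_eval(dataset, model, embed):.0%}")
```

The resulting pass rate is what you gate on, for example requiring 90% before a deployment goes out. Frameworks like Promptfoo express the same structure declaratively (prompts, providers, test cases with assertions) so the suite can run in CI.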
