Guide
Custom evaluators
Ship your own evaluator. Signature, lifecycle, sandboxing.
The built-in evaluators (contains, regex, exact, llm-judge,
script) cover most cases. When you need something custom — a
domain-specific scorer, a data-flow check, an external API call —
write your own evaluator.
#Signature
```ts
import type { Evaluator, EvaluatorContext, EvaluatorResult } from '@aldo-ai/eval';

export const myEvaluator: Evaluator = {
  name: 'my-evaluator',
  async score(ctx: EvaluatorContext): Promise<EvaluatorResult> {
    const { caseInput, response, groundTruth } = ctx;
    // ... compute the score
    return { score: 0.92, label: 'good', detail: 'matched 23/25 fields' };
  },
};
```

The score is always a number in [0, 1]. The label and detail
are surfaced in the eval report UI.
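For a rough sense of how the harness invokes this, the sketch below calls
the evaluator directly with a hand-built context. It assumes an object
literal with just the three fields read above satisfies EvaluatorContext;
the example values are made up.

```ts
// Sketch only: assumes a context literal with just the fields the evaluator
// reads is acceptable; real contexts may carry more.
const ctx = {
  caseInput: 'What is the order status?',
  response: 'Order A-17 has shipped.',
  groundTruth: 'Order A-17 shipped on 2024-06-01.',
} as EvaluatorContext;

const result = await myEvaluator.score(ctx);
console.log(result.score, result.label, result.detail);
```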
#Lifecycle
Evaluators are registered once at suite-load time, then called once per case. They are stateless: no global state, no sticky caches, no module-level side effects. The harness runs them in a fresh sandbox per case so a misbehaving evaluator can't poison later cases.
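As a concrete (and hypothetical) illustration of what stateless means here:
everything the evaluator needs arrives through ctx on each call, and nothing
lives at module scope between calls. The field-matching logic and the assumed
object shapes of response and groundTruth are illustrative, not part of the
real API.

```ts
import type { Evaluator, EvaluatorContext, EvaluatorResult } from '@aldo-ai/eval';

// Avoid module-level state such as `const cache = new Map()`: the fresh
// sandbox per case discards it anyway, and depending on it makes scores
// order-dependent.
export const fieldMatch: Evaluator = {
  name: 'field-match',
  async score(ctx: EvaluatorContext): Promise<EvaluatorResult> {
    // Hypothetical shapes: treat response and groundTruth as flat records.
    const truth = ctx.groundTruth as Record<string, unknown>;
    const resp = ctx.response as Record<string, unknown>;
    const fields = Object.keys(truth);
    const matched = fields.filter((f) => resp[f] === truth[f]).length;
    return {
      score: fields.length ? matched / fields.length : 0,
      label: matched === fields.length ? 'good' : 'partial',
      detail: `matched ${matched}/${fields.length} fields`,
    };
  },
};
```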
#Sandboxing
Custom evaluators run in the same sandbox the engine uses for
script tools. Network access defaults to none; declare an
allowlist on the suite if your evaluator needs to call out:
```yaml
evaluators:
  - name: my-evaluator
    type: script
    sandbox:
      network:
        mode: allowlist
        allowlist:
          - api.example.com
```
#Determinism
Make your evaluator deterministic. Non-determinism in evaluators
makes the promotion gate flaky — a passing run today fails tomorrow
with the same input. If you need an LLM judge, use the built-in
llm-judge type with a fixed seed and capability class.
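In practice, anything derived from the wall clock or from Math.random()
drifts between runs; derive such values from the case itself instead. The
freshness check below is a hypothetical sketch, as are the parameter names
it uses.

```ts
// Flaky: compares against the wall clock, so the same case scores
// differently depending on when the suite runs.
//   const fresh = Date.now() - responseTimestamp < 60_000;

// Deterministic: compare against a reference time carried by the case,
// so replaying the case always yields the same score.
function freshnessScore(responseTimestamp: number, caseTimestamp: number): number {
  return responseTimestamp - caseTimestamp < 60_000 ? 1 : 0;
}
```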
#Distribution
Evaluators ship in the @aldo-ai/eval package or in your tenant's
private package. The registry resolves them by type (built-in)
or name (custom).
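To make the two resolution paths concrete, a suite might list a built-in
evaluator next to a custom one as below. The keys mirror the sandbox example
above; treat this as a sketch of the idea, not a schema.

```yaml
evaluators:
  # Built-in: the registry resolves this entry by type (options omitted).
  - type: contains
  # Custom: resolved by name from your tenant's private package.
  - name: my-evaluator
    type: script
```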