Directory structure
Define the adapter
The adapter tells niceeval how to send requests to the AI agent and how to read responses as a standard event stream. It’s a factory function: baseUrl (where the service runs) is passed in from the outside, so the adapter never hardcodes it or reads it from env.
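As a rough illustration of the factory shape described above, here is a minimal, framework-free sketch. It does not use niceeval's actual API: the names createAdapter, AgentEvent, ServiceReply, and the /chat path and payload fields are all hypothetical. The HTTP call itself is left out, so the sketch only shows how an injected baseUrl turns into a request and how a reply could be mapped to an event list.

```typescript
// Hypothetical shapes for illustration only; niceeval's real types may differ.
type AgentEvent =
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "message"; text: string };

interface ServiceReply {
  reply: string;
  toolCalls?: { name: string; args: unknown }[];
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: { [key: string]: string };
  body: string;
}

// Factory: baseUrl is injected by the caller, never hardcoded or read from env.
function createAdapter(baseUrl: string) {
  // Strip trailing slashes so "http://host/" and "http://host" behave the same.
  // The "/chat" path is an assumption, not something the service is known to expose.
  const endpoint = baseUrl.replace(/\/+$/, "") + "/chat";

  return {
    // Describe the outgoing request as plain data.
    buildRequest(text: string): HttpRequest {
      return {
        url: endpoint,
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ message: text }),
      };
    },
    // Map the service reply to an ordered event list: tool calls first, then the message.
    toEvents(reply: ServiceReply): AgentEvent[] {
      const calls = (reply.toolCalls ?? []).map(
        (c): AgentEvent => ({ type: "tool_call", name: c.name, args: c.args }),
      );
      return [...calls, { type: "message", text: reply.reply }];
    },
  };
}

// Usage: the caller decides where the service runs.
const sketchAdapter = createAdapter("http://localhost:3000/");
const sketchRequest = sketchAdapter.buildRequest("hello");
```

Because the base URL arrives as an argument, the same adapter can point at a local mock or a deployed service without code changes.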
Multi-turn messages
t.send() automatically carries ctx.session.id to continue the same session. The adapter writes the service’s returned sessionId back to ctx.session.id. To split traffic by feature flag within an experiment, see Experiments.
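The session round trip can be sketched without the framework. In the example below only ctx.session.id and sessionId come from the text above; EvalCtx, TurnRequest, TurnReply, buildTurn, recordSession, and fakeService are hypothetical names, and fakeService is a stand-in for the real service.

```typescript
// Hypothetical shapes; only ctx.session.id and sessionId are taken from the text.
interface EvalCtx {
  session: { id?: string };
}
interface TurnRequest {
  message: string;
  sessionId?: string;
}
interface TurnReply {
  reply: string;
  sessionId: string;
}

// Outgoing: attach the current session id, if one exists yet.
function buildTurn(ctx: EvalCtx, message: string): TurnRequest {
  return ctx.session.id ? { message, sessionId: ctx.session.id } : { message };
}

// Incoming: write the service's sessionId back so the next turn continues the session.
function recordSession(ctx: EvalCtx, reply: TurnReply): void {
  ctx.session.id = reply.sessionId;
}

// Stand-in service: issues a new session id when none is supplied, else keeps it.
function fakeService(req: TurnRequest): TurnReply {
  return { reply: "echo: " + req.message, sessionId: req.sessionId ?? "s-1" };
}

const evalCtx: EvalCtx = { session: {} };

const firstTurn = buildTurn(evalCtx, "hi"); // no session id yet
recordSession(evalCtx, fakeService(firstTurn));

const secondTurn = buildTurn(evalCtx, "and again"); // carries the recorded id
recordSession(evalCtx, fakeService(secondTurn));
```

The first turn goes out without a session id, and every later turn reuses whatever the service returned.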
Define evals
Each eval sends a message and asserts on the reply, tool calls, and image understanding. Deterministic assertions (calledTool, messageIncludes) run without an API key; open-ended scoring with a judge requires a key to be set.
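To show why deterministic assertions need no API key, here is a standalone sketch of what checks like calledTool and messageIncludes could reduce to: plain predicates over a recorded event list. These are hypothetical re-implementations for illustration, not niceeval's own functions or signatures, and the RecordedEvent shape and the get_weather tool name are assumptions.

```typescript
// Hypothetical event shape; the framework's real event stream may differ.
type RecordedEvent =
  | { type: "tool_call"; name: string }
  | { type: "message"; text: string };

// True if any recorded event is a call to the named tool.
function calledTool(events: RecordedEvent[], tool: string): boolean {
  return events.some((e) => e.type === "tool_call" && e.name === tool);
}

// True if any assistant message contains the given substring.
function messageIncludes(events: RecordedEvent[], needle: string): boolean {
  return events.some((e) => e.type === "message" && e.text.indexOf(needle) !== -1);
}

// A recorded run, as the adapter might have produced it.
const recordedRun: RecordedEvent[] = [
  { type: "tool_call", name: "get_weather" },
  { type: "message", text: "It is 18 degrees in Paris." },
];

const usedWeatherTool = calledTool(recordedRun, "get_weather");
const mentionsParis = messageIncludes(recordedRun, "Paris");
```

Both checks are pure functions of the recorded events, so they give the same result on every run. A judge-based score calls a model instead, which is why it needs a key.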
send carries text only), and the assistant uses its multimodal vision model to describe the image:
Define experiments
One experiment file = one configuration (single model). For cross-model comparison, write multiple files in the same experiment group folder:
Start evaluating
First, start the service under test (it defaults to mock mode, so no API key is required). This example is a standalone npm project where niceeval is a local dependency:
Next steps
- Remote Agent: full reference for defineAgent.
- Authoring Evals: single-turn, multi-turn, and dataset evals.
- CI Integration: put agent regression tests in PRs.