defineAgent. The adapter owns all the details of how to call your system: in-process function references, HTTP endpoints, authentication headers, and message formats are entirely private to the adapter. Experiments reference agents directly rather than passing URLs on the CLI, because there is no universal protocol that every agent speaks.
When to use defineAgent
In-process function
Call your own function directly inside send. Zero network overhead: the fastest possible eval loop, ideal for unit-level semantic tests in CI.
Remote HTTP service
Issue a fetch inside send using whatever protocol your service speaks. The URL, auth, and request shape are your business; niceeval never sees them.
The defineAgent shape
defineAgent accepts a plain object with three fields. The send function is the only place you need to write any logic.
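As a rough sketch of that shape, the block below uses local stand-ins rather than importing from niceeval. The text only confirms that there are three fields and that one of them is send; the field names name and capabilities, the defineAgent stand-in, and every type here are assumptions made for illustration.

```typescript
// Local stand-ins, NOT the real niceeval types. Only `send` is named in the
// text above; `name` and `capabilities` are assumed field names.
type StreamEvent = { type: "message"; role: "user" | "assistant"; text: string };

interface Turn {
  events: StreamEvent[];
}

interface AgentContext {
  signal?: AbortSignal;
}

interface AgentDefinition {
  name: string;
  capabilities: { conversation?: boolean; toolObservability?: boolean };
  send(input: string, ctx: AgentContext): Promise<Turn>;
}

// Hypothetical stand-in for defineAgent: an identity function that exists
// only so the object literal is type-checked against AgentDefinition.
function defineAgent(def: AgentDefinition): AgentDefinition {
  return def;
}

const echoAgent = defineAgent({
  name: "echo",
  capabilities: { conversation: true },
  // The only field that holds logic: turn an input into a Turn.
  async send(input) {
    return { events: [{ type: "message", role: "assistant", text: `echo: ${input}` }] };
  },
});
```

The identity-function pattern keeps the definition a plain object while still giving the editor enough type information to check send.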
AgentCapabilities
Declaring capabilities lets niceeval shape the t context that eval authors receive. If a capability is absent, the corresponding assertions are not available at the type level — you get a compile-time error rather than a runtime surprise.
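One way this kind of type-level gating can be built is with conditional types. The sketch below illustrates the general technique only and is not niceeval's implementation; Capabilities, TestContext, and makeContext are hypothetical names.

```typescript
// Hypothetical types illustrating capability-gated assertions.
type Capabilities = { conversation?: true; toolObservability?: true };

type ToolAssertions = { calledTool(name: string): boolean };
type ConversationAssertions = { messageIncludes(text: string): boolean };

// A group of assertions appears in the type only if its capability is declared.
type TestContext<C extends Capabilities> =
  (C extends { toolObservability: true } ? ToolAssertions : unknown) &
  (C extends { conversation: true } ? ConversationAssertions : unknown);

function makeContext<C extends Capabilities>(
  caps: C,
  calledTools: string[],
  messages: string[],
): TestContext<C> {
  const ctx: Record<string, unknown> = {};
  if (caps.toolObservability) {
    ctx["calledTool"] = (name: string) => calledTools.includes(name);
  }
  if (caps.conversation) {
    ctx["messageIncludes"] = (text: string) => messages.some((m) => m.includes(text));
  }
  return ctx as unknown as TestContext<C>;
}

const t = makeContext({ conversation: true }, ["search"], ["hello world"]);
const found = t.messageIncludes("hello"); // compiles: conversation was declared
// t.calledTool("search"); // would not compile: toolObservability was not declared
```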
The sandbox capability, which enables t.sandbox.diff, t.fileChanged, and related assertions, is only meaningful for sandbox agents that run in an isolated filesystem. Remote and in-process agents should declare only conversation and toolObservability.
AgentContext
The runner passes ctx into every send call. Use ctx.signal to respect cancellation, ctx.model to forward the experiment’s model tier to your agent, and ctx.flags to read feature flags defined by the experiment.
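A minimal sketch of a send function that uses all three properties might look like the following. The property names signal, model, and flags come from the text above; their types, the runMyAgent function, and the verbose flag are assumptions.

```typescript
// Stand-in for the context described above. The three property names come
// from the text; their types are assumptions.
interface AgentContext {
  signal: AbortSignal;
  model: string;
  flags: Record<string, boolean>;
}

interface Turn {
  events: { type: "message"; text: string }[];
}

// Hypothetical agent entry point that accepts a model name and an option.
async function runMyAgent(
  input: string,
  opts: { model: string; verbose: boolean },
): Promise<string> {
  return `[${opts.model}${opts.verbose ? ", verbose" : ""}] ${input}`;
}

async function send(input: string, ctx: AgentContext): Promise<Turn> {
  // Respect cancellation before doing any work.
  if (ctx.signal.aborted) throw new Error("send was cancelled");
  const reply = await runMyAgent(input, {
    model: ctx.model, // forward the experiment's model tier
    verbose: ctx.flags["verbose"] === true, // "verbose" is a made-up flag name
  });
  return { events: [{ type: "message", text: reply }] };
}
```

For long-running work, the same signal can also be handed to fetch or any other API that accepts an AbortSignal, so cancellation interrupts the call instead of only being checked up front.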
Turn — what send returns
The events array is the heart of every Turn. All assertions (t.calledTool, t.messageIncludes, t.eventOrder, and the rest) derive from this single stream. Populating it correctly is the only real job of a remote adapter.
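To illustrate what "derive from this single stream" can mean in practice, here is a sketch in which two assertions are written as pure functions over events. These are simplified stand-ins, not niceeval's implementations; in particular, treating eventOrder as a subsequence check is an assumption, as are the event field names.

```typescript
// Stand-in event and Turn types; field names other than `events` are assumptions.
type StreamEvent =
  | { type: "message"; text: string }
  | { type: "action.called"; callId: string; name: string };

interface Turn {
  events: StreamEvent[];
}

// True if any message event contains the given text.
function messageIncludes(turn: Turn, needle: string): boolean {
  return turn.events.some((e) => e.type === "message" && e.text.includes(needle));
}

// True if the expected event types occur in this relative order
// (other events may appear in between).
function eventOrder(turn: Turn, expected: StreamEvent["type"][]): boolean {
  let next = 0;
  for (const e of turn.events) {
    if (next < expected.length && e.type === expected[next]) next++;
  }
  return next === expected.length;
}

const turn: Turn = {
  events: [
    { type: "action.called", callId: "c1", name: "search" },
    { type: "message", text: "Found 3 results." },
  ],
};
```

Because both functions read only turn.events, an adapter that omits or misorders events changes what the assertions can observe.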
In-process adapter example
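A minimal sketch of an in-process adapter follows. The answer function stands in for whatever you would import from your own codebase, and the defineAgent stand-in, the field names, and the event shape are assumptions rather than niceeval's actual API.

```typescript
// Hypothetical agent function; in a real project this would be imported
// from your own code instead of being defined here.
async function answer(question: string): Promise<string> {
  return question.trim().endsWith("?") ? "Let me check." : "Please ask a question.";
}

// Stand-in types and defineAgent (assumptions, not the real niceeval API).
type StreamEvent = { type: "message"; text: string };

interface AgentDefinition {
  name: string;
  capabilities: { conversation?: boolean; toolObservability?: boolean };
  send(input: string): Promise<{ events: StreamEvent[] }>;
}

function defineAgent(def: AgentDefinition): AgentDefinition {
  return def;
}

const inProcessAgent = defineAgent({
  name: "in-process",
  capabilities: {}, // nothing declared: this function supports neither capability
  async send(input) {
    // Direct function call, no network round-trip.
    const reply = await answer(input);
    return { events: [{ type: "message", text: reply }] };
  },
});
```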
Use an in-process adapter when your agent is a TypeScript function you can import directly. There is no network round-trip, and you get full type safety.
You don’t need to declare conversation or toolObservability if your function doesn’t support them. Omitting a capability simply means the corresponding t.* methods won’t appear in eval authors’ type signatures.
Remote HTTP adapter example
When your agent lives behind an HTTP endpoint, send is just a fetch. The URL comes from an environment variable so you can point the same adapter at local or production without changing any code.
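The following sketch shows one way to structure such a send function. The AGENT_URL variable, the fallback address, the request body, and the { reply } response shape all describe a hypothetical service; fetch and the environment are passed in as parameters so the logic can be exercised without a network.

```typescript
// Minimal fetch-like signature so the sketch can be exercised with a fake.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type StreamEvent = { type: "message"; text: string };

// AGENT_URL, the JSON body, and the { reply } response shape are assumptions
// about a hypothetical service.
function makeSend(fetchImpl: FetchLike, env: Record<string, string | undefined>) {
  return async (input: string, ctx: { signal?: AbortSignal }) => {
    const url = env["AGENT_URL"] ?? "http://localhost:3000/chat";
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ message: input }),
      signal: ctx.signal, // lets the runner cancel the request
    });
    if (!res.ok) throw new Error(`agent responded with HTTP ${res.status}`);
    const data = (await res.json()) as { reply: string };
    const events: StreamEvent[] = [{ type: "message", text: data.reply }];
    return { events };
  };
}
```

In a real adapter you would typically pass the global fetch and process.env to makeSend; injecting them here is a design choice that keeps the sketch testable.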
toStreamEvents — mapping your response to the standard stream
toStreamEvents is a small mapping function you write. Its job is to translate whatever your service returns into the standard StreamEvent[] vocabulary that niceeval understands. Here is what the standard types look like:
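As a hedged sketch, the event vocabulary can be modeled as a discriminated union. The type names are taken from the table on this page, and callId and name are mentioned in the text; every other field is an assumption and may differ from the real niceeval definitions.

```typescript
// Event type names come from the table on this page; all fields except
// callId and name are assumptions.
type StreamEvent =
  | { type: "message"; role: "assistant" | "user"; text: string }
  | { type: "action.called"; callId: string; name: string; input?: unknown }
  | { type: "action.result"; callId: string; output?: unknown }
  | { type: "subagent.called"; name: string }
  | { type: "subagent.completed"; name: string }
  | { type: "input.requested"; prompt?: string }
  | { type: "thinking"; text: string }
  | { type: "error"; message: string };

// A discriminated union lets the compiler narrow each event by its type tag.
function summarize(event: StreamEvent): string {
  switch (event.type) {
    case "message":
      return `${event.role}: ${event.text}`;
    case "action.called":
      return `call ${event.name} (${event.callId})`;
    case "action.result":
      return `result for ${event.callId}`;
    default:
      return event.type;
  }
}
```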
For instance, a toStreamEvents implementation for a service that returns { reply: string, tools: ToolCall[] } might look like this:
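A minimal sketch of such a mapping is shown here. The ToolCall fields (id, name, args, result) are assumptions, since the text only names the type, and emitting tool events before the final reply is an ordering choice made for this example.

```typescript
// ToolCall's fields are assumptions; the text only names the type.
interface ToolCall {
  id: string;
  name: string;
  args: unknown;
  result?: unknown;
}

interface ServiceResponse {
  reply: string;
  tools: ToolCall[];
}

type StreamEvent =
  | { type: "message"; text: string }
  | { type: "action.called"; callId: string; name: string; input: unknown }
  | { type: "action.result"; callId: string; output: unknown };

function toStreamEvents(res: ServiceResponse): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const tool of res.tools) {
    // Each tool call becomes a called/result pair linked by callId.
    events.push({ type: "action.called", callId: tool.id, name: tool.name, input: tool.args });
    if (tool.result !== undefined) {
      events.push({ type: "action.result", callId: tool.id, output: tool.result });
    }
  }
  // The reply is emitted last, after the tool activity that preceded it.
  events.push({ type: "message", text: res.reply });
  return events;
}
```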
Referencing an agent from an experiment
Import your adapter from an experiment file so the run configuration is checked in and reviewable.
Switching between local and production with environment variables
Because the adapter reads process.env internally, you can point it at any environment without touching config files. Pass the variable inline or export it before running:
npx niceeval exp local vs npx niceeval exp prod.
Standard StreamEvent types at a glance
| Type | When to emit |
|---|---|
| message | Any text the agent produces (assistant reply or user echo) |
| action.called | A tool or skill call is initiated |
| action.result | The result of a tool or skill call (pair with action.called via callId) |
| subagent.called | The agent delegates work to a child agent |
| subagent.completed | The child agent finishes |
| input.requested | The agent paused and is waiting for human input (HITL); triggers t.parked() |
| thinking | Internal reasoning text (e.g., extended thinking from Claude) |
| error | A non-fatal error the agent reported |
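The callId pairing mentioned for action.result can be checked with a small helper. This is an illustrative sketch: unpairedCalls is a hypothetical function, not part of niceeval, and the event fields other than callId and name are assumptions.

```typescript
type StreamEvent =
  | { type: "action.called"; callId: string; name: string }
  | { type: "action.result"; callId: string; output: unknown }
  | { type: "message"; text: string };

// Returns the names of tool calls that never received a matching result.
function unpairedCalls(events: StreamEvent[]): string[] {
  const pending = new Map<string, string>(); // callId -> tool name
  for (const e of events) {
    if (e.type === "action.called") pending.set(e.callId, e.name);
    else if (e.type === "action.result") pending.delete(e.callId);
  }
  return Array.from(pending.values());
}
```

Running a check like this over an adapter's output is a quick way to catch calls whose results were dropped during mapping.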
Skill loads (load_skill) are modeled as action.called events with name: "load_skill". The t.loadedSkill() assertion is therefore just syntactic sugar over t.calledTool("load_skill", …); no special event type is needed.
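The relationship between the two assertions can be sketched as one function delegating to the other. These are simplified stand-ins written over a plain event array, not niceeval's implementations, and the { skill } input shape is an assumption.

```typescript
type StreamEvent =
  | { type: "action.called"; callId: string; name: string; input?: unknown }
  | { type: "message"; text: string };

// True if a tool with this name was called and its input passes the matcher.
function calledTool(
  events: StreamEvent[],
  name: string,
  matchInput: (input: unknown) => boolean = () => true,
): boolean {
  return events.some(
    (e) => e.type === "action.called" && e.name === name && matchInput(e.input),
  );
}

// loadedSkill delegates to calledTool; the { skill } input shape is an assumption.
function loadedSkill(events: StreamEvent[], skill: string): boolean {
  return calledTool(events, "load_skill", (input) => {
    return (
      typeof input === "object" &&
      input !== null &&
      (input as { skill?: unknown }).skill === skill
    );
  });
}
```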