Setting up niceeval takes three steps: verify your prerequisites, install the package, and run npx niceeval init to scaffold your eval directory and config file. This page walks through each step in detail and covers every configuration option you’re likely to need before your first run.

Prerequisites

Before installing niceeval, make sure your environment meets the following requirements.
Node.js: niceeval requires a modern version of Node.js with ESM support. Use the version specified in your project’s .nvmrc or package.json engines field, or install the current LTS release from nodejs.org.
Docker: required only for sandbox evals (coding agents like Claude Code, Codex, and bub that need an isolated filesystem). Install Docker Desktop or Docker Engine and confirm it’s running before you attempt a sandbox eval. In-process and HTTP agent evals work without Docker.
If Docker is unavailable when you run a sandbox eval, niceeval stops immediately with a clear error message. It will not silently fall back to a different backend.
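A preflight script can surface a missing Docker daemon before niceeval is invoked at all. The following is a minimal sketch and not part of niceeval: commandSucceeds and dockerIsRunning are hypothetical helper names, and the sketch assumes the docker CLI is on your PATH and that docker info exits with a non-zero status when the daemon is not reachable.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper (not a niceeval API): true when the command
// can be started and exits with status 0.
function commandSucceeds(command: string, args: string[] = []): boolean {
  const result = spawnSync(command, args, { stdio: "ignore" });
  // result.error is set when the binary cannot be spawned (for example, not on PATH).
  return result.error === undefined && result.status === 0;
}

// Assumption: `docker info` fails when the Docker daemon is not running.
function dockerIsRunning(): boolean {
  return commandSucceeds("docker", ["info"]);
}
```

A script that runs sandbox evals could call dockerIsRunning() first and exit with its own message when it returns false. In-process and HTTP agent evals do not need this check.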

Install niceeval

Add niceeval as a dev dependency using your preferred package manager:
npm install -D niceeval

Scaffold your project

Run the init command to generate your eval directory and config file:
npx niceeval init
init inspects your project layout and creates a minimal but functional starting point. After it completes, your project contains:
your-project/
├─ niceeval.config.ts           ← central configuration
└─ evals/
   ├─ hello.eval.ts             ← example: conversational eval
   └─ fixtures/
      └─ button/                ← example: sandbox coding-agent eval
         ├─ PROMPT.md
         ├─ EVAL.ts
         └─ package.json
The generated hello.eval.ts and fixtures/button/ are illustrative examples — read through them to understand the eval shape, then replace or delete them when you’re ready to write your own.
Eval IDs are derived automatically from file paths. The file evals/weather/brooklyn.eval.ts gets the ID weather/brooklyn. You never declare an ID by hand — renaming the file is all it takes to change it.
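The path-to-ID rule can be pictured as a small string transformation. This is an illustrative sketch only, not niceeval's actual implementation: evalIdFromPath is a hypothetical name, and the sketch assumes evals live under evals/ and end in .eval.ts. It does not cover how fixture directories are named.

```typescript
// Illustrative only: mirrors the rule described above for *.eval.ts files.
// Not niceeval's own code; the function name and `root` default are assumptions.
function evalIdFromPath(filePath: string, root = "evals"): string {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  const prefix = `${root}/`;
  const relative = normalized.startsWith(prefix)
    ? normalized.slice(prefix.length)
    : normalized;
  // Drop the ".eval.ts" suffix to get the ID.
  return relative.replace(/\.eval\.ts$/, "");
}

console.log(evalIdFromPath("evals/weather/brooklyn.eval.ts")); // weather/brooklyn
```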

Configure niceeval

Open the generated niceeval.config.ts and review the available options:
// niceeval.config.ts
import { defineConfig } from "niceeval";
import { Console, JUnit } from "niceeval/reporters";

export default defineConfig({
  // LLM used for t.judge.* assertions
  judge: { model: "anthropic/claude-haiku-4-5" },

  // reporters that run after every eval suite
  reporters: [Console(), JUnit(".niceeval/junit.xml")],

  // how many evals to run in parallel
  maxConcurrency: 8,

  // per-eval timeout in milliseconds (5 minutes)
  timeoutMs: 300_000,

  // "auto" uses a cloud sandbox token if present, otherwise Docker
  sandbox: "auto",
});
Option          Type                Description
judge           { model: string }   The LLM model used for t.judge.* assertions
reporters       Reporter[]          Reporters that emit results after every run
maxConcurrency  number              Maximum number of evals running at the same time
timeoutMs       number              Per-eval timeout in milliseconds
sandbox         "auto" | "docker"   Sandbox backend for coding-agent evals
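To make the option shapes concrete, the sketch below mirrors the table in a local type and runs a few sanity checks on a candidate set of values. Everything here is an assumption for illustration: SketchOptions is not the type exported by niceeval, reporters are left out, and the constraints in checkOptions are common-sense guesses rather than documented niceeval rules.

```typescript
// Local mirror of the options table, for illustration only.
// Not the type exported by niceeval; reporters are omitted.
type SketchOptions = {
  judge: { model: string };
  maxConcurrency: number;
  timeoutMs: number;
  sandbox: "auto" | "docker";
};

// Hypothetical helper: returns a list of human-readable problems.
function checkOptions(options: SketchOptions): string[] {
  const problems: string[] = [];
  if (options.judge.model.trim() === "") problems.push("judge.model is empty");
  if (!Number.isInteger(options.maxConcurrency) || options.maxConcurrency < 1) {
    problems.push("maxConcurrency must be a positive integer");
  }
  if (!(options.timeoutMs > 0)) problems.push("timeoutMs must be greater than 0");
  return problems;
}

// Converts minutes to milliseconds so timeouts read naturally.
const minutes = (n: number): number => n * 60_000;

const example: SketchOptions = {
  judge: { model: "anthropic/claude-haiku-4-5" },
  maxConcurrency: 8,
  timeoutMs: minutes(5), // 300_000, the same value as the generated config
  sandbox: "auto",
};

console.log(checkOptions(example)); // empty list: no problems found
```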
Select agents in experiment files:
import { defineExperiment } from "niceeval";
import myAgent from "./agents/my-agent.js";

export default defineExperiment({
  agent: myAgent,
  runs: 1,
});

Set environment variables

niceeval does not manage secrets. It reads them from environment variables that your agent adapters reference at runtime.
For Claude Code (Anthropic coding agent):
export ANTHROPIC_API_KEY=sk-ant-...
For Codex (OpenAI coding agent):
export OPENAI_API_KEY=sk-...
For custom HTTP agents, add whatever variables your adapter reads — for example:
export AGENT_URL=https://my-agent.example.com
Never commit API keys to your repository. For local development, keep ANTHROPIC_API_KEY and OPENAI_API_KEY in a .env file that is listed in .gitignore. For automated runs, store them as CI secrets (for example secrets.ANTHROPIC_API_KEY in GitHub Actions).
For CI, pass the relevant secrets through the workflow env block:
# .github/workflows/evals.yml
- run: npx niceeval exp ci --strict --junit .niceeval/junit.xml
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
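Because the keys come from the environment, a run can fail late if one is missing. A small check at the start of a script can report that up front. This is a minimal sketch under stated assumptions: missingEnv is a hypothetical helper, not a niceeval API, and the key value shown is a placeholder.

```typescript
// Hypothetical helper (not a niceeval API): returns the names from
// `required` that are unset or blank in `env`.
function missingEnv(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Example with an explicit environment object and a placeholder key.
// In a real script, pass process.env instead.
const missing = missingEnv(["ANTHROPIC_API_KEY", "OPENAI_API_KEY"], {
  ANTHROPIC_API_KEY: "sk-ant-example",
});
console.log(missing); // only OPENAI_API_KEY is reported as missing
```

Only the variables your own agent adapters read need to be listed. A project that uses Claude Code alone would check ANTHROPIC_API_KEY and nothing else.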

Verify the installation

Confirm that the CLI is reachable and can discover your evals:
# list all discovered evals without running them
npx niceeval list
npx niceeval list reads niceeval.config.ts and scans your evals/ directory. A successful run prints each eval’s ID, description, and registered agent. If it exits with an error, check that niceeval.config.ts exists at the repository root and that your evals/ directory contains at least one *.eval.ts file.
You can also do a dry run, which resolves every eval and prints what would execute without calling any agent or sandbox:
npx niceeval exp local --dry
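The two checks above can be chained in one script that stops at the first failure. The sketch below is one possible arrangement, not something niceeval ships: firstFailure and the Runner type are hypothetical, the command lines are the two shown on this page, and it assumes npx can be spawned directly (on Windows it may need to be invoked as npx.cmd).

```typescript
import { spawnSync } from "node:child_process";

// A runner takes a command and its arguments and returns the exit status.
type Runner = (command: string, args: string[]) => number | null;

// Default runner: executes the command and passes its output through.
const spawnRunner: Runner = (command, args) =>
  spawnSync(command, args, { stdio: "inherit" }).status;

// The two verification commands from this page, in order.
const checks: string[][] = [
  ["npx", "niceeval", "list"],
  ["npx", "niceeval", "exp", "local", "--dry"],
];

// Hypothetical helper: runs each check and stops at the first failure.
// Returns the failing command line, or null when every check passes.
function firstFailure(run: Runner = spawnRunner): string | null {
  for (const [command, ...args] of checks) {
    if (run(command, args) !== 0) return [command, ...args].join(" ");
  }
  return null;
}
```

Calling firstFailure() with no argument runs the real commands. Passing a custom runner keeps the helper easy to exercise without niceeval installed.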
Once list and --dry both succeed, you’re ready to run your first eval. Head to the Quickstart for a step-by-step walkthrough of all three eval types.