A sandbox backend is the infrastructure that creates and manages the isolated environment where a coding agent runs. niceeval wraps every backend behind a single Sandbox interface, so your adapter code works identically regardless of whether the environment is a local Docker container, a Vercel micro-VM, or a third-party cloud service. You choose a backend at the CLI or in config; the adapter never needs to know which one is active.

The Sandbox interface

Every backend implements the same interface. These are the only operations an adapter ever calls:
// Result of every command or script execution
type CommandResult = { stdout: string; stderr: string; exitCode: number };

interface Sandbox {
  runCommand(cmd: string, args?: string[], opts?: {
    env?: Record<string, string>;
    cwd?: string;
    root?: boolean;  // run as root (default: false → non-root)
  }): Promise<CommandResult>;

  runShell(script: string, opts?: { cwd?: string }): Promise<CommandResult>;  // run a full shell script

  readFile(path: string): Promise<string>;
  writeFiles(files: Record<string, string>): Promise<void>;
  uploadFiles(files: SandboxFile[]): Promise<void>;         // batch upload, supports binary

  stop(): Promise<void>;
}
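To illustrate how adapter-style code can depend on these operations alone, here is a minimal sketch. The InMemorySandbox class and the runCase function are hypothetical names invented for this example and are not part of niceeval; the interface is a trimmed local copy so the snippet is self-contained.

```typescript
// Trimmed local copy of the interface so the sketch is self-contained.
type CommandResult = { stdout: string; stderr: string; exitCode: number };
type RunOpts = { env?: Record<string, string>; cwd?: string; root?: boolean };

interface Sandbox {
  runCommand(cmd: string, args?: string[], opts?: RunOpts): Promise<CommandResult>;
  readFile(path: string): Promise<string>;
  writeFiles(files: Record<string, string>): Promise<void>;
  stop(): Promise<void>;
}

// Hypothetical stand-in backend: keeps files in a Map and only understands `cat`.
class InMemorySandbox implements Sandbox {
  private files = new Map<string, string>();
  stopped = false;

  async runCommand(cmd: string, args: string[] = []): Promise<CommandResult> {
    const [path] = args;
    if (cmd === "cat" && path !== undefined) {
      const content = this.files.get(path);
      return content === undefined
        ? { stdout: "", stderr: `cat: ${path}: No such file`, exitCode: 1 }
        : { stdout: content, stderr: "", exitCode: 0 };
    }
    return { stdout: "", stderr: `${cmd}: command not found`, exitCode: 127 };
  }

  async readFile(path: string): Promise<string> {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`no such file: ${path}`);
    return content;
  }

  async writeFiles(files: Record<string, string>): Promise<void> {
    for (const [path, content] of Object.entries(files)) this.files.set(path, content);
  }

  async stop(): Promise<void> {
    this.stopped = true;
  }
}

// Adapter-style code: depends only on the interface, never on a concrete backend.
async function runCase(sandbox: Sandbox): Promise<CommandResult> {
  try {
    await sandbox.writeFiles({ "prompt.txt": "add a button" });
    return await sandbox.runCommand("cat", ["prompt.txt"]);
  } finally {
    await sandbox.stop(); // always release the environment, even on failure
  }
}
```

Because runCase accepts any Sandbox, the same function could be handed a Docker-backed or Vercel-backed implementation without modification.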

Root vs non-root: why non-root is the default

Commands run as a non-root user by default. This matches the agent’s natural operating environment and, critically, it is required for Claude Code: the CLI refuses to run with --dangerously-skip-permissions when it detects it is executing as root. When you need elevated privileges — for example, to install a system package during eval setup — pass { root: true } to runCommand. Use it only for setup commands; the agent itself and all validation should run without it.
// In a sandbox.setup hook: install a system dependency as root, then work normally
await sandbox.runCommand("apt-get", ["install", "-y", "openjdk-17-jdk"], { root: true });

// The agent and all subsequent steps use the default non-root user
await sandbox.runCommand("npm", ["install"]);
The root: true semantics are consistent across every backend:
| Backend | Default user | { root: true } mapping |
| --- | --- | --- |
| Docker | node (UID 1000) | docker exec --user root |
| E2B | user (non-root) | commands.run(cmd, { user: "root" }) |
| Vercel Sandbox | vercel-sandbox (non-root) | runCommand(cmd, { sudo: true }) |
| Daytona | configured at create time | per-command user override |
| Modal | root by default | no-op (already root) |
Backends that are always root (such as Modal) treat { root: true } as a no-op. Backends that cannot elevate at all throw instead of silently running the command unprivileged. In both cases the contract is the same: { root: true } either runs the command as root or fails loudly, so your eval code never needs to branch on which backend is active.
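As a sketch of what the Docker row of this mapping could look like inside a backend, the hypothetical helper below translates the backend-neutral options into arguments for the docker exec command. The function name dockerExecArgs and its signature are assumptions for illustration; only the --user, --workdir, and --env flags of docker exec are taken as given.

```typescript
type RunOpts = { env?: Record<string, string>; cwd?: string; root?: boolean };

// Hypothetical helper: translate backend-neutral options into `docker exec` arguments.
function dockerExecArgs(
  container: string,
  cmd: string,
  args: string[] = [],
  opts: RunOpts = {},
): string[] {
  // { root: true } maps to --user root; otherwise use the default non-root user.
  const out = ["exec", "--user", opts.root ? "root" : "node"];
  if (opts.cwd) out.push("--workdir", opts.cwd);
  for (const [key, value] of Object.entries(opts.env ?? {})) {
    out.push("--env", `${key}=${value}`);
  }
  return [...out, container, cmd, ...args];
}
```

Keeping the translation in one place like this is what lets eval code pass { root: true } without knowing how the active backend elevates.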

Available backends

Docker is the default backend and requires no cloud credentials, only a local Docker installation. It is the right choice for local development and most CI pipelines.

How it works:
  • Starts a node:24-slim container running sleep infinity
  • Runs all commands via docker exec; the container is created with AutoRemove, so Docker deletes it when it stops
  • Default user is node (UID 1000); global npm packages install to the user directory and are added to PATH
  • The slim base image is bootstrapped with ca-certificates and git
  • Files are uploaded using tar + putArchive, with a chown pass to fix ownership
  • Docker’s multiplexed exec stream (8-byte frame header) is parsed correctly
npx niceeval exp local fixtures/button --sandbox docker
// niceeval.config.ts
export default defineConfig({
  sandbox: "docker",
});

Selecting a backend

You can select the backend on the CLI, in config, or by relying on auto-detection:
1. CLI flag (highest priority)

npx niceeval exp local fixtures/button --sandbox docker
npx niceeval exp local fixtures/button --sandbox vercel

2. Config file

// niceeval.config.ts
export default defineConfig({
  sandbox: "auto",   // "docker" | "vercel" | "auto" | "<third-party-name>"
});

3. Auto-detection fallback

If neither is set, or the configured value is "auto", niceeval runs resolveBackend, which returns "vercel" when a cloud token is present and "docker" otherwise.
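The priority order above can be sketched as a small pure function. This is an illustration only: the real resolveBackend may take different arguments, and the assumption that "cloud token" means VERCEL_TOKEN or VERCEL_OIDC_TOKEN is inferred from the Vercel backend's documented requirements.

```typescript
// Hypothetical signature: the real resolveBackend may take different arguments.
function resolveBackend(
  env: Record<string, string | undefined>,
  cliFlag?: string,
  configValue?: string,
): string {
  // The CLI flag beats the config file; "auto" (or nothing) defers to detection.
  const choice = cliFlag ?? configValue ?? "auto";
  if (choice !== "auto") return choice;

  // Assumption: a "cloud token" is either of the two Vercel variables.
  const hasToken = Boolean(env["VERCEL_TOKEN"] || env["VERCEL_OIDC_TOKEN"]);
  return hasToken ? "vercel" : "docker";
}
```

Passing the environment in as an argument, instead of reading it globally, keeps the sketch easy to test.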

Docker backend details

The Docker backend is zero-config and handles all the quirks of running a coding agent as a non-root user:
  • Base image: node:24-slim
  • Default user: node (UID 1000), a non-root account, which Claude Code requires when --dangerously-skip-permissions is used
  • Global npm installs: because the non-root user cannot write to /usr/local/lib, niceeval configures npm to install globals into the user’s home directory and prepends that directory to PATH
  • Slim image bootstrap: apt-get install ca-certificates git runs automatically on first use
  • File uploads: uses Docker’s putArchive API (tar format) followed by a chown to restore correct ownership after the root-owned write
  • Stream parsing: Docker’s exec API multiplexes stdout and stderr on a single stream with an 8-byte frame header; niceeval parses this correctly so you always get clean stdout and stderr separately
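To make the stream-parsing point concrete, here is a minimal sketch of demultiplexing Docker's exec stream. It follows the frame layout in the Docker Engine API (one byte for the stream type, three zero bytes, then a 4-byte big-endian payload size) and applies only when the exec was started without a TTY. The function name demuxDockerStream is hypothetical and this is not niceeval's actual parser.

```typescript
// Split a Docker multiplexed exec stream into stdout and stderr.
// Frame layout: [stream type, 0, 0, 0, 4-byte big-endian size] followed by the payload.
function demuxDockerStream(data: Uint8Array): { stdout: string; stderr: string } {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const stdoutBytes: number[] = [];
  const stderrBytes: number[] = [];
  let offset = 0;

  while (offset + 8 <= data.length) {
    const streamType = view.getUint8(offset); // 0 = stdin, 1 = stdout, 2 = stderr
    const size = view.getUint32(offset + 4, false); // false = big-endian
    const start = offset + 8;
    const end = start + size;
    if (end > data.length) break; // incomplete trailing frame: stop parsing

    const target = streamType === 2 ? stderrBytes : stdoutBytes;
    for (let i = start; i < end; i++) target.push(view.getUint8(i));
    offset = end;
  }

  // Decode once at the end so multi-byte characters split across frames survive.
  const decoder = new TextDecoder();
  return {
    stdout: decoder.decode(new Uint8Array(stdoutBytes)),
    stderr: decoder.decode(new Uint8Array(stderrBytes)),
  };
}
```

A production parser would also need to buffer incomplete frames across network chunks; this sketch assumes the whole stream is already in memory.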

Vercel backend details

The Vercel backend requires one of:
  • VERCEL_TOKEN — a personal access token from your Vercel account settings
  • VERCEL_OIDC_TOKEN — an OIDC token, suitable for CI environments with Vercel’s OIDC integration
export VERCEL_TOKEN=vercel_...
npx niceeval exp local fixtures/button --sandbox vercel
The interface exposed to adapters is identical to Docker. You can switch an entire eval suite from Docker to Vercel by changing one line in niceeval.config.ts — no adapter code changes required.

Performance: warm pools and sandbox reuse

Sandbox cold-start time is the dominant latency factor in large eval runs. niceeval offers two mechanisms to address it:

Warm pool

niceeval pre-creates a pool of sandboxes before any eval runs. When a case starts, it claims an already-running sandbox instead of waiting for a cold boot. Cold-start cost moves off the critical path entirely.
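The idea can be sketched with a small generic pool. The WarmPool class below is hypothetical and much simpler than a real scheduler: it starts a fixed number of sandboxes eagerly, hands out the oldest one on each claim, and starts a replacement in the background.

```typescript
// Hypothetical warm pool: starts `size` sandboxes eagerly and hands them out on demand.
class WarmPool<T> {
  private ready: Promise<T>[] = [];
  private create: () => Promise<T>;

  constructor(create: () => Promise<T>, size: number) {
    this.create = create;
    // Kick off creation immediately so boot time overlaps with other work.
    for (let i = 0; i < size; i++) this.ready.push(create());
  }

  // Claim a pre-started sandbox; fall back to a cold start if the pool is empty.
  claim(): Promise<T> {
    const warm = this.ready.shift();
    if (warm === undefined) return this.create();
    this.ready.push(this.create()); // refill so the pool stays at its target size
    return warm;
  }
}
```

The pool stores promises, not finished sandboxes, so a claim made while a sandbox is still booting waits only for the remaining boot time.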

Sandbox reuse

After a case finishes, the sandbox can be reset with git clean back to the baseline state and handed to the next case instead of being destroyed. This trades a small contamination risk for significantly faster throughput. Reuse is off by default; enable it in your runner config when speed matters more than absolute isolation.
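A reset step along these lines might look like the sketch below. It assumes the workspace is a git repository whose HEAD is the baseline, and it adds git reset --hard before git clean so that modified tracked files are restored too; both the function name resetToBaseline and that extra command are assumptions, not niceeval's documented behavior.

```typescript
type CommandResult = { stdout: string; stderr: string; exitCode: number };

// Only the part of the Sandbox interface this sketch needs.
interface Resettable {
  runShell(script: string, opts?: { cwd?: string }): Promise<CommandResult>;
}

// Hypothetical reset step; assumes the workspace is a git repo whose HEAD is the baseline.
async function resetToBaseline(sandbox: Resettable, cwd: string): Promise<void> {
  // reset --hard restores tracked files; clean -fdx removes untracked and ignored files.
  const result = await sandbox.runShell("git reset --hard && git clean -fdx", { cwd });
  if (result.exitCode !== 0) {
    // A sandbox that cannot be reset should be destroyed, not reused.
    throw new Error(`sandbox reset failed: ${result.stderr}`);
  }
}
```

State outside the workspace, such as globally installed packages or leftover processes, is not touched by a reset like this, which is where the contamination risk comes from.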
Warm pools and reuse are scheduler-level features managed by the Runner. Individual sandbox backends only need to support fast create and reset operations — the scheduling logic lives in niceeval core.