Sandbox backends: Docker, Vercel, and third-party

A sandbox backend is the infrastructure that creates and manages the isolated environment where a coding agent runs. niceeval wraps every backend behind a single Sandbox interface, so your adapter code works identically regardless of whether the environment is a local Docker container, a Vercel micro-VM, or a third-party cloud service. You choose a backend at the CLI or in config; the adapter never needs to know which one is active.

The `Sandbox` interface

Every backend implements the same interface. These are the only operations an adapter ever calls:

// Options shared by runCommand and runShell
interface CommandOptions {
  env?: Record<string, string>;
  cwd?: string;
  root?: boolean;  // run as root (default: false → non-root)
}

interface CommandResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// One file for uploadFiles; binary content is allowed (field names illustrative)
interface SandboxFile {
  path: string;
  content: string | Uint8Array;
}

interface Sandbox {
  // Run one executable with an argument list
  runCommand(cmd: string, args?: string[], opts?: CommandOptions): Promise<CommandResult>;
  // Run a full shell script
  runShell(script: string, opts?: CommandOptions): Promise<CommandResult>;

  readFile(path: string): Promise<string>;
  writeFiles(files: Record<string, string>): Promise<void>;  // path → text content
  uploadFiles(files: SandboxFile[]): Promise<void>;         // batch upload, supports binary

  stop(): Promise<void>;  // tear down the sandbox
}

Root vs non-root: why non-root is the default

Commands run as a non-root user by default. This mirrors how a coding agent normally runs on a developer's machine and, critically, it is required for Claude Code: the CLI refuses to run with --dangerously-skip-permissions when it detects that it is executing as root. When you need elevated privileges, for example to install a system package during eval setup, pass { root: true } to runCommand. Use it only for setup commands; the agent itself and all validation should run without it.

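// In a sandbox.setup hook: install a system dependency as root, then work normally.
// Debian-based slim images such as node:24-slim ship without apt package lists,
// so refresh them before installing.
await sandbox.runCommand("apt-get", ["update"], { root: true });
await sandbox.runCommand("apt-get", ["install", "-y", "openjdk-17-jdk"], { root: true });

// The agent and all subsequent steps use the default non-root user
await sandbox.runCommand("npm", ["install"]);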

The root: true semantics are consistent across every backend:

| Backend | Default user | `{ root: true }` mapping |
| --- | --- | --- |
| Docker | `node` (UID 1000) | `docker exec --user root` |
| E2B | `user` (non-root) | `commands.run(cmd, { user: "root" })` |
| Vercel Sandbox | `vercel-sandbox` (non-root) | `runCommand(cmd, { sudo: true })` |
| Daytona | configured at create time | per-command `user` override |
| Modal | root by default | no-op (already root) |

Backends that are always root (such as Modal) treat { root: true } as a no-op. Backends that cannot elevate at all will throw. Either way, the semantic contract is the same — your eval code never needs to branch on which backend is active.
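For a concrete picture of that mapping, here is a minimal sketch of how a Docker-style backend could turn the options into `docker exec` arguments. It uses the `docker` CLI for readability (the backend itself may go through the Docker Engine API instead), and `dockerExecArgs` is a hypothetical helper, not part of niceeval.

```typescript
interface CommandOptions {
  env?: Record<string, string>;
  cwd?: string;
  root?: boolean;
}

// Hypothetical helper: build the argument list for `docker exec`.
// The non-root default ("node") and the root override mirror the table above.
function dockerExecArgs(
  container: string,
  cmd: string,
  args: string[] = [],
  opts: CommandOptions = {},
): string[] {
  const argv = ["exec", "--user", opts.root ? "root" : "node"];
  if (opts.cwd) argv.push("--workdir", opts.cwd);
  for (const [key, value] of Object.entries(opts.env ?? {})) {
    argv.push("--env", `${key}=${value}`);
  }
  // The container name follows all flags, then the command and its arguments
  argv.push(container, cmd, ...args);
  return argv;
}
```

Passing the resulting array to `execFile("docker", argv)` from `node:child_process` would run the command without an intermediate shell, so individual arguments need no quoting.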

Available backends


Docker is the default backend and requires no cloud credentials, only a local Docker installation. It is the right choice for local development and most CI pipelines.

How it works:

Starts a node:24-slim container running sleep infinity
Runs all commands via docker exec; the container is created with AutoRemove, so stopping it also deletes it
Default user is node (UID 1000); global npm packages install to the user directory and are added to PATH
The slim base image is bootstrapped with ca-certificates and git
Files are uploaded using tar + putArchive, with a chown pass to fix ownership
Docker’s multiplexed exec stream (8-byte frame header) is parsed correctly

npx niceeval exp local fixtures/button --sandbox docker

// niceeval.config.ts
export default defineConfig({
  sandbox: "docker",
});
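The global-install workaround can be approximated with per-command environment variables. This sketch assumes npm's support for `npm_config_*` environment variables and a `~/.npm-global` prefix directory; `userNpmEnv` is a hypothetical helper, and niceeval's actual mechanism and paths may differ.

```typescript
// Hypothetical helper: environment that sends global npm installs to a
// directory the non-root user owns, instead of root-owned /usr/local.
function userNpmEnv(home: string, basePath: string): Record<string, string> {
  const prefix = `${home}/.npm-global`; // directory name is an assumption
  return {
    npm_config_prefix: prefix,          // npm reads npm_config_* variables as config
    PATH: `${prefix}/bin:${basePath}`,  // on Linux, global binaries go to <prefix>/bin
  };
}
```

Supplying this environment to an `npm install -g` command lets the install write into the user's home directory, and the prepended `PATH` entry makes the installed binaries resolvable by later commands.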

The Vercel backend spins up a cloud micro-VM. It is well-suited for high-concurrency CI runs where you don’t want to manage Docker infrastructure.

Requirements: set VERCEL_TOKEN or VERCEL_OIDC_TOKEN in your environment.

VERCEL_TOKEN=... npx niceeval exp local fixtures/button --sandbox vercel

// niceeval.config.ts
export default defineConfig({
  sandbox: "vercel",
});

The Vercel backend handles streaming command timeouts for long-running agent sessions using a detach-and-reconnect strategy so commands are never cut short mid-execution.
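The Vercel-specific details are not covered here, but the general detach-and-reconnect pattern looks roughly like the sketch below: the command keeps running on the sandbox side while the client re-polls in short windows, so no single request outlives a streaming timeout. `DetachedCommand` and `poll` are hypothetical and do not describe the actual Vercel Sandbox API.

```typescript
// Hypothetical handle to a command that keeps running inside the sandbox even
// after the client's connection closes. This is NOT the Vercel Sandbox API.
interface DetachedCommand {
  // Resolve with output produced after `offset`, waiting at most `waitMs`;
  // exitCode stays null until the command has finished.
  poll(offset: number, waitMs: number): Promise<{ output: string; exitCode: number | null }>;
}

// Reattach in short windows so no single request can hit a streaming timeout,
// while the command itself runs uninterrupted on the sandbox side.
async function waitForCompletion(
  cmd: DetachedCommand,
  waitMs = 30_000,
): Promise<{ output: string; exitCode: number }> {
  let output = "";
  for (;;) {
    const chunk = await cmd.poll(output.length, waitMs);
    output += chunk.output;
    if (chunk.exitCode !== null) return { output, exitCode: chunk.exitCode };
  }
}
```

Tracking an output offset is what makes reconnecting safe: each new poll resumes where the previous one stopped, so output is neither lost nor duplicated across reconnects.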

The "auto" mode inspects the environment and picks the best available backend. If VERCEL_TOKEN or VERCEL_OIDC_TOKEN is present, it uses Vercel; otherwise it falls back to Docker.

// niceeval.config.ts
export default defineConfig({
  sandbox: "auto",   // the recommended default for most teams
});

This is the recommended setting if you want local runs to use Docker automatically and CI runs to use Vercel once you add a token to your secrets.

niceeval’s createSandbox function has a plugin-style extension point for third-party sandboxing services. Any backend that implements the Sandbox interface can be registered by package name. Pass --sandbox <name> to activate it.

Currently documented third-party integrations include E2B, Modal, and Daytona. Because the Sandbox interface is intentionally small (run / read / write / stop), integrating a new provider requires minimal code.

npx niceeval exp local fixtures/button --sandbox e2b
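Because the interface is small, a provider adapter is mostly a thin mapping layer. The sketch below wraps a hypothetical `ProviderClient` (not any real SDK's API), maps `root: true` to a per-command user in the style of the E2B and Daytona rows in the table above, and omits `uploadFiles` for brevity. It does not show how niceeval registers a backend by package name.

```typescript
interface CommandOptions { env?: Record<string, string>; cwd?: string; root?: boolean }
interface CommandResult { stdout: string; stderr: string; exitCode: number }

// Hypothetical provider SDK surface; E2B, Modal, and Daytona each differ.
interface ProviderClient {
  exec(
    command: string,
    opts: { user: string; cwd?: string; env?: Record<string, string> },
  ): Promise<CommandResult>;
  read(path: string): Promise<string>;
  write(path: string, content: string): Promise<void>;
  kill(): Promise<void>;
}

// Single-quote an argument for a POSIX shell, e.g. it's → 'it'\''s'
const shellQuote = (s: string): string => `'${s.replace(/'/g, `'\\''`)}'`;

// Thin adapter: uploadFiles is omitted for brevity, so this is not a complete Sandbox.
class ProviderSandbox {
  private readonly client: ProviderClient;
  private readonly defaultUser: string;

  constructor(client: ProviderClient, defaultUser = "user") {
    this.client = client;
    this.defaultUser = defaultUser;
  }

  runShell(script: string, opts: CommandOptions = {}): Promise<CommandResult> {
    // Map the uniform `root` flag onto the provider's per-command user
    const user = opts.root ? "root" : this.defaultUser;
    return this.client.exec(script, { user, cwd: opts.cwd, env: opts.env });
  }

  runCommand(cmd: string, args: string[] = [], opts: CommandOptions = {}): Promise<CommandResult> {
    // Quote every token so arguments containing spaces or quotes survive the shell
    return this.runShell([cmd, ...args].map(shellQuote).join(" "), opts);
  }

  readFile(path: string): Promise<string> {
    return this.client.read(path);
  }

  async writeFiles(files: Record<string, string>): Promise<void> {
    for (const [path, content] of Object.entries(files)) await this.client.write(path, content);
  }

  stop(): Promise<void> {
    return this.client.kill();
  }
}
```

Routing runCommand through runShell keeps the adapter to a single exec call; a provider that accepts an argument array directly could skip the quoting step.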

Selecting a backend

You can select the backend on the CLI, in config, or by relying on auto-detection:

CLI flag (highest priority)

npx niceeval exp local fixtures/button --sandbox docker
npx niceeval exp local fixtures/button --sandbox vercel

Config file

// niceeval.config.ts
export default defineConfig({
  sandbox: "auto",   // "docker" | "vercel" | "auto" | "<third-party-name>"
});

Auto-detection fallback

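If neither is set, niceeval calls resolveBackend, which returns "vercel" when a cloud token (VERCEL_TOKEN or VERCEL_OIDC_TOKEN) is present and "docker" otherwise. Setting sandbox: "auto" in config triggers the same detection.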
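The precedence rules fit in a few lines. This is a sketch, not niceeval's `resolveBackend`: `selectBackend` is a hypothetical name, and treating an empty token as absent is an assumption.

```typescript
// Sketch of the selection order: CLI flag, then config file, then auto-detection.
// "auto" (or nothing set) picks Vercel when a token is present, else Docker.
function selectBackend(
  cliFlag: string | undefined,
  configValue: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const explicit = cliFlag ?? configValue;
  if (explicit && explicit !== "auto") return explicit;
  // Empty strings count as "not set" here (an assumption)
  return env.VERCEL_TOKEN || env.VERCEL_OIDC_TOKEN ? "vercel" : "docker";
}
```

In a Node process the third argument would typically be `process.env`, which is why the same config behaves differently on a laptop and in a CI job that has a token in its secrets.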

Docker backend details

The Docker backend is zero-config and handles all the quirks of running a coding agent as a non-root user:

Base image: node:24-slim
Default user: node (UID 1000), a non-root user, which Claude Code requires when --dangerously-skip-permissions is used
Global npm installs: because the non-root user cannot write to /usr/local/lib, niceeval configures npm to install globals into the user’s home directory and prepends that directory to PATH
Slim image bootstrap: apt-get install ca-certificates git runs automatically on first use
File uploads: uses Docker’s putArchive API (tar format) followed by a chown to restore correct ownership after the root-owned write
Stream parsing: Docker’s exec API multiplexes stdout and stderr on a single stream with an 8-byte frame header; niceeval demultiplexes it so stdout and stderr always come back as separate, clean strings
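The Docker Engine API documents the frame layout: one byte for the stream type (1 for stdout, 2 for stderr), three zero bytes, then a big-endian 32-bit payload length. A minimal parser for a fully buffered exec output might look like this. It is not niceeval's own code; it assumes the whole output is already in memory (a streaming parser would carry an incomplete frame over to the next chunk) and that the exec was started without a TTY, since TTY sessions return a raw, unframed stream.

```typescript
// Split a Docker multiplexed exec stream (non-TTY) into stdout and stderr.
// Frame layout: [type, 0, 0, 0, size (4 bytes, big-endian)] followed by `size` payload bytes.
function demuxDockerStream(buf: Uint8Array): { stdout: string; stderr: string } {
  const out: Uint8Array[] = [];
  const err: Uint8Array[] = [];
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let pos = 0;
  while (pos + 8 <= buf.length) {
    const type = buf[pos];                  // 0 = stdin, 1 = stdout, 2 = stderr
    const size = view.getUint32(pos + 4);   // DataView reads big-endian by default
    if (pos + 8 + size > buf.length) break; // incomplete trailing frame: stop here
    const payload = buf.subarray(pos + 8, pos + 8 + size);
    if (type === 1) out.push(payload);
    else if (type === 2) err.push(payload);
    pos += 8 + size;
  }
  return { stdout: decodeAll(out), stderr: decodeAll(err) };
}

// Join frames before decoding so a UTF-8 character split across frames stays intact
function decodeAll(parts: Uint8Array[]): string {
  const joined = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    joined.set(p, offset);
    offset += p.length;
  }
  return new TextDecoder().decode(joined);
}
```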

Vercel backend details

The Vercel backend requires one of:

VERCEL_TOKEN — a personal access token from your Vercel account settings
VERCEL_OIDC_TOKEN — an OIDC token, suitable for CI environments with Vercel’s OIDC integration

export VERCEL_TOKEN=vercel_...
npx niceeval exp local fixtures/button --sandbox vercel

The interface exposed to adapters is identical to Docker. You can switch an entire eval suite from Docker to Vercel by changing one line in niceeval.config.ts — no adapter code changes required.

Performance: warm pools and sandbox reuse

Sandbox cold-start time is the dominant latency factor in large eval runs. niceeval offers two mechanisms to address it:

Warm pool

niceeval pre-creates a pool of sandboxes before any eval runs. When a case starts, it claims an already-running sandbox instead of waiting for a cold boot. As long as the pool keeps pace with case starts, cold-start cost moves off the critical path.
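At its core, a warm pool means "start N boots up front, refill on every claim". The generic sketch below shows that idea; `WarmPool`, `claim`, and `drain` are hypothetical names, and `create` stands in for whatever call a backend uses to boot a sandbox.

```typescript
// Minimal warm pool: boots `size` sandboxes up front and refills on every claim.
// `create` stands in for a backend's create call; all names are hypothetical.
class WarmPool<T> {
  private readonly create: () => Promise<T>;
  private readonly pending: Promise<T>[] = [];

  constructor(create: () => Promise<T>, size: number) {
    this.create = create;
    // Start every boot immediately so cold starts overlap instead of queuing
    for (let i = 0; i < size; i++) this.pending.push(create());
  }

  // Hand out the oldest (most likely ready) sandbox and start a replacement
  claim(): Promise<T> {
    const next = this.pending.shift();
    if (next === undefined) return this.create(); // empty pool: fall back to a cold boot
    this.pending.push(this.create());
    return next;
  }

  // Take back unclaimed sandboxes, e.g. to stop() them when the run ends
  drain(): Promise<T>[] {
    return this.pending.splice(0);
  }
}
```

A production pool would also cap concurrent boots, replace sandboxes whose boot failed, and stop whatever `drain()` returns once the last case finishes.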

Sandbox reuse

After a case finishes, the sandbox can be reset with git clean back to the baseline state and handed to the next case instead of being destroyed. This trades a small contamination risk for significantly faster throughput. Reuse is off by default; enable it in your runner config when speed matters more than absolute isolation.
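A reset step might look like the following sketch. `resetToBaseline` and the narrowed `ResettableSandbox` type are hypothetical. Note that `git clean` on its own only deletes untracked files, so the sketch runs `git reset --hard` first to revert edits to tracked files.

```typescript
interface CommandResult { stdout: string; stderr: string; exitCode: number }
// Only the part of the Sandbox interface this sketch needs
interface ResettableSandbox {
  runCommand(cmd: string, args?: string[], opts?: { cwd?: string }): Promise<CommandResult>;
}

// Return a reused sandbox's working tree to its baseline. `baseline` should be the
// commit recorded before the first case, since an agent may create commits of its own.
async function resetToBaseline(sandbox: ResettableSandbox, repoDir: string, baseline = "HEAD"): Promise<void> {
  const steps: Array<[string, string[]]> = [
    ["git", ["reset", "--hard", baseline]], // revert tracked files (and move HEAD back)
    ["git", ["clean", "-fdx"]],             // delete untracked and ignored files and dirs
  ];
  for (const [cmd, args] of steps) {
    const res = await sandbox.runCommand(cmd, args, { cwd: repoDir });
    if (res.exitCode !== 0) {
      // A failed reset means the sandbox is no longer trustworthy; discard it instead
      throw new Error(`${cmd} ${args.join(" ")} failed: ${res.stderr}`);
    }
  }
}
```

The `-x` flag also deletes ignored paths such as `node_modules`, which makes the reset stricter but forces dependencies to be reinstalled; dropping it is faster but lets ignored build output carry over between cases.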

Warm pools and reuse are scheduler-level features managed by the Runner. Individual sandbox backends only need to support fast create and reset operations — the scheduling logic lives in niceeval core.
