Skip to content

Template schema and revised ontology

Status: Accepted

Templates replace definitions. A template is a TOML file that declares a worker — what it is, what it has, where it runs, what gets recorded, and how it’s evaluated. Every section is optional with sensible defaults.

The old name was wrong. A definition implies a fixed, complete spec. Templates are composable and partial — you can specify just a workstation (tools), just compute (a sandbox), or the full stack. arpi spawn instantiates a template into a running worker.

[worker]
name = "dev-agent"
platform = "claude-code" # claude-code | codex | cursor | custom
[workstation]
mcps = ["github", "linear", "sentry"]
skills = ["deploy", "review"]
instructions = "dev-agent.agents.md"
[compute]
type = "coding" # coding | browser | desktop | desktop-windows
image = "node:20"
[audit]
kernel = true # eBPF tracing: execve, file ops, network calls
network = "passthrough" # passthrough | mitm
screen = false # VNC recording
cost = true # LLM token/cost tracking
[eval]
tasks = ["fix-checkout-bug"]
scoring = "test-pass" # test-pass | model-judge | human | custom
batch = 1
baseline = "sonnet-4"
[overrides]
env = { DEBUG = "true" }
credentials = ["LEGACY_SIGNING_KEY"]
egress = ["internal.corp:443"]

[worker] — Who is this.

FieldDefaultDescription
namefilename stemHuman-readable name. Used in arpi status, logs, audit trail.
platform"claude-code"Agent runtime. Determines which platform-specific harness is assembled.

If [worker] is omitted, name defaults to the template filename and platform defaults to claude-code.

[workstation] — What they have.

FieldDefaultDescription
mcps[]MCP servers resolved from the registry. Credentials inferred from registry metadata.
skills[]Skills resolved from the registry.
instructionsnonePath to AGENTS.md-style instruction file, paired with the template.

If [workstation] is omitted, the worker gets the platform defaults (base settings, no MCPs, no skills).

Credentials are never listed here. The registry knows what each MCP and skill requires. arpi resolves credentials from the worker’s identity and the registry metadata. See “Credential inference” below.

[compute] — Where they run.

FieldDefaultDescription
type"coding"Compute surface. See “Compute types” below.
imageplatform defaultContainer image for sandbox modes. Ignored when compute is local.

If [compute] is omitted, the worker runs on the local machine (no sandbox). This is the default for day-to-day development — your laptop is the compute.

[audit] — What gets recorded.

FieldDefaultDescription
kerneltrueeBPF kernel-level tracing. Captures every execve, file operation, and network call. Invisible to the agent, tamper-proof.
network"passthrough"Network audit mode. passthrough logs connection metadata (src, dst, bytes). mitm terminates TLS and logs full request/response bodies.
screendepends on typeVNC screen recording. Defaults to true for desktop and desktop-windows, false for coding and browser.
costtrueLLM token and cost tracking per session.

If [audit] is omitted, all defaults apply. Every worker gets kernel tracing and cost tracking. Screen recording activates automatically for CUA workloads.

[eval] — How they’re measured.

FieldDefaultDescription
tasks[]Eval task references. Can be registry names or file paths.
scoring"test-pass"How the eval is scored. test-pass checks exit code. model-judge uses an LLM to grade output. human queues for manual review. custom runs a user-provided scoring function.
batch1Number of parallel eval runs.
baselinenoneModel or template to compare against.

If [eval] is omitted, the worker runs in work mode. arpi spawn <template> --eval activates eval mode; flags override any defaults from the section.

[overrides] — Escape hatch.

FieldDefaultDescription
env{}Raw environment variables injected into the workstation.
credentials[]Explicit credential names fetched from the IAM provider. For credentials not tied to any registered capability.
egress[]Additional egress allowlist entries beyond what’s inferred from MCPs and gateway.
settings{}Raw platform-specific settings merged into the assembled config.

If [overrides] is omitted (the common case), arpi infers everything from the registry and identity. This section exists for power users who need to reach past the abstraction.

Every agent runs a CLI (Claude Code, Codex, etc.). The compute type determines what additional surface is available.

TypeWhat the agent getsDefault screen recording
codingShell + filesystemoff
browserShell + filesystem + headless Chromiumoff
desktopShell + filesystem + full X11/VNC display + Chromiumon
desktop-windowsShell + filesystem + Windows desktop + RDPon

When [compute] is present, the sandbox can run locally or remotely:

ScenarioWhat happens
No [compute] sectionWorker runs on local machine. No sandbox.
[compute] present, no --hostSandbox runs locally (Docker or local OpenSandbox).
[compute] present, --host <name>Sandbox runs on remote host (K8s pod, Azure VM).

Flags can override: --local forces local machine even if [compute] is present. --sandbox forces a sandbox even if [compute] is absent.

Credentials are derived from capabilities, not declared in templates.

  1. Template lists mcps = ["github", "sentry"].
  2. Registry metadata for each MCP declares its credential requirements and tier.
  3. arpi spawn resolves the full credential set from the registry.
  4. The IAM provider checks whether the worker’s identity has access to each credential.
  5. Credentials are provisioned according to their tier (ephemeral, proxied, or vault-managed).

The only credentials that appear in a template are in [overrides].credentials — for bespoke keys not tied to any registered capability. This should be rare.

A template can specify any subset of sections:

# Workstation-only -- just tools, no compute opinion
[workstation]
mcps = ["github", "linear"]
skills = ["deploy"]
instructions = "fullstack.agents.md"
# Compute-only -- just the sandbox, no tools opinion
[compute]
type = "desktop"
image = "arpi/cua-base:latest"

Partial templates compose at spawn time:

Terminal window
arpi spawn --workstation fullstack --compute cua-sandbox

Full templates are shorthand for specifying everything in one file. Most users write full templates. Composition is for platform operators building reusable pieces.

The old ontology had 5 file-based domains (cli, definitions, harnesses, registries, environments). The new ontology reflects that arpi is a platform with an API, not a CLI tool.

Six core domains. Governance (budgets, approval gates, policies) cuts across all six.

DomainWhat it ownsOld equivalent
IdentityWho this is (human or agent), what role, what they can access. uid/euid model, IAM provider integration, credential proxy.cli/internal/iam/ + identity-model ADR
WorkstationThe provisioned execution context. Capabilities (MCPs, skills), settings, instructions, hooks, agent definitions.harnesses/ + registries/
ComputeSandbox lifecycle. Create, run, stop, snapshot. Sandbox pool. Local, Docker, OpenSandbox, remote K8s. The universal compute primitive.environments/ + opensandbox/
RegistryUnified catalog. MCPs, skills, templates, images, agent definitions. One registry, not separate catalogs.registries/ + definitions/
ConnectivityGateway (Bifrost), agent-to-agent messaging, external channels (WhatsApp, Slack, Linear).Gateway parts of environments/ + agent-messaging ADR
ObservabilitySessions, trajectories, costs, audit logs, evals.New (from Lunar Sandbox)

definitions/ becomes templates/. Templates are worker specs, not fixed definitions. They live in the registry (the catalog of available things to spawn). During development they’re files in git; in production they’re managed by the registry service.

harnesses/ is absorbed into Workstation. Platform-specific building blocks (settings, hooks, agents) are workstation components. The assembly engine that merges them is a workstation service, not a CLI internal.

registries/ and definitions/ merge into Registry. Templates, MCPs, skills, images, agent definitions — they’re all things in the catalog. One registry, one API.

environments/ splits into Compute and Connectivity. Container images and sandbox config are compute. Gateway config is connectivity. They were never the same domain.

cli/ is no longer the runtime. The control plane is a server (api/). The CLI is a thin client. arpi spawn is syntactic sugar for POST /v1/workers. This is the architectural pivot from the BRAINSTORM-DRAFT.

Observability is new. Sessions, trajectories, evals, cost tracking, audit logs. This is the Lunar Sandbox contribution — the data collection and analysis layer that didn’t exist in arpi v1.

TermMeaningReplaces
WorkerA human or agent that performs work. Has identity, workstation, compute.”agent,” “user”
TemplateA TOML file that declares a worker spec. Partial or complete.”definition,” “context”
WorkstationThe provisioned execution context. Capabilities + settings + instructions.”environment,” “harness”
CapabilityA named thing a workstation can do. An MCP server, a skill, an integration.”plugin,” “extension”
SandboxAn isolated compute instance. The universal compute primitive.”container,” “VM”
SessionA temporal span of a worker doing work. Start, end, trajectory, cost.”run,” “execution”
TrajectoryThe complete trace of what happened in a session. eBPF kernel audit + VNC recording.”log,” “history”
GatewayThe network boundary. Routes, rate-limits, audits. Wall 1.”proxy,” “Bifrost”
RegistryThe unified catalog. Templates, capabilities, images.”registries,” “harnesses”
ChannelAn external communication surface. WhatsApp, Slack, Linear.”integration”
EvalA structured assessment of a session against criteria.”test,” “benchmark”
OrgThe tenant. One org = one physically isolated deployment.”team,” “company”
Terminal window
arpi spawn dev-agent
  1. Resolve template dev-agent from the registry.
  2. Authenticate the caller (identity domain).
  3. Resolve credentials from template’s capabilities + caller’s identity.
  4. Allocate compute (sandbox from pool, or local machine if no [compute]).
  5. Provision workstation into compute (assemble settings, MCPs, skills, instructions).
  6. Configure audit (eBPF, network mode, screen recording, cost tracking).
  7. Start the worker (launch agent platform in the provisioned workstation).
  8. Return a session (trackable via arpi status, streamable via arpi logs).

In eval mode (arpi spawn dev-agent --eval):

  1. Load eval tasks from template or flags.
  2. Run batch (parallel sessions, same template, scored independently).
  3. Store trajectories and scores in observability.
  4. Return eval results (per-session scores, aggregate stats, cost).
  • spawn-model ADR: The TOML schema is replaced. [harness] becomes [worker]. [shared] and [platform.*] become [workstation]. [env] splits into [compute] and is removed as a concept. [env.secrets] is gone — credentials are inferred.
  • ontology ADR: The 5-domain model is replaced by 6 domains. definitions/ becomes templates/ in the registry. harnesses/ becomes workstation. environments/ splits. cli/ is no longer the runtime.
  • registries ADR: Separate MCP and skills registries merge into one unified registry.

The following ADRs remain valid and are not superseded:

  • identity-model (uid/euid model unchanged)
  • credential-lifecycle (three-tier model unchanged, credential inference is additive)
  • two-wall-security (gateway + sandbox architecture unchanged)
  • sandbox-strategy (OpenSandbox adoption unchanged)
  • Template schema supports all six sections, each independently optional
  • A template with only [worker] spawns on local machine with platform defaults
  • A template with only [compute] creates a bare sandbox with no capabilities
  • Credentials are inferred from mcps and skills via registry metadata
  • [overrides].credentials is the only place explicit credentials appear
  • [audit] defaults: kernel=true, network=passthrough, screen=auto, cost=true
  • [eval] is inactive unless --eval flag is passed or section is present with tasks
  • arpi spawn hits the control plane API, not local assembly
  • Partial templates compose via --workstation and --compute flags