Template schema and revised ontology
Status: Accepted
Decision
Templates replace definitions. A template is a TOML file that declares a worker: what it is, what it has, where it runs, what gets recorded, and how it is evaluated. Every section is optional and falls back to sensible defaults.
Why templates, not definitions
The old name was wrong. A definition implies a fixed, complete spec. Templates are composable and partial: you can specify just a workstation (tools), just compute (a sandbox), or the full stack. arpi spawn instantiates a template into a running worker.
Template schema
    [worker]
    name = "dev-agent"
    platform = "claude-code"  # claude-code | codex | cursor | custom

    [workstation]
    mcps = ["github", "linear", "sentry"]
    skills = ["deploy", "review"]
    instructions = "dev-agent.agents.md"

    [compute]
    type = "coding"  # coding | browser | desktop | desktop-windows
    image = "node:20"

    [audit]
    kernel = true            # eBPF tracing: execve, file ops, network calls
    network = "passthrough"  # passthrough | mitm
    screen = false           # VNC recording
    cost = true              # LLM token/cost tracking

    [eval]
    tasks = ["fix-checkout-bug"]
    scoring = "test-pass"  # test-pass | model-judge | human | custom
    batch = 1
    baseline = "sonnet-4"

    [overrides]
    env = { DEBUG = "true" }
    credentials = ["LEGACY_SIGNING_KEY"]
    egress = ["internal.corp:443"]

Section reference
[worker] — Who is this.
| Field | Default | Description |
|---|---|---|
| name | filename stem | Human-readable name. Used in arpi status, logs, audit trail. |
| platform | "claude-code" | Agent runtime. Determines which platform-specific harness is assembled. |
If [worker] is omitted, name defaults to the template's filename stem and platform defaults to claude-code.
[workstation] — What they have.
| Field | Default | Description |
|---|---|---|
| mcps | [] | MCP servers resolved from the registry. Credentials inferred from registry metadata. |
| skills | [] | Skills resolved from the registry. |
| instructions | none | Path to AGENTS.md-style instruction file, paired with the template. |
If [workstation] is omitted, the worker gets the platform defaults (base settings, no MCPs, no skills).
Credentials are never listed here. The registry knows what each MCP and skill requires. arpi resolves credentials from the worker’s identity and the registry metadata. See “Credential inference” below.
[compute] — Where they run.
| Field | Default | Description |
|---|---|---|
| type | "coding" | Compute surface. See “Compute types” below. |
| image | platform default | Container image for sandbox modes. Ignored when the worker runs directly on the local machine (no sandbox). |
If [compute] is omitted, the worker runs on the local machine (no sandbox). This is the default for day-to-day development — your laptop is the compute.
[audit] — What gets recorded.
| Field | Default | Description |
|---|---|---|
| kernel | true | eBPF kernel-level tracing. Captures every execve, file operation, and network call. Invisible to the agent, tamper-proof. |
| network | "passthrough" | Network audit mode. passthrough logs connection metadata (src, dst, bytes). mitm terminates TLS and logs full request/response bodies. |
| screen | depends on type | VNC screen recording. Defaults to true for desktop and desktop-windows, false for coding and browser. |
| cost | true | LLM token and cost tracking per session. |
If [audit] is omitted, all defaults apply. Every worker gets kernel tracing and cost tracking. Screen recording activates automatically for computer-use (CUA) workloads, that is, the desktop and desktop-windows compute types.
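The type-dependent screen default is the one non-constant default in this section. A rough sketch of how it could be resolved follows. `resolve_audit` is a hypothetical name, and treating a missing [compute] section as type "coding" is an assumption made for the example.

```python
# Compute types whose screen recording defaults to on, per the [audit] table.
SCREEN_ON_BY_DEFAULT = {"desktop", "desktop-windows"}


def resolve_audit(template: dict) -> dict:
    """Merge explicit [audit] values over the documented defaults (hypothetical helper)."""
    # Assumption: with no [compute] section, the type is treated as "coding".
    compute_type = template.get("compute", {}).get("type", "coding")
    defaults = {
        "kernel": True,
        "network": "passthrough",
        "screen": compute_type in SCREEN_ON_BY_DEFAULT,
        "cost": True,
    }
    # Explicit values in the template always win over defaults.
    return {**defaults, **template.get("audit", {})}


print(resolve_audit({"compute": {"type": "desktop"}}))
print(resolve_audit({"compute": {"type": "desktop"}, "audit": {"screen": False}}))
```

The second call shows an explicit screen = false overriding the desktop default.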
[eval] — How they’re measured.
| Field | Default | Description |
|---|---|---|
| tasks | [] | Eval task references. Can be registry names or file paths. |
| scoring | "test-pass" | How the eval is scored. test-pass checks exit code. model-judge uses an LLM to grade output. human queues for manual review. custom runs a user-provided scoring function. |
| batch | 1 | Number of parallel eval runs. |
| baseline | none | Model or template to compare against. |
If [eval] is omitted, the worker runs in work mode. arpi spawn <template> --eval activates eval mode; flags override any defaults from the section.
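The precedence described here (built-in defaults, then the [eval] section, then flags) can be sketched as a three-way merge. `resolve_eval` and the `flag_overrides` dictionary are hypothetical; the actual flag names are not specified in this ADR, and the sketch does not decide whether eval mode is active.

```python
from typing import Optional

# Defaults from the [eval] table; None stands in for "none".
EVAL_DEFAULTS = {"tasks": [], "scoring": "test-pass", "batch": 1, "baseline": None}


def resolve_eval(template: dict, flag_overrides: Optional[dict] = None) -> dict:
    """Merge eval settings: defaults < [eval] section < command-line flags."""
    return {**EVAL_DEFAULTS, **template.get("eval", {}), **(flag_overrides or {})}


template = {"eval": {"tasks": ["fix-checkout-bug"], "batch": 4}}
print(resolve_eval(template))                 # section overrides defaults
print(resolve_eval(template, {"batch": 8}))   # flags override the section
```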
[overrides] — Escape hatch.
| Field | Default | Description |
|---|---|---|
| env | {} | Raw environment variables injected into the workstation. |
| credentials | [] | Explicit credential names fetched from the IAM provider. For credentials not tied to any registered capability. |
| egress | [] | Additional egress allowlist entries beyond what’s inferred from MCPs and gateway. |
| settings | {} | Raw platform-specific settings merged into the assembled config. |
If [overrides] is omitted (the common case), arpi infers everything from the registry and identity. This section exists for power users who need to reach past the abstraction.
Compute types
Every agent runs a CLI (Claude Code, Codex, etc.). The compute type determines what additional surface is available.
| Type | What the agent gets | Default screen recording |
|---|---|---|
| coding | Shell + filesystem | off |
| browser | Shell + filesystem + headless Chromium | off |
| desktop | Shell + filesystem + full X11/VNC display + Chromium | on |
| desktop-windows | Shell + filesystem + Windows desktop + RDP | on |
Compute location
When [compute] is present, the sandbox can run locally or remotely:
| Scenario | What happens |
|---|---|
| No [compute] section | Worker runs on local machine. No sandbox. |
| [compute] present, no --host | Sandbox runs locally (Docker or local OpenSandbox). |
| [compute] present, --host <name> | Sandbox runs on remote host (K8s pod, Azure VM). |
Flags can override: --local forces local machine even if [compute] is present. --sandbox forces a sandbox even if [compute] is absent.
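The table and the two override flags amount to a small decision function, sketched below. `resolve_location` and its return labels are hypothetical. How the flags interact is not specified here, so the sketch assumes that --local and --sandbox are mutually exclusive, that --local wins over --host, and that --host is ignored when no sandbox is requested.

```python
from typing import Optional


def resolve_location(has_compute: bool, host: Optional[str] = None,
                     force_local: bool = False, force_sandbox: bool = False) -> str:
    """Decide where a worker runs (hypothetical sketch of the table above)."""
    if force_local and force_sandbox:
        # Assumption: the two override flags cannot be combined.
        raise ValueError("--local and --sandbox are mutually exclusive")
    if force_local:
        return "local-machine"      # --local wins even if [compute] is present
    if not (has_compute or force_sandbox):
        return "local-machine"      # no [compute] section, no --sandbox
    # A sandbox is wanted: remote if --host was given, otherwise local.
    return f"remote-sandbox:{host}" if host else "local-sandbox"


print(resolve_location(has_compute=False))                      # local-machine
print(resolve_location(has_compute=True))                       # local-sandbox
print(resolve_location(has_compute=True, host="gpu-pool"))      # remote-sandbox:gpu-pool
print(resolve_location(has_compute=True, force_local=True))     # local-machine
print(resolve_location(has_compute=False, force_sandbox=True))  # local-sandbox
```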
Credential inference
Credentials are derived from capabilities, not declared in templates.

- Template lists mcps = ["github", "sentry"].
- Registry metadata for each MCP declares its credential requirements and tier.
- arpi spawn resolves the full credential set from the registry.
- The IAM provider checks whether the worker’s identity has access to each credential.
- Credentials are provisioned according to their tier (ephemeral, proxied, or vault-managed).
The only credentials that appear in a template are in [overrides].credentials — for bespoke keys not tied to any registered capability. This should be rare.
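The inference steps can be sketched in a few lines. Everything concrete here is invented for illustration: the `REGISTRY` contents, the credential names, the tier assigned to each, and the choice to fail hard when the identity lacks access. Only the overall flow (capabilities, then registry metadata, then an IAM check, then grouping by tier) comes from the text.

```python
from collections import defaultdict

# Hypothetical registry metadata: capability -> [(credential name, tier)].
REGISTRY = {
    "github": [("GITHUB_TOKEN", "ephemeral")],
    "sentry": [("SENTRY_AUTH_TOKEN", "proxied")],
}


def infer_credentials(capabilities: list, granted: set) -> dict:
    """Resolve credentials for the listed capabilities, grouped by tier."""
    by_tier = defaultdict(list)
    for cap in capabilities:
        for name, tier in REGISTRY[cap]:
            # Stand-in for the IAM provider check on the worker's identity.
            if name not in granted:
                raise PermissionError(f"identity lacks access to {name} (needed by {cap})")
            by_tier[tier].append(name)
    return dict(by_tier)


print(infer_credentials(["github", "sentry"], {"GITHUB_TOKEN", "SENTRY_AUTH_TOKEN"}))
# {'ephemeral': ['GITHUB_TOKEN'], 'proxied': ['SENTRY_AUTH_TOKEN']}
```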
Partial templates
A template can specify any subset of sections:
    # Workstation-only -- just tools, no compute opinion
    [workstation]
    mcps = ["github", "linear"]
    skills = ["deploy"]
    instructions = "fullstack.agents.md"

    # Compute-only -- just the sandbox, no tools opinion
    [compute]
    type = "desktop"
    image = "arpi/cua-base:latest"

Partial templates compose at spawn time:

    arpi spawn --workstation fullstack --compute cua-sandbox

Full templates are shorthand for specifying everything in one file. Most users write full templates. Composition is for platform operators building reusable pieces.
Revised ontology
The old ontology had 5 file-based domains (cli, definitions, harnesses, registries, environments). The new ontology reflects that arpi is a platform with an API, not a CLI tool.
Domains
Six core domains. Governance (budgets, approval gates, policies) cuts across all six.
| Domain | What it owns | Old equivalent |
|---|---|---|
| Identity | Who this is (human or agent), what role, what they can access. uid/euid model, IAM provider integration, credential proxy. | cli/internal/iam/ + identity-model ADR |
| Workstation | The provisioned execution context. Capabilities (MCPs, skills), settings, instructions, hooks, agent definitions. | harnesses/ + registries/ |
| Compute | Sandbox lifecycle. Create, run, stop, snapshot. Sandbox pool. Local, Docker, OpenSandbox, remote K8s. The universal compute primitive. | environments/ + opensandbox/ |
| Registry | Unified catalog. MCPs, skills, templates, images, agent definitions. One registry, not separate catalogs. | registries/ + definitions/ |
| Connectivity | Gateway (Bifrost), agent-to-agent messaging, external channels (WhatsApp, Slack, Linear). | Gateway parts of environments/ + agent-messaging ADR |
| Observability | Sessions, trajectories, costs, audit logs, evals. | New (from Lunar Sandbox) |
Key changes from the old ontology
definitions/ becomes templates/. Templates are worker specs, not fixed definitions. They live in the registry (the catalog of available things to spawn). During development they’re files in git; in production they’re managed by the registry service.
harnesses/ is absorbed into Workstation. Platform-specific building blocks (settings, hooks, agents) are workstation components. The assembly engine that merges them is a workstation service, not a CLI internal.
registries/ and definitions/ merge into Registry. Templates, MCPs, skills, images, agent definitions — they’re all things in the catalog. One registry, one API.
environments/ splits into Compute and Connectivity. Container images and sandbox config are compute. Gateway config is connectivity. They were never the same domain.
cli/ is no longer the runtime. The control plane is a server (api/). The CLI is a thin client. arpi spawn is syntactic sugar for POST /v1/workers. This is the architectural pivot from the BRAINSTORM-DRAFT.
Observability is new. Sessions, trajectories, evals, cost tracking, audit logs. This is the Lunar Sandbox contribution — the data collection and analysis layer that didn’t exist in arpi v1.
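To make the thin-client point concrete, here is a sketch of the HTTP request a client might build for arpi spawn dev-agent. Only the POST /v1/workers route comes from this ADR. The base URL and the JSON body fields (`template`, `eval`) are hypothetical placeholders, and the request is constructed but never sent.

```python
import json
import urllib.request

API_BASE = "http://localhost:8080"  # hypothetical control-plane address


def build_spawn_request(template: str, eval_mode: bool = False) -> urllib.request.Request:
    """Build (but do not send) the request behind a spawn call."""
    # Assumed payload shape; the real schema is not defined in this ADR.
    body = json.dumps({"template": template, "eval": eval_mode}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/workers",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_spawn_request("dev-agent")
print(req.get_method(), req.full_url)  # POST http://localhost:8080/v1/workers
```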
Lexicon
| Term | Meaning | Replaces |
|---|---|---|
| Worker | A human or agent that performs work. Has identity, workstation, compute. | “agent,” “user” |
| Template | A TOML file that declares a worker spec. Partial or complete. | “definition,” “context” |
| Workstation | The provisioned execution context. Capabilities + settings + instructions. | “environment,” “harness” |
| Capability | A named thing a workstation can do. An MCP server, a skill, an integration. | “plugin,” “extension” |
| Sandbox | An isolated compute instance. The universal compute primitive. | “container,” “VM” |
| Session | A temporal span of a worker doing work. Start, end, trajectory, cost. | “run,” “execution” |
| Trajectory | The complete trace of what happened in a session. eBPF kernel audit + VNC recording. | “log,” “history” |
| Gateway | The network boundary. Routes, rate-limits, audits. Wall 1. | “proxy,” “Bifrost” |
| Registry | The unified catalog. Templates, capabilities, images. | “registries,” “harnesses” |
| Channel | An external communication surface. WhatsApp, Slack, Linear. | “integration” |
| Eval | A structured assessment of a session against criteria. | “test,” “benchmark” |
| Org | The tenant. One org = one physically isolated deployment. | “team,” “company” |
What arpi spawn does now
    arpi spawn dev-agent

- Resolve template dev-agent from the registry.
- Authenticate the caller (identity domain).
- Resolve credentials from template’s capabilities + caller’s identity.
- Allocate compute (sandbox from pool, or local machine if no [compute]).
- Provision workstation into compute (assemble settings, MCPs, skills, instructions).
- Configure audit (eBPF, network mode, screen recording, cost tracking).
- Start the worker (launch agent platform in the provisioned workstation).
- Return a session (trackable via arpi status, streamable via arpi logs).
In eval mode (arpi spawn dev-agent --eval):
- Load eval tasks from template or flags.
- Run batch (parallel sessions, same template, scored independently).
- Store trajectories and scores in observability.
- Return eval results (per-session scores, aggregate stats, cost).
What this supersedes
- spawn-model ADR: The TOML schema is replaced. [harness] becomes [worker]. [shared] and [platform.*] become [workstation]. [env] splits into [compute] and is removed as a concept. [env.secrets] is gone; credentials are inferred.
- ontology ADR: The 5-domain model is replaced by 6 domains. definitions/ becomes templates/ in the registry. harnesses/ becomes workstation. environments/ splits. cli/ is no longer the runtime.
- registries ADR: Separate MCP and skills registries merge into one unified registry.
The following ADRs remain valid and are not superseded:
- identity-model (uid/euid model unchanged)
- credential-lifecycle (three-tier model unchanged, credential inference is additive)
- two-wall-security (gateway + sandbox architecture unchanged)
- sandbox-strategy (OpenSandbox adoption unchanged)
Verification
- Template schema supports all six sections, each independently optional
- A template with only [worker] spawns on local machine with platform defaults
- A template with only [compute] creates a bare sandbox with no capabilities
- Credentials are inferred from mcps and skills via registry metadata
- [overrides].credentials is the only place explicit credentials appear
- [audit] defaults: kernel=true, network=passthrough, screen=auto, cost=true
- [eval] is inactive unless --eval flag is passed or section is present with tasks
- arpi spawn hits the control plane API, not local assembly
- Partial templates compose via --workstation and --compute flags