Template schema and revised ontology

Status: Accepted

Decision

Templates replace definitions. A template is a TOML file that declares a worker — what it is, what it has, where it runs, what gets recorded, and how it’s evaluated. Every section is optional with sensible defaults.

Why templates, not definitions

The old name was wrong. A definition implies a fixed, complete spec. Templates are composable and partial — you can specify just a workstation (tools), just compute (a sandbox), or the full stack. arpi spawn instantiates a template into a running worker.

Template schema

[worker]
name = "dev-agent"
platform = "claude-code"          # claude-code | codex | cursor | custom

[workstation]
mcps = ["github", "linear", "sentry"]
skills = ["deploy", "review"]
instructions = "dev-agent.agents.md"

[compute]
type = "coding"                   # coding | browser | desktop | desktop-windows
image = "node:20"

[audit]
kernel = true                     # eBPF tracing: execve, file ops, network calls
network = "passthrough"           # passthrough | mitm
screen = false                    # VNC recording
cost = true                       # LLM token/cost tracking

[eval]
tasks = ["fix-checkout-bug"]
scoring = "test-pass"             # test-pass | model-judge | human | custom
batch = 1
baseline = "sonnet-4"

[overrides]
env = { DEBUG = "true" }
credentials = ["LEGACY_SIGNING_KEY"]
egress = ["internal.corp:443"]

Section reference

[worker] — Who is this.

Field	Default	Description
`name`	filename stem	Human-readable name. Used in `arpi status`, logs, audit trail.
`platform`	`"claude-code"`	Agent runtime. Determines which platform-specific harness is assembled.

If [worker] is omitted, name defaults to the template's filename stem and platform defaults to claude-code.

[workstation] — What they have.

Field	Default	Description
`mcps`	`[]`	MCP servers resolved from the registry. Credentials inferred from registry metadata.
`skills`	`[]`	Skills resolved from the registry.
`instructions`	none	Path to AGENTS.md-style instruction file, paired with the template.

If [workstation] is omitted, the worker gets the platform defaults (base settings, no MCPs, no skills).

Credentials are never listed here. The registry knows what each MCP and skill requires. arpi resolves credentials from the worker’s identity and the registry metadata. See “Credential inference” below.

[compute] — Where they run.

Field	Default	Description
`type`	`"coding"`	Compute surface. See “Compute types” below.
`image`	platform default	Container image for sandbox modes. Ignored when the worker runs directly on the local machine (no sandbox).

If [compute] is omitted, the worker runs on the local machine (no sandbox). This is the default for day-to-day development — your laptop is the compute.

[audit] — What gets recorded.

Field	Default	Description
`kernel`	`true`	eBPF kernel-level tracing. Captures every execve, file operation, and network call. Invisible to the agent, tamper-proof.
`network`	`"passthrough"`	Network audit mode. `passthrough` logs connection metadata (src, dst, bytes). `mitm` terminates TLS and logs full request/response bodies.
`screen`	depends on type	VNC screen recording. Defaults to `true` for `desktop` and `desktop-windows`, `false` for `coding` and `browser`.
`cost`	`true`	LLM token and cost tracking per session.

If [audit] is omitted, all defaults apply. Every worker gets kernel tracing and cost tracking. Screen recording activates automatically for CUA workloads, that is, the desktop and desktop-windows compute types.

[eval] — How they’re measured.

Field	Default	Description
`tasks`	`[]`	Eval task references. Can be registry names or file paths.
`scoring`	`"test-pass"`	How the eval is scored. `test-pass` checks exit code. `model-judge` uses an LLM to grade output. `human` queues for manual review. `custom` runs a user-provided scoring function.
`batch`	`1`	Number of parallel eval runs.
`baseline`	none	Model or template to compare against.

If [eval] is omitted, the worker runs in work mode. arpi spawn <template> --eval activates eval mode; flags override any defaults from the section.

[overrides] — Escape hatch.

Field	Default	Description
`env`	`{}`	Raw environment variables injected into the workstation.
`credentials`	`[]`	Explicit credential names fetched from the IAM provider. For credentials not tied to any registered capability.
`egress`	`[]`	Additional egress allowlist entries beyond what’s inferred from MCPs and gateway.
`settings`	`{}`	Raw platform-specific settings merged into the assembled config.

If [overrides] is omitted (the common case), arpi infers everything from the registry and identity. This section exists for power users who need to reach past the abstraction.

Compute types

Every agent runs a CLI (Claude Code, Codex, etc.). The compute type determines what additional surface is available.

Type	What the agent gets	Default screen recording
`coding`	Shell + filesystem	off
`browser`	Shell + filesystem + headless Chromium	off
`desktop`	Shell + filesystem + full X11/VNC display + Chromium	on
`desktop-windows`	Shell + filesystem + Windows desktop + RDP	on

Compute location

When [compute] is present, the sandbox can run locally or remotely:

Scenario	What happens
No `[compute]` section	Worker runs on local machine. No sandbox.
`[compute]` present, no `--host`	Sandbox runs locally (Docker or local OpenSandbox).
`[compute]` present, `--host <name>`	Sandbox runs on remote host (K8s pod, Azure VM).

Flags can override: --local forces local machine even if [compute] is present. --sandbox forces a sandbox even if [compute] is absent.
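
The location rules above reduce to a small decision function. The sketch below is illustrative only: `resolve_location` and the `Location` enum are hypothetical names, and two behaviours are assumptions the ADR does not state: passing both `--local` and `--sandbox` is rejected, and `--host` is ignored when no sandbox is in play.

```python
from enum import Enum
from typing import Optional


class Location(Enum):
    LOCAL_MACHINE = "local-machine"    # no sandbox
    LOCAL_SANDBOX = "local-sandbox"    # Docker or local OpenSandbox
    REMOTE_SANDBOX = "remote-sandbox"  # K8s pod, Azure VM


def resolve_location(has_compute: bool,
                     host: Optional[str] = None,
                     force_local: bool = False,
                     force_sandbox: bool = False) -> Location:
    """Decide where a worker runs from the template and spawn flags."""
    if force_local and force_sandbox:
        # Assumption: the two flags are mutually exclusive.
        raise ValueError("--local and --sandbox cannot be combined")
    if force_local:
        # --local wins even if [compute] is present.
        return Location.LOCAL_MACHINE
    if has_compute or force_sandbox:
        # --sandbox forces a sandbox even if [compute] is absent.
        return Location.REMOTE_SANDBOX if host else Location.LOCAL_SANDBOX
    # No [compute] and no --sandbox: run on the local machine.
    return Location.LOCAL_MACHINE
```

For example, `resolve_location(True, host="build-01")` yields a remote sandbox, while `resolve_location(True, force_local=True)` falls back to the local machine.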

Credential inference

Credentials are derived from capabilities, not declared in templates.

1. Template lists mcps = ["github", "sentry"].
2. Registry metadata for each MCP declares its credential requirements and tier.
3. arpi spawn resolves the full credential set from the registry.
4. The IAM provider checks whether the worker’s identity has access to each credential.
5. Credentials are provisioned according to their tier (ephemeral, proxied, or vault-managed).

The only credentials that appear in a template are in [overrides].credentials — for bespoke keys not tied to any registered capability. This should be rare.
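
The inference steps above can be sketched as a lookup followed by an access check. Everything concrete here is hypothetical: the `REGISTRY` contents, the credential names, the tier assigned to each, and the `infer_credentials` function are stand-ins for registry metadata and IAM calls that this ADR does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialReq:
    name: str
    tier: str  # "ephemeral" | "proxied" | "vault-managed"


# Hypothetical registry metadata; real entries live in the registry service.
REGISTRY = {
    "github": [CredentialReq("GITHUB_TOKEN", "ephemeral")],
    "sentry": [CredentialReq("SENTRY_AUTH_TOKEN", "proxied")],
}


def infer_credentials(mcps: list[str], granted: set[str]) -> dict[str, str]:
    """Map each required credential to its provisioning tier.

    `granted` stands in for the IAM provider's answer about which
    credentials the worker's identity may access.
    """
    plan: dict[str, str] = {}
    for mcp in mcps:
        if mcp not in REGISTRY:
            raise LookupError(f"MCP {mcp!r} is not in the registry")
        for req in REGISTRY[mcp]:
            if req.name not in granted:
                raise PermissionError(
                    f"identity lacks access to {req.name} (needed by {mcp})")
            plan[req.name] = req.tier
    return plan


plan = infer_credentials(["github", "sentry"],
                         granted={"GITHUB_TOKEN", "SENTRY_AUTH_TOKEN"})
```

The template never names a credential: the caller passes only the capability list, and the plan falls out of the registry metadata.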

Partial templates

A template can specify any subset of sections:

# Workstation-only -- just tools, no compute opinion
[workstation]
mcps = ["github", "linear"]
skills = ["deploy"]
instructions = "fullstack.agents.md"

# Compute-only -- just the sandbox, no tools opinion
[compute]
type = "desktop"
image = "arpi/cua-base:latest"

Partial templates compose at spawn time:

arpi spawn --workstation fullstack --compute cua-sandbox

Full templates are shorthand for specifying everything in one file. Most users write full templates. Composition is for platform operators building reusable pieces.

Revised ontology

The old ontology had 5 file-based domains (cli, definitions, harnesses, registries, environments). The new ontology reflects that arpi is a platform with an API, not a CLI tool.

Domains

Six core domains. Governance (budgets, approval gates, policies) cuts across all six.

Domain	What it owns	Old equivalent
Identity	Who this is (human or agent), what role, what they can access. uid/euid model, IAM provider integration, credential proxy.	`cli/internal/iam/` + `identity-model` ADR
Workstation	The provisioned execution context. Capabilities (MCPs, skills), settings, instructions, hooks, agent definitions.	`harnesses/` + `registries/`
Compute	Sandbox lifecycle. Create, run, stop, snapshot. Sandbox pool. Local, Docker, OpenSandbox, remote K8s. The universal compute primitive.	`environments/` + `opensandbox/`
Registry	Unified catalog. MCPs, skills, templates, images, agent definitions. One registry, not separate catalogs.	`registries/` + `definitions/`
Connectivity	Gateway (Bifrost), agent-to-agent messaging, external channels (WhatsApp, Slack, Linear).	Gateway parts of `environments/` + `agent-messaging` ADR
Observability	Sessions, trajectories, costs, audit logs, evals.	New (from Lunar Sandbox)

Key changes from the old ontology

definitions/ becomes templates/. Templates are worker specs, not fixed definitions. They live in the registry (the catalog of available things to spawn). During development they’re files in git; in production they’re managed by the registry service.

harnesses/ is absorbed into Workstation. Platform-specific building blocks (settings, hooks, agents) are workstation components. The assembly engine that merges them is a workstation service, not a CLI internal.

registries/ and definitions/ merge into Registry. Templates, MCPs, skills, images, agent definitions — they’re all things in the catalog. One registry, one API.

environments/ splits into Compute and Connectivity. Container images and sandbox config are compute. Gateway config is connectivity. They were never the same domain.

cli/ is no longer the runtime. The control plane is a server (api/). The CLI is a thin client. arpi spawn is syntactic sugar for POST /v1/workers. This is the architectural pivot from the BRAINSTORM-DRAFT.

Observability is new. Sessions, trajectories, evals, cost tracking, audit logs. This is the Lunar Sandbox contribution — the data collection and analysis layer that didn’t exist in arpi v1.
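
To make the "thin client" point concrete, the sketch below builds the HTTP request that a spawn call might reduce to. Only the method and path (POST /v1/workers) come from this ADR. The base URL, the JSON body shape, and the `build_spawn_request` helper are assumptions for illustration, not the real API schema.

```python
import json
import urllib.request

# Assumption: control plane address; the ADR only names the path.
BASE_URL = "http://localhost:8080"


def build_spawn_request(template: str,
                        eval_mode: bool = False) -> urllib.request.Request:
    """Build (but do not send) the request behind a spawn call."""
    # Assumption: illustrative body, not the documented request schema.
    payload = {"template": template, "eval": eval_mode}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/workers",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_spawn_request("dev-agent")
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a server.
```

The client holds no assembly logic: it names a template and lets the control plane do the rest.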

Lexicon

Term	Meaning	Replaces
Worker	A human or agent that performs work. Has identity, workstation, compute.	“agent,” “user”
Template	A TOML file that declares a worker spec. Partial or complete.	“definition,” “context”
Workstation	The provisioned execution context. Capabilities + settings + instructions.	“environment,” “harness”
Capability	A named thing a workstation can do. An MCP server, a skill, an integration.	“plugin,” “extension”
Sandbox	An isolated compute instance. The universal compute primitive.	“container,” “VM”
Session	A temporal span of a worker doing work. Start, end, trajectory, cost.	“run,” “execution”
Trajectory	The complete trace of what happened in a session. eBPF kernel audit + VNC recording.	“log,” “history”
Gateway	The network boundary. Routes, rate-limits, audits. Wall 1.	“proxy,” “Bifrost”
Registry	The unified catalog. Templates, capabilities, images.	“registries,” “harnesses”
Channel	An external communication surface. WhatsApp, Slack, Linear.	“integration”
Eval	A structured assessment of a session against criteria.	“test,” “benchmark”
Org	The tenant. One org = one physically isolated deployment.	“team,” “company”

What `arpi spawn` does now

arpi spawn dev-agent

1. Resolve template dev-agent from the registry.
2. Authenticate the caller (identity domain).
3. Resolve credentials from template’s capabilities + caller’s identity.
4. Allocate compute (sandbox from pool, or local machine if no [compute]).
5. Provision workstation into compute (assemble settings, MCPs, skills, instructions).
6. Configure audit (eBPF, network mode, screen recording, cost tracking).
7. Start the worker (launch agent platform in the provisioned workstation).
8. Return a session (trackable via arpi status, streamable via arpi logs).
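
The ordering of the steps above can be sketched as a pipeline that threads a context through one handler per step. The step names mirror the list, but the handler signature, the `spawn` function, and the shape of the returned session are assumptions; the ADR does not specify the control plane's internals.

```python
from typing import Callable

# Step names mirror the spawn steps; the final "return a session" step
# is the function's return value.
SPAWN_STEPS = (
    "resolve_template",
    "authenticate_caller",
    "resolve_credentials",
    "allocate_compute",
    "provision_workstation",
    "configure_audit",
    "start_worker",
)

# Assumption: each handler takes the context so far and returns a new one.
Handler = Callable[[dict], dict]


def spawn(template_name: str, handlers: dict[str, Handler]) -> dict:
    """Run every step in order and return a session-like record."""
    context: dict = {"template": template_name}
    completed: list[str] = []
    for step in SPAWN_STEPS:
        context = handlers[step](context)
        completed.append(step)
    return {"template": template_name,
            "completed": completed,
            "context": context}


# Usage with no-op handlers that only record that they ran.
noop_handlers = {
    step: (lambda ctx, s=step: {**ctx, s: "done"}) for step in SPAWN_STEPS
}
session = spawn("dev-agent", noop_handlers)
```

Keeping the order in one tuple makes the dependency explicit: credentials are resolved before compute is allocated, and audit is configured before the worker starts.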

In eval mode (arpi spawn dev-agent --eval):

1. Load eval tasks from template or flags.
2. Run batch (parallel sessions, same template, scored independently).
3. Store trajectories and scores in observability.
4. Return eval results (per-session scores, aggregate stats, cost).
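
The batch step can be sketched with a thread pool that runs one session per slot and aggregates the results. `run_batch` is a hypothetical helper, and the per-session result shape (`score`, `cost_usd`) is an assumption. The session runner is injected by the caller, so the usage below relies on a stand-in that returns fixed values rather than real eval results.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable


def run_batch(task: str, batch: int,
              run_session: Callable[[str, int], dict]) -> dict:
    """Run `batch` sessions of one task in parallel and aggregate them.

    Each session is scored independently; `run_session` must return a dict
    with "score" and "cost_usd" keys (an assumed shape).
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    with ThreadPoolExecutor(max_workers=batch) as pool:
        # pool.map preserves submission order, so sessions[i] is run i.
        sessions = list(pool.map(lambda i: run_session(task, i), range(batch)))
    return {
        "sessions": sessions,
        "mean_score": mean(s["score"] for s in sessions),
        "total_cost_usd": sum(s["cost_usd"] for s in sessions),
    }


def stand_in_session(task: str, index: int) -> dict:
    """Placeholder runner: fixed values, not a real eval outcome."""
    return {"task": task, "session": index,
            "score": float(index % 2 == 0), "cost_usd": 0.25}


results = run_batch("fix-checkout-bug", 4, stand_in_session)
```

Storing trajectories and scores in observability would happen between the pool finishing and the return; it is left out because the storage API is not described here.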

What this supersedes

spawn-model ADR: The TOML schema is replaced. [harness] becomes [worker]. [shared] and [platform.*] become [workstation]. [env] is removed as a concept; its compute settings move to [compute]. [env.secrets] is gone because credentials are inferred.
ontology ADR: The 5-domain model is replaced by 6 domains. definitions/ becomes templates/ in the registry. harnesses/ becomes workstation. environments/ splits. cli/ is no longer the runtime.
registries ADR: Separate MCP and skills registries merge into one unified registry.

The following ADRs remain valid and are not superseded:

identity-model (uid/euid model unchanged)
credential-lifecycle (three-tier model unchanged, credential inference is additive)
two-wall-security (gateway + sandbox architecture unchanged)
sandbox-strategy (OpenSandbox adoption unchanged)

Verification

Template schema supports all six sections, each independently optional
A template with only [worker] spawns on local machine with platform defaults
A template with only [compute] creates a bare sandbox with no capabilities
Credentials are inferred from mcps and skills via registry metadata
[overrides].credentials is the only place explicit credentials appear
[audit] defaults: kernel=true, network=passthrough, screen=auto, cost=true
[eval] is inactive unless --eval flag is passed or section is present with tasks
arpi spawn hits the control plane API, not local assembly
Partial templates compose via --workstation and --compute flags
