Two-wall security architecture

Status: Accepted

Decision

All worker sandboxes are secured by two hard walls. Advisory controls (hooks, settings, CLAUDE.md) are defense-in-depth, not enforcement.

Wall 1: Network gateway (Bifrost). All MCP tool calls and LLM API calls route through Bifrost. The gateway holds the real credentials, enforces tool-level RBAC and rate limits, and writes an audit log. The agent receives only a scoped virtual key; it never sees real API keys.
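The gateway's role can be sketched as a small in-memory model. This is an illustrative sketch only, not Bifrost's actual API: the `VirtualKey` shape, the `ROLE_TOOLS` mapping, the tool names, and the per-minute limit are all assumptions made for the example.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualKey:
    """Scoped key handed to the agent (hypothetical shape)."""
    key_id: str
    role: str

# Hypothetical role -> allowed tool names mapping (tool-level RBAC).
ROLE_TOOLS = {
    "reviewer": {"repo.read", "issues.comment"},
    "builder": {"repo.read", "repo.write", "ci.trigger"},
}

class Gateway:
    """Holds real credentials and decides whether a tool call may proceed."""

    def __init__(self, real_credentials, max_calls_per_minute=60):
        self._real_credentials = real_credentials  # never returned to the agent
        self._max_calls = max_calls_per_minute
        self._calls = {}       # key_id -> timestamps of allowed calls
        self.audit_log = []    # one entry per decision, allowed or denied

    def call_tool(self, key, tool, now=None):
        now = time.time() if now is None else now
        allowed = tool in ROLE_TOOLS.get(key.role, set())
        # Keep only calls from the last 60 seconds for the rate limit.
        recent = [t for t in self._calls.get(key.key_id, []) if now - t < 60]
        within_limit = len(recent) < self._max_calls
        decision = "allow" if allowed and within_limit else "deny"
        self.audit_log.append(
            {"key_id": key.key_id, "tool": tool,
             "decision": decision, "timestamp": now}
        )
        if decision == "deny":
            return False
        recent.append(now)
        self._calls[key.key_id] = recent
        # A real gateway would forward the call upstream here,
        # attaching the real credential on the agent's behalf.
        return True
```

The point of the sketch is that the decision and the real credential both live on the gateway side, so nothing the agent holds is useful outside the gateway.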

Wall 2 — Kernel sandbox (OpenSandbox). Agent processes run in isolated containers (gVisor/Kata/Firecracker backends via OpenSandbox). Syscall interception, filesystem restrictions, and egress filtering prevent exfiltration. The credential proxy injects secrets into the sandbox environment without the secrets ever entering the container filesystem or process env directly.

Advisory layer. Hooks, settings.json, permission modes, and CLAUDE.md instructions are the outermost layer. They catch mistakes and improve UX but are explicitly not trusted as enforcement. A sufficiently capable agent can bypass all advisory controls.

Hard (can't bypass)              Soft (can bypass)
---------------------            ---------------------
Wall 1: Gateway (Bifrost)        Agent permission modes
Wall 2: Sandbox (OpenSandbox)    PreToolUse hooks
                                 settings.json rules
                                 CLAUDE.md instructions

Why

Advisory controls alone are not enforceable. In the 2026 incident, Claude Code was blocked from reading .env and extracted the same secrets via docker compose config instead, which demonstrated that hooks and permission modes can be circumvented. Only network interception and kernel isolation provide real security boundaries.

Credential proxy

Secrets never enter the sandbox directly. The credential proxy (part of our OpenSandbox contributions) sits between the IAM provider and the sandbox:

Agent requests a secret via the proxy API inside the sandbox.
Proxy authenticates the request against the agent’s euid and the list of allowed secrets in the agent’s definition.
Proxy fetches from the IAM provider and injects into the sandbox’s isolated env namespace.
The secret is available to the agent process but cannot be exfiltrated — egress is filtered to only the gateway.

This is stronger than env var injection because the proxy can revoke access mid-session and audit every access.
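The request flow above, together with mid-session revocation and per-access auditing, can be sketched as follows. This is a minimal model under stated assumptions, not the proxy's real interface: the class name, the `fetch` callable standing in for the IAM provider, and the audit entry fields are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CredentialProxy:
    """Toy model of the credential proxy (hypothetical interface)."""
    allowed: dict[int, set[str]]      # euid -> secret names it may request
    fetch: Callable[[str], str]       # stand-in for the IAM provider lookup
    revoked: set[int] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def get_secret(self, euid: int, name: str) -> str:
        # Authorize against the caller's euid and its allowed secrets.
        granted = (
            euid not in self.revoked
            and name in self.allowed.get(euid, set())
        )
        # Every access attempt is recorded, whether granted or not.
        self.audit_log.append(
            {"euid": euid, "secret": name,
             "granted": granted, "timestamp": time.time()}
        )
        if not granted:
            raise PermissionError(f"euid {euid} may not read {name!r}")
        return self.fetch(name)

    def revoke(self, euid: int) -> None:
        # Takes effect on the next request, even within a running session.
        self.revoked.add(euid)
```

Because each read goes through `get_secret`, revocation applies to the next request rather than to the next session, which is the property plain env var injection lacks.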

Implementation

Bifrost intercepts all outbound network calls from agent processes. The agent gets a virtual key to the gateway, never real credentials.
OpenSandbox (our fork with Apple Container backend, credential proxy, policy engine) restricts filesystem to the mounted project directory, blocks egress to everything except the gateway endpoint, and intercepts syscalls.
arpi spawn in sandbox mode configures both walls automatically. In bare mode, neither wall is active — the developer’s own machine is the trust boundary.
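The two sandbox restrictions named above (filesystem limited to the project mount, egress limited to the gateway endpoint) reduce to two decisions. The sketch below models them; `SandboxPolicy`, its field names, and the example paths and hosts are assumptions, not OpenSandbox's policy format. The path check is purely lexical, whereas real enforcement happens in the kernel sandbox and must also account for symlinks.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath

@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy model for sandbox mode."""
    project_mount: str
    gateway_host: str
    gateway_port: int

    def path_allowed(self, path: str) -> bool:
        # Collapse "." and ".." segments before comparing, so that
        # "/mount/../etc/passwd" is judged as "/etc/passwd".
        target = PurePosixPath(posixpath.normpath(path))
        mount = PurePosixPath(posixpath.normpath(self.project_mount))
        # Allowed only if the target is the mount or lies beneath it.
        # Relative paths never match an absolute mount, so they are denied.
        return target == mount or mount in target.parents

    def egress_allowed(self, host: str, port: int) -> bool:
        # Only the gateway endpoint is reachable; everything else is blocked.
        return (host, port) == (self.gateway_host, self.gateway_port)
```

Comparing whole path components rather than string prefixes matters here: a prefix test would wrongly accept a sibling directory such as `/workspace/project-evil`.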

Verification

Agent inside sandbox cannot read host filesystem outside the project mount
Agent inside sandbox cannot curl external services directly (only via gateway)
Secrets are injected via credential proxy, never written to disk
Gateway audit log captures: uid, euid, tool called, timestamp
Revoking a machine identity immediately blocks all active sessions
Hooks remain configured as first-line defense but are not relied upon for security
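The first two checks can be probed from inside a running sandbox with a short script. This is a sketch, not an existing test suite: the function names and the arguments passed to `run_checks` are hypothetical, and the caller must supply a host path known to lie outside the project mount, an external host, and the gateway address.

```python
import socket

def can_read(path: str) -> bool:
    """Return True if this process can open the path and read from it."""
    try:
        with open(path, "rb") as fh:
            fh.read(1)
        return True
    except OSError:
        return False

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

def run_checks(outside_path: str, external_host: str,
               gateway_host: str, gateway_port: int) -> dict:
    """Each value is True when the corresponding expectation holds."""
    return {
        "host_fs_blocked": not can_read(outside_path),
        "direct_egress_blocked": not can_connect(external_host, 443),
        "gateway_reachable": can_connect(gateway_host, gateway_port),
    }
```

A passing run shows only that these specific probes were blocked. It does not prove the absence of other escape paths, so it complements the sandbox and gateway configuration review rather than replacing it.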