Worker Interaction Architecture

Status: Accepted

Context

Workers run AI agents inside sandboxes. Users need to observe what agents are doing (tool calls, text output) and control them (approve/deny tool use). The system must work across sandbox network boundaries and support both interactive and autonomous modes.

Decision

Sidecar pattern for event bridging

The interaction sidecar (arpi-sidecar) runs inside the sandbox alongside the agent. It parses agent stdout through a pluggable adapter and emits normalized SDK events to the control plane via HTTP POST.

Why not NATS from sandbox: Exposing NATS to the sandbox network widens the trust boundary. The sidecar uses the same HTTP + bearer token pattern as the credential proxy, maintaining the two-wall security architecture.

Why sidecar, not agent SDK: A sidecar approach doesn’t require modifying agents. Any agent that writes to stdout can be bridged. The adapter pattern (ParseLine) handles format differences.
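The adapter idea can be sketched roughly as follows. This is an illustration only: the source names ParseLine but not its signature, so the Adapter interface, the Event struct, and both example adapters here are assumptions, and the real event schema lives in the proto definition.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Event is a hypothetical normalized event; the real schema is defined elsewhere.
type Event struct {
	Type    string
	Payload string
}

// Adapter turns one line of agent stdout into zero or one normalized event.
// The signature is an assumption made for this sketch.
type Adapter interface {
	ParseLine(line string) (Event, bool)
}

// JSONLinesAdapter handles a hypothetical agent that prints one JSON object per line.
type JSONLinesAdapter struct{}

func (JSONLinesAdapter) ParseLine(line string) (Event, bool) {
	var raw struct {
		Kind string `json:"kind"`
		Text string `json:"text"`
	}
	// Lines that are not valid JSON, or that carry no kind, are skipped.
	if err := json.Unmarshal([]byte(line), &raw); err != nil || raw.Kind == "" {
		return Event{}, false
	}
	return Event{Type: raw.Kind, Payload: raw.Text}, true
}

// PlainTextAdapter treats every non-empty line as a text event.
type PlainTextAdapter struct{}

func (PlainTextAdapter) ParseLine(line string) (Event, bool) {
	line = strings.TrimSpace(line)
	if line == "" {
		return Event{}, false
	}
	return Event{Type: "text", Payload: line}, true
}

// Bridge scans stdout line by line and collects the events the adapter recognizes.
func Bridge(stdout string, a Adapter) []Event {
	var events []Event
	sc := bufio.NewScanner(strings.NewReader(stdout))
	for sc.Scan() {
		if ev, ok := a.ParseLine(sc.Text()); ok {
			events = append(events, ev)
		}
	}
	return events
}

func main() {
	out := "{\"kind\":\"tool_call\",\"text\":\"ls\"}\nnot json\n"
	for _, ev := range Bridge(out, JSONLinesAdapter{}) {
		fmt.Printf("%s: %s\n", ev.Type, ev.Payload)
	}
}
```

Under these assumptions, supporting a new agent format means adding one more type that satisfies Adapter; the bridging loop itself stays unchanged.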

Renderer integrated in CLI binary

The TUI renderer is built with bubbletea/lipgloss inside the arpi CLI binary, not as a separate Rust binary.

Why not separate Rust binary: The CLI already has bubbletea/lipgloss as dependencies. A Go TUI avoids a separate build/distribution pipeline, and bubbletea performs well enough for event rendering.

Control channel via SDK event table

Control requests and responses are stored as SDK event types (control_request, control_response) in the existing sdk_events Postgres table, not a separate control_requests table.

Why reuse: The SDK event types already include control_request and control_response in the proto definition. The WebSocket streaming endpoint already delivers these events. Adding separate tables would duplicate persistence with no functional benefit.

Cursor-based polling: The sidecar polls GET /control/pending?after_id={last_seen} for new responses. Because each poll asks only for responses after the last one the sidecar has seen, a response is not delivered again on later polls, and no delivered/consumed flag is needed.
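The cursor semantics can be modeled in memory as a minimal sketch. It assumes that event IDs increase monotonically and that after_id means "strictly greater than"; ControlResponse, PendingAfter, and Poller are hypothetical names, and the HTTP transport is left out.

```go
package main

import "fmt"

// ControlResponse is a hypothetical stand-in for a control_response event row.
type ControlResponse struct {
	ID       int64
	Decision string
}

// PendingAfter models the server side of GET /control/pending?after_id=N:
// it returns only rows whose ID is strictly greater than the cursor.
func PendingAfter(stored []ControlResponse, afterID int64) []ControlResponse {
	var out []ControlResponse
	for _, r := range stored {
		if r.ID > afterID {
			out = append(out, r)
		}
	}
	return out
}

// Poller models the sidecar side: it remembers the highest ID it has seen.
type Poller struct {
	LastSeen int64
}

// Poll fetches new responses and advances the cursor past them.
func (p *Poller) Poll(stored []ControlResponse) []ControlResponse {
	batch := PendingAfter(stored, p.LastSeen)
	for _, r := range batch {
		if r.ID > p.LastSeen {
			p.LastSeen = r.ID
		}
	}
	return batch
}

func main() {
	stored := []ControlResponse{
		{ID: 1, Decision: "approve"},
		{ID: 2, Decision: "deny"},
	}
	p := &Poller{}
	fmt.Println(len(p.Poll(stored))) // 2
	fmt.Println(len(p.Poll(stored))) // 0: nothing is re-delivered
	stored = append(stored, ControlResponse{ID: 3, Decision: "approve"})
	fmt.Println(len(p.Poll(stored))) // 1
}
```

The server keeps no per-consumer state in this model; the only state is the cursor held by the sidecar.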

HTTP POST for sidecar event emission

The sidecar emits events via POST /v1/workers/{id}/sdk-events instead of publishing directly to NATS.

Why: The control plane’s HandleReportSDKEvent handler both persists the event and publishes it to NATS. A single HTTP call therefore provides storage and fan-out, and the sidecar needs only HTTP access (bearer token auth, same as the credential proxy).
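A request of this shape might be built as in the sketch below. The endpoint path and the bearer token scheme come from this record; the base URL, the worker ID, the JSON content type, and the NewEventRequest helper are assumptions for illustration (the real payload encoding may differ).

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// NewEventRequest builds the POST that reports one SDK event.
// It is a hypothetical helper, not the sidecar's actual code.
func NewEventRequest(baseURL, workerID, token string, payload []byte) (*http.Request, error) {
	endpoint := baseURL + "/v1/workers/" + url.PathEscape(workerID) + "/sdk-events"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	// Bearer token auth, the same pattern the credential proxy uses.
	req.Header.Set("Authorization", "Bearer "+token)
	// Content type is an assumption; the real encoding may be different.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewEventRequest("http://control-plane.example", "w-123", "secret", []byte(`{"type":"text"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// Sending it would be: resp, err := http.DefaultClient.Do(req)
}
```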

Sidecar deployment in spawn pipeline

The sidecar binary is embedded in the server via //go:embed and uploaded to the sandbox during spawn (Step 6.5, after credential proxy). Config is written via the Upload API.

Why Upload for config: Shell-interpolating JSON into a printf command risks injection. The Upload API writes binary data directly to a file path, eliminating shell parsing.
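The injection risk can be shown with a small sketch. Everything here is hypothetical: SidecarConfig, the file path, and the Uploader interface merely stand in for the real config schema and Upload API, whose signatures this record does not specify.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SidecarConfig is a hypothetical config shape used only for this example.
type SidecarConfig struct {
	Adapter string `json:"adapter"`
}

// NaiveWriteCommand shows the rejected approach: JSON spliced into a shell command.
func NaiveWriteCommand(cfgJSON []byte, path string) string {
	return fmt.Sprintf("printf '%s' > %s", cfgJSON, path)
}

// Uploader stands in for the Upload API: bytes go to a path with no shell involved.
type Uploader interface {
	Upload(path string, data []byte) error
}

// memUploader is an in-memory fake so the example runs without a sandbox.
type memUploader map[string][]byte

func (m memUploader) Upload(path string, data []byte) error {
	m[path] = data
	return nil
}

func main() {
	// A single quote in any value closes the shell string early.
	cfg, _ := json.Marshal(SidecarConfig{Adapter: "x'; touch /tmp/pwned; echo '"})
	fmt.Println(NaiveWriteCommand(cfg, "/tmp/sidecar.json"))

	// The upload path stores the same bytes verbatim.
	up := memUploader{}
	_ = up.Upload("/tmp/sidecar.json", cfg)
	fmt.Println(string(up["/tmp/sidecar.json"]) == string(cfg))
}
```

In the printed command, the text after the first embedded quote would be parsed by the shell as separate commands, whereas the uploaded bytes are never parsed at all.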

Consequences

Templates opt into interaction via [sidecar] TOML section
Workers without [sidecar] have no interaction capability (fire-and-forget)
--raw PTY mode is blocked on OpenSandbox PTY support (issue filed: alibaba/OpenSandbox#623)
Adding new agent adapters requires adding a ParseLine implementation in the sidecar binary
The SDK event table grows with control events; this is acceptable since events are already append-only