Worker Interaction Architecture

Status: Accepted

Workers run AI agents inside sandboxes. Users need to observe what agents are doing (tool calls, text output) and control them (approve/deny tool use). The system must work across sandbox network boundaries and support both interactive and autonomous modes.

The interaction sidecar (arpi-sidecar) runs inside the sandbox alongside the agent. It parses agent stdout through a pluggable adapter and emits normalized SDK events to the control plane via HTTP POST.

Why not NATS from sandbox: Exposing NATS to the sandbox network widens the trust boundary. The sidecar uses the same HTTP + bearer token pattern as the credential proxy, maintaining the two-wall security architecture.

Why sidecar, not agent SDK: A sidecar approach doesn’t require modifying agents. Any agent that writes to stdout can be bridged. The adapter pattern (ParseLine) handles format differences.
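The adapter idea can be sketched as a small Go interface. This is only an illustration under stated assumptions: the `Event` struct, the `(Event, bool)` return shape of `ParseLine`, the two adapters, and the `ParseAll` helper are all hypothetical, and the real signature in arpi-sidecar may differ.

```go
package main

import (
	"encoding/json"
	"strings"
)

// Event is a hypothetical normalized event; the real SDK event schema
// lives in the proto definition and is not reproduced here.
type Event struct {
	Type    string
	Payload string
}

// Adapter is an assumed shape for the pluggable adapter: one stdout line
// in, at most one event out. ok=false means "nothing to emit for this line".
type Adapter interface {
	ParseLine(line string) (ev Event, ok bool)
}

// PlainTextAdapter (hypothetical) treats every non-blank line as text output.
type PlainTextAdapter struct{}

func (PlainTextAdapter) ParseLine(line string) (Event, bool) {
	line = strings.TrimSpace(line)
	if line == "" {
		return Event{}, false
	}
	return Event{Type: "text", Payload: line}, true
}

// JSONLinesAdapter (hypothetical) handles agents that print one JSON object
// per line, for example {"kind":"tool_call","body":"ls -la"}.
type JSONLinesAdapter struct{}

func (JSONLinesAdapter) ParseLine(line string) (Event, bool) {
	var raw struct {
		Kind string `json:"kind"`
		Body string `json:"body"`
	}
	if err := json.Unmarshal([]byte(line), &raw); err != nil || raw.Kind == "" {
		return Event{}, false // not a structured line; skip it
	}
	return Event{Type: raw.Kind, Payload: raw.Body}, true
}

// ParseAll runs captured stdout through an adapter, line by line.
func ParseAll(stdout string, a Adapter) []Event {
	var events []Event
	for _, line := range strings.Split(stdout, "\n") {
		if ev, ok := a.ParseLine(line); ok {
			events = append(events, ev)
		}
	}
	return events
}
```

Under this sketch, supporting a new agent format means writing one more type with a `ParseLine` method; the agent itself stays unmodified.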

The TUI renderer is built with bubbletea/lipgloss inside the arpi CLI binary, not as a separate Rust binary.

Why not separate Rust binary: The CLI already has bubbletea/lipgloss as dependencies. A Go TUI avoids a separate build/distribution pipeline, and bubbletea is fast enough for event rendering.

Control requests and responses are stored as SDK event types (control_request, control_response) in the existing sdk_events Postgres table, not in a separate control_requests table.

Why reuse: The SDK event types already include control_request and control_response in the proto definition. The WebSocket streaming endpoint already delivers these events. Adding separate tables would duplicate persistence with no functional benefit.

Cursor-based polling: The sidecar polls GET /control/pending?after_id={last_seen} to get new responses. Because each poll returns only responses after the last one the sidecar has seen, a response is never re-delivered, and no delivered/consumed flag is needed.
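A minimal in-memory sketch of the cursor logic follows. It assumes response IDs increase monotonically (as a Postgres serial column would); `ResponseLog`, `Poller`, and the field names are hypothetical, and the HTTP layer is left out so that only the cursor behaviour is shown.

```go
package main

// ControlResponse is a hypothetical row shape; only the monotonically
// increasing ID matters for the cursor.
type ControlResponse struct {
	ID       int64
	Decision string // e.g. "approve" or "deny"
}

// ResponseLog stands in for the control plane's storage (in-memory here).
type ResponseLog struct {
	rows []ControlResponse
}

// Append stores a response and assigns the next ID, like a serial column.
func (l *ResponseLog) Append(decision string) int64 {
	id := int64(len(l.rows) + 1)
	l.rows = append(l.rows, ControlResponse{ID: id, Decision: decision})
	return id
}

// PendingAfter models GET /control/pending?after_id={last_seen}:
// it returns only rows whose ID is strictly greater than the cursor.
func (l *ResponseLog) PendingAfter(afterID int64) []ControlResponse {
	var out []ControlResponse
	for _, r := range l.rows {
		if r.ID > afterID {
			out = append(out, r)
		}
	}
	return out
}

// Poller is the sidecar side: it remembers the highest ID it has seen.
type Poller struct {
	lastSeen int64
}

// Poll fetches responses newer than the cursor, then advances the cursor.
func (p *Poller) Poll(l *ResponseLog) []ControlResponse {
	fresh := l.PendingAfter(p.lastSeen)
	for _, r := range fresh {
		if r.ID > p.lastSeen {
			p.lastSeen = r.ID
		}
	}
	return fresh
}
```

The server keeps no per-consumer state in this sketch; the cursor lives entirely in the poller, which is why no delivered flag is required.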

The sidecar emits events via POST /v1/workers/{id}/sdk-events instead of publishing directly to NATS.

Why: The control plane’s HandleReportSDKEvent handler persists the event AND publishes to NATS. This gives us both storage and fan-out through a single HTTP call, and the sidecar only needs HTTP access (bearer token auth, same as cred-proxy).
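The persist-then-publish flow can be sketched as below. Everything here is an assumption made for illustration: the `EventStore` and `EventBus` interfaces, the subject name, the in-memory fakes, and the ordering (persist first). The actual HandleReportSDKEvent handler is an HTTP handler backed by Postgres and NATS and may be structured differently.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// SDKEvent is a hypothetical, simplified event shape.
type SDKEvent struct {
	WorkerID string
	Type     string
	Payload  string
}

// EventStore and EventBus are assumed abstractions over Postgres and NATS.
type EventStore interface {
	Insert(ev SDKEvent) error
}

type EventBus interface {
	Publish(subject string, ev SDKEvent) error
}

var ErrUnauthorized = errors.New("invalid bearer token")

// ReportSDKEvent checks the bearer token, persists the event, then fans
// it out. Persisting first is a choice made in this sketch so that
// subscribers never see an event that was not stored.
func ReportSDKEvent(store EventStore, bus EventBus, wantToken, gotToken string, ev SDKEvent) error {
	if subtle.ConstantTimeCompare([]byte(wantToken), []byte(gotToken)) != 1 {
		return ErrUnauthorized
	}
	if err := store.Insert(ev); err != nil {
		return fmt.Errorf("persist: %w", err)
	}
	subject := "workers." + ev.WorkerID + ".sdk-events" // hypothetical subject name
	if err := bus.Publish(subject, ev); err != nil {
		return fmt.Errorf("publish: %w", err)
	}
	return nil
}

// MemStore and MemBus are in-memory fakes for trying the flow locally.
type MemStore struct{ Rows []SDKEvent }

func (s *MemStore) Insert(ev SDKEvent) error {
	s.Rows = append(s.Rows, ev)
	return nil
}

type MemBus struct{ Subjects []string }

func (b *MemBus) Publish(subject string, ev SDKEvent) error {
	b.Subjects = append(b.Subjects, subject)
	return nil
}
```

From the sidecar's point of view, a single authenticated call covers both storage and fan-out, so it never needs NATS credentials or network access to the broker.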

The sidecar binary is embedded in the server via //go:embed and uploaded to the sandbox during spawn (Step 6.5, after credential proxy). Config is written via the Upload API.

Why Upload for config: Shell-interpolating JSON into a printf command risks injection. The Upload API writes binary data directly to a file path, eliminating shell parsing.
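To illustrate why writing bytes avoids the injection risk, here is a sketch in which the config is marshalled to JSON and handed to an uploader as raw data. The `SidecarConfig` fields, the `Uploader` interface, and the file path are hypothetical; the real Upload API and config schema are not shown in this document.

```go
package main

import "encoding/json"

// SidecarConfig is a hypothetical config shape for the sidecar.
type SidecarConfig struct {
	WorkerID    string `json:"worker_id"`
	EventsURL   string `json:"events_url"`
	BearerToken string `json:"bearer_token"`
	Adapter     string `json:"adapter"`
}

// Uploader is an assumed abstraction of the sandbox Upload API:
// it writes bytes to a path without involving a shell.
type Uploader interface {
	Upload(path string, data []byte) error
}

// WriteConfig serializes the config and uploads the bytes as-is.
// No value is ever interpolated into a command line.
func WriteConfig(u Uploader, path string, cfg SidecarConfig) error {
	data, err := json.Marshal(cfg)
	if err != nil {
		return err
	}
	return u.Upload(path, data)
}

// ReadConfig is what the sidecar side might do on startup.
func ReadConfig(data []byte) (SidecarConfig, error) {
	var cfg SidecarConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// MemUploader is an in-memory fake used to exercise the round trip.
type MemUploader struct{ Files map[string][]byte }

func (m *MemUploader) Upload(path string, data []byte) error {
	if m.Files == nil {
		m.Files = map[string][]byte{}
	}
	m.Files[path] = append([]byte(nil), data...) // copy to avoid aliasing
	return nil
}
```

A value such as `tok'; rm -rf / #$(id)` survives the round trip unchanged in this sketch, because quotes and `$()` are only meaningful to a shell and no shell ever parses the data.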

  • Templates opt into interaction via a [sidecar] TOML section
  • Workers without [sidecar] have no interaction capability (fire-and-forget)
  • --raw PTY mode is blocked on OpenSandbox PTY support (issue filed: alibaba/OpenSandbox#623)
  • Adding new agent adapters requires adding a ParseLine implementation in the sidecar binary
  • The SDK event table grows with control events; this is acceptable because events are already append-only