Worker Interaction Architecture
Status: Accepted
Context
Workers run AI agents inside sandboxes. Users need to observe what agents are doing (tool calls, text output) and control them (approve/deny tool use). The system must work across sandbox network boundaries and support both interactive and autonomous modes.
Decision
Sidecar pattern for event bridging
The interaction sidecar (arpi-sidecar) runs inside the sandbox alongside the agent. It parses agent stdout through a pluggable adapter and emits normalized SDK events to the control plane via HTTP POST.
Why not NATS from sandbox: Exposing NATS to the sandbox network widens the trust boundary. The sidecar uses the same HTTP + bearer token pattern as the credential proxy, maintaining the two-wall security architecture.
Why sidecar, not agent SDK: A sidecar approach doesn’t require modifying agents. Any agent that writes to stdout can be bridged. The adapter pattern (ParseLine) handles format differences.
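The adapter contract can be sketched as a small Go interface. This is an illustrative sketch only: the SDKEvent fields and the jsonLinesAdapter below are assumptions, not the actual proto schema or a shipped adapter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SDKEvent is a normalized event sent to the control plane.
// Field names here are illustrative, not the real proto schema.
type SDKEvent struct {
	Type string `json:"type"` // e.g. "text", "tool_call"
	Text string `json:"text,omitempty"`
	Tool string `json:"tool,omitempty"`
}

// Adapter turns one line of agent stdout into zero or more SDK events.
// This mirrors the ParseLine pattern described above.
type Adapter interface {
	ParseLine(line string) ([]SDKEvent, error)
}

// jsonLinesAdapter handles agents that emit one JSON object per line.
type jsonLinesAdapter struct{}

func (jsonLinesAdapter) ParseLine(line string) ([]SDKEvent, error) {
	if line == "" {
		return nil, nil
	}
	var ev SDKEvent
	if err := json.Unmarshal([]byte(line), &ev); err != nil {
		// Non-JSON output is still surfaced, as plain text.
		return []SDKEvent{{Type: "text", Text: line}}, nil
	}
	return []SDKEvent{ev}, nil
}

func main() {
	var a Adapter = jsonLinesAdapter{}
	events, _ := a.ParseLine(`{"type":"tool_call","tool":"bash"}`)
	fmt.Println(events[0].Type, events[0].Tool) // tool_call bash
}
```

Because any agent that writes lines to stdout can be wrapped this way, supporting a new agent format means writing one such adapter rather than modifying the agent itself.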
Renderer integrated in CLI binary
The TUI renderer is built with bubbletea/lipgloss inside the arpi CLI binary, not as a separate Rust binary.
Why not separate Rust binary: The CLI already has bubbletea/lipgloss as dependencies. A Go TUI avoids a separate build/distribution pipeline. The performance characteristics of bubbletea are sufficient for event rendering.
Control channel via SDK event table
Control requests and responses are stored as SDK event types (control_request, control_response) in the existing sdk_events Postgres table, not a separate control_requests table.
Why reuse: The SDK event types already include control_request and control_response in the proto definition. The WebSocket streaming endpoint already delivers these events. Adding separate tables would duplicate persistence with no functional benefit.
Cursor-based polling: The sidecar polls GET /control/pending?after_id={last_seen} to fetch new responses. Advancing the cursor past each batch prevents re-delivery of responses the sidecar has already seen, without requiring a delivered/consumed flag.
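The cursor rule is simple enough to isolate as a pure function. A minimal sketch, assuming the endpoint returns rows with monotonically increasing numeric IDs (the ControlResponse shape is hypothetical):

```go
package main

import "fmt"

// ControlResponse mirrors one row returned by the pending-responses
// endpoint; the field names here are illustrative.
type ControlResponse struct {
	ID      int64
	Payload string
}

// advance filters one poll result down to responses the sidecar has
// not yet seen, and returns the cursor to use as ?after_id= on the
// next request. Because the cursor only moves forward, a response is
// never re-delivered even though the server stores no
// delivered/consumed flag.
func advance(cursor int64, batch []ControlResponse) ([]ControlResponse, int64) {
	next := cursor
	var fresh []ControlResponse
	for _, r := range batch {
		if r.ID > cursor {
			fresh = append(fresh, r)
			if r.ID > next {
				next = r.ID
			}
		}
	}
	return fresh, next
}

func main() {
	batch := []ControlResponse{{ID: 1, Payload: "approve"}, {ID: 2, Payload: "deny"}}
	fresh, cursor := advance(0, batch)
	fmt.Println(len(fresh), cursor) // 2 2
	fresh, cursor = advance(cursor, batch) // same batch delivered again
	fmt.Println(len(fresh), cursor) // 0 2
}
```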
HTTP POST for sidecar event emission
The sidecar emits events via POST /v1/workers/{id}/sdk-events instead of publishing directly to NATS.
Why: The control plane’s HandleReportSDKEvent handler persists the event AND publishes to NATS. This gives us both storage and fan-out through a single HTTP call, and the sidecar only needs HTTP access (bearer token auth, same as cred-proxy).
Sidecar deployment in spawn pipeline
The sidecar binary is embedded in the server via //go:embed and uploaded to the sandbox during spawn (Step 6.5, after credential proxy). Config is written via the Upload API.
Why Upload for config: Shell-interpolating JSON into a printf command risks injection. The Upload API writes binary data directly to a file path, eliminating shell parsing.
Consequences
- Templates opt into interaction via a [sidecar] TOML section
- Workers without [sidecar] have no interaction capability (fire-and-forget)
- --raw PTY mode is blocked on OpenSandbox PTY support (issue filed: alibaba/OpenSandbox#623)
- Adding new agent adapters requires adding a ParseLine implementation in the sidecar binary
- The SDK event table grows with control events; acceptable since events are already append-only