ADR-003: Control plane architecture

Status: Proposed

arpi is a platform with an API. The CLI is a thin client. The control plane server needs a Go HTTP framework and API style. This is the most load-bearing architectural decision for Phase 1 — every domain (Identity, Workstation, Compute, Registry, Connectivity, Observability) exposes functionality through this API.

| Requirement | Priority | Notes |
| --- | --- | --- |
| Go-native | P0 | arpi is Go. No polyglot runtime. |
| REST-compatible | P0 | CLI, dashboard, and external integrations consume REST |
| Streaming support | P0 | GET /v1/sessions/:id/logs needs WebSocket or SSE |
| Generated clients | P1 | Go SDK, TypeScript SDK (dashboard), future Python SDK |
| gRPC option | P2 | Agent-to-agent and internal service calls may benefit from gRPC |
| Middleware chain | P0 | Auth, logging, rate limiting, CORS |
| Low dependency footprint | P0 | arpi avoids large dep trees (see CLAUDE.md: no Viper, no koanf) |
| OpenAPI spec | P1 | Auto-generated from routes for documentation |

  • Small team (1-2 engineers). Framework must not require ceremony.
  • The API surface is moderate: ~15 endpoints in Phase 1 (see TASKS.md CP-01..CP-12).
  • WebSocket required for log streaming. SSE is acceptable alternative.
  • Internal services (sandbox orchestrator, gateway manager) may use gRPC eventually.
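
Since SSE is listed as an acceptable alternative, the following is a minimal sketch of what log streaming over SSE could look like with only the standard library. The `streamLogs` and `collect` names and the channel-based log source are assumptions for illustration, not part of the ADR, and the sketch assumes each log line contains no embedded newline.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// streamLogs writes each line received on lines as one SSE "data:" event and
// flushes after every event so the client is not left waiting on a buffer.
// The channel is a hypothetical stand-in for a container log source.
func streamLogs(lines <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done(): // client went away
				return
			case line, open := <-lines:
				if !open { // log source finished
					return
				}
				fmt.Fprintf(w, "data: %s\n\n", line)
				flusher.Flush()
			}
		}
	}
}

// collect runs the handler in memory and returns the payload of each event.
func collect(logLines []string) []string {
	ch := make(chan string, len(logLines))
	for _, l := range logLines {
		ch <- l
	}
	close(ch)

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/sessions/abc/logs", nil)
	streamLogs(ch).ServeHTTP(rec, req)

	var events []string
	sc := bufio.NewScanner(rec.Body)
	for sc.Scan() {
		if data, ok := strings.CutPrefix(sc.Text(), "data: "); ok {
			events = append(events, data)
		}
	}
	return events
}

func main() {
	for _, e := range collect([]string{"container started", "listening on :8080"}) {
		fmt.Println(e)
	}
}
```

SSE is one-directional, which is enough for log tailing; a WebSocket would only be needed if the client had to send data back over the same connection.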

net/http (stdlib): the Go 1.22+ standard library router, with method and path pattern matching (for example GET /v1/sessions/{id}).

| Aspect | Assessment |
| --- | --- |
| Routing | Go 1.22 ServeMux with method+path patterns. Covers all Phase 1 routes. |
| Middleware | Manual wrapping. func(http.Handler) http.Handler pattern. |
| WebSocket | nhooyr.io/websocket or gorilla/websocket. |
| Generated clients | None. Write OpenAPI spec manually, generate from it. |
| Dependencies | Zero (stdlib). WebSocket lib is the only addition. |
| gRPC | Not supported. Would need separate gRPC server on different port. |
| Learning curve | None: it’s stdlib. |

chi: a lightweight router, and one of the most widely used Go HTTP routers outside the standard library.

| Aspect | Assessment |
| --- | --- |
| Routing | URL params, method routing, route groups. Superset of stdlib. |
| Middleware | Built-in middleware package (request logging, recovery, request ID, timeout). CORS comes from the companion go-chi/cors module. |
| WebSocket | Same as stdlib: bring your own WebSocket lib. |
| Generated clients | None. Same OpenAPI story as stdlib. |
| Dependencies | Minimal (chi is ~2K LOC, no transitive deps). |
| gRPC | Not supported natively. Separate server. |
| Learning curve | 10 minutes. API is almost identical to stdlib. |

connect-rpc: a gRPC-compatible RPC framework that also serves HTTP/JSON.

| Aspect | Assessment |
| --- | --- |
| Routing | Defined by Protobuf service definitions. |
| Middleware | Interceptors (unary + streaming). Equivalent to middleware. |
| WebSocket | Streaming RPCs provide bidirectional streaming natively. |
| Generated clients | Yes. buf generate produces Go, TypeScript, Python clients. |
| Dependencies | Protobuf toolchain (buf), connect-go runtime. Moderate dep tree. |
| gRPC | Native. Connect handlers serve gRPC, gRPC-Web, and the Connect protocol (HTTP/JSON) on one port. |
| Learning curve | Moderate. Requires Protobuf fluency and buf toolchain setup. |

POST /v1/sessions — spawn a worker
GET /v1/sessions — list active sessions
GET /v1/sessions/:id — session detail
DELETE /v1/sessions/:id — teardown
GET /v1/sessions/:id/logs — streaming logs (WebSocket)
GET /v1/status — health check

Plus auth middleware, structured error responses, and request logging.

This is a straightforward REST API. No complex RPC patterns. No bidirectional streaming beyond log tailing.

Connect-rpc’s main advantage is generated clients. But:

  • The Go SDK is one file wrapping net/http calls — hand-written in 30 minutes.
  • The TypeScript SDK (for the dashboard) can be generated from an OpenAPI spec.
  • The Phase 1 API has 6 endpoints. Code generation saves minutes, not hours.

Generated clients become valuable at 30+ endpoints (Phase 4+). Not Phase 1.

Agent-to-agent messaging (CN-07) and event bus (CN-08) are Phase 3. They might use gRPC or NATS — that decision is months away. Adding Protobuf toolchain now for a future maybe is exactly the kind of speculative complexity CLAUDE.md warns against.

Stdlib middleware is functional but manual. chi’s middleware package covers the common cases (request logging, panic recovery, request ID, timeout) with no dependencies beyond chi itself, and CORS is available from the companion go-chi/cors module. This saves ~200 lines of boilerplate.

The “start simple, upgrade later” question

chi → connect-rpc migration is straightforward because:

  1. Route handlers are http.Handler — they work in both
  2. Connect-rpc serves on net/http — chi can mount connect handlers as subroutes
  3. Individual endpoints can be migrated incrementally (REST → connect-rpc)

The reverse (connect-rpc → chi) is harder because it means throwing away proto definitions.

Decision: chi, with connect-rpc as an incremental upgrade path for Phase 3+, when generated clients and gRPC become valuable.

  1. Phase 1 is 6 REST endpoints. chi handles this with zero ceremony. Connect-rpc’s Protobuf toolchain is overhead for a 6-endpoint API.
  2. Built-in middleware library. Request logging, recovery, request ID, and timeout are included, and CORS comes from the companion go-chi/cors module. Saves ~200 lines vs stdlib.
  3. Near-zero learning curve. chi’s API is stdlib-compatible. r.Get("/path", handler) — that’s it.
  4. WebSocket via gorilla/websocket. Mount a WebSocket handler at /v1/sessions/{id}/logs. No framework friction.
  5. Incremental connect-rpc migration. When Phase 3 needs gRPC (agent messaging, internal services), mount connect-rpc handlers alongside chi routes. Both run on net/http.
  6. Minimal dependencies. chi has zero transitive deps. Aligns with arpi’s dependency philosophy.
// Standard error response
type APIError struct {
    Code    string `json:"code"`              // machine-readable: "not_found", "unauthorized"
    Message string `json:"message"`           // human-readable
    Details any    `json:"details,omitempty"` // optional structured data
}

// Standard list response
type ListResponse[T any] struct {
    Items []T `json:"items"`
    Total int `json:"total"`
}
  • All endpoints return JSON with Content-Type: application/json
  • Errors use HTTP status codes + APIError body
  • List endpoints support ?limit=N&offset=M pagination
  • Versioned: /v1/ prefix on all routes
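
The conventions above could be backed by a few small helpers. The sketch below is one possible shape, not arpi's actual code: `writeJSON`, `writeError`, `parsePage`, and `page` are hypothetical names, and the default limit of 50 and maximum of 200 are assumed values the ADR does not specify.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

type APIError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
	Details any    `json:"details,omitempty"`
}

type ListResponse[T any] struct {
	Items []T `json:"items"`
	Total int `json:"total"`
}

// writeJSON sets the JSON content type, writes the status, and encodes v.
func writeJSON(w http.ResponseWriter, status int, v any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(v)
}

// writeError sends an APIError body with the given HTTP status.
func writeError(w http.ResponseWriter, status int, code, msg string) {
	writeJSON(w, status, APIError{Code: code, Message: msg})
}

// parsePage reads ?limit=N&offset=M. Defaults and bounds are assumptions.
func parsePage(r *http.Request) (limit, offset int, err error) {
	limit = 50
	q := r.URL.Query()
	if s := q.Get("limit"); s != "" {
		limit, err = strconv.Atoi(s)
		if err != nil || limit < 1 || limit > 200 {
			return 0, 0, fmt.Errorf("limit must be an integer in 1..200")
		}
	}
	if s := q.Get("offset"); s != "" {
		offset, err = strconv.Atoi(s)
		if err != nil || offset < 0 {
			return 0, 0, fmt.Errorf("offset must be a non-negative integer")
		}
	}
	return limit, offset, nil
}

// page slices all by limit/offset and reports the unpaginated total.
func page[T any](all []T, limit, offset int) ListResponse[T] {
	if offset > len(all) {
		offset = len(all)
	}
	end := min(offset+limit, len(all))
	return ListResponse[T]{Items: all[offset:end], Total: len(all)}
}

// handleList serves stand-in data to show the helpers working together.
func handleList(w http.ResponseWriter, r *http.Request) {
	limit, offset, err := parsePage(r)
	if err != nil {
		writeError(w, http.StatusBadRequest, "invalid_pagination", err.Error())
		return
	}
	writeJSON(w, http.StatusOK, page([]string{"s1", "s2", "s3"}, limit, offset))
}

// call runs handleList in memory and returns status and trimmed body.
func call(target string) (int, string) {
	rec := httptest.NewRecorder()
	handleList(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	fmt.Println(call("/v1/sessions?limit=2&offset=1"))
	fmt.Println(call("/v1/sessions?limit=0"))
}
```

Returning `Total` as the unpaginated count lets a client compute the number of pages without a second request.
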
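// api/server.go
package api

import (
    "net/http"

    "github.com/go-chi/chi/v5"
    "github.com/go-chi/chi/v5/middleware"
    "github.com/go-chi/cors"
)

// NewRouter wires the Phase 1 routes. The function name is illustrative;
// the handlers and authMiddleware are defined elsewhere in the package.
func NewRouter() http.Handler {
    r := chi.NewRouter()
    r.Use(middleware.RequestID)
    r.Use(middleware.RealIP)
    r.Use(middleware.Logger) // request logging (plain text by default; swap in an slog-based logger for structured output)
    r.Use(middleware.Recoverer)
    r.Use(cors.Handler(cors.Options{...})) // options elided
    r.Use(authMiddleware) // JWT validation via Ory Hydra OIDC
    r.Route("/v1", func(r chi.Router) {
        r.Get("/status", handleStatus)
        r.Route("/sessions", func(r chi.Router) {
            r.Post("/", handleCreateSession) // spawn
            r.Get("/", handleListSessions)
            r.Route("/{id}", func(r chi.Router) {
                r.Get("/", handleGetSession)
                r.Delete("/", handleDeleteSession) // stop
                r.Get("/logs", handleStreamLogs)   // WebSocket upgrade
            })
        })
    })
    return r
}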

Upgrade to connect-rpc when:

  • API surface exceeds 20 endpoints (Phase 3-4)
  • Agent-to-agent messaging needs gRPC streaming
  • TypeScript SDK generation becomes a bottleneck
  • Internal services need service-to-service RPC

Acceptance criteria:

  • POST /v1/sessions with valid JWT creates a worker session
  • GET /v1/sessions returns list with pagination
  • DELETE /v1/sessions/:id tears down session and cleans up resources
  • GET /v1/sessions/:id/logs upgrades to WebSocket and streams container logs
  • GET /v1/status returns 200 with health info
  • Invalid/missing JWT returns 401 with APIError
  • Unknown routes return 404 with APIError
  • Panic in handler returns 500 (middleware.Recoverer) without crashing server