ADR-003: Control plane architecture
Status: Proposed
Context
arpi is a platform with an API. The CLI is a thin client. The control plane server needs a Go HTTP framework and API style. This is the most load-bearing architectural decision for Phase 1: every domain (Identity, Workstation, Compute, Registry, Connectivity, Observability) exposes functionality through this API.
Requirements
| Requirement | Priority | Notes |
|---|---|---|
| Go-native | P0 | arpi is Go. No polyglot runtime. |
| REST-compatible | P0 | CLI, dashboard, and external integrations consume REST |
| Streaming support | P0 | GET /v1/sessions/:id/logs needs WebSocket or SSE |
| Generated clients | P1 | Go SDK, TypeScript SDK (dashboard), future Python SDK |
| gRPC option | P2 | Agent-to-agent and internal service calls may benefit from gRPC |
| Middleware chain | P0 | Auth, logging, rate limiting, CORS |
| Low dependency footprint | P0 | arpi avoids large dep trees (see CLAUDE.md: no Viper, no koanf) |
| OpenAPI spec | P1 | Auto-generated from routes for documentation |
Constraints
- Small team (1-2 engineers). Framework must not require ceremony.
- The API surface is moderate: ~15 endpoints in Phase 1 (see TASKS.md CP-01..CP-12).
- Log streaming needs a push transport: WebSocket is preferred, and SSE is an acceptable alternative.
- Internal services (sandbox orchestrator, gateway manager) may use gRPC eventually.
Options
Option A: net/http (stdlib)
Go 1.22+ stdlib with pattern matching (GET /v1/sessions/{id}).
| Aspect | Assessment |
|---|---|
| Routing | Go 1.22 ServeMux with method+path patterns. Covers all Phase 1 routes. |
| Middleware | Manual wrapping. func(http.Handler) http.Handler pattern. |
| WebSocket | nhooyr.io/websocket or gorilla/websocket. |
| Generated clients | None. Write OpenAPI spec manually, generate from it. |
| Dependencies | Zero (stdlib). WebSocket lib is the only addition. |
| gRPC | Not supported. Would need separate gRPC server on different port. |
| Learning curve | None — it’s stdlib. |
Option B: go-chi/chi
Lightweight router. One of the most widely used Go HTTP routers alongside the stdlib mux.
| Aspect | Assessment |
|---|---|
| Routing | URL params, method routing, route groups. Superset of stdlib. |
| Middleware | Built-in middleware package (logging, recovery, request ID, timeout). CORS comes from the separate go-chi/cors module. |
| WebSocket | Same as stdlib — bring your own WebSocket lib. |
| Generated clients | None. Same OpenAPI story as stdlib. |
| Dependencies | Minimal (chi is ~2K LOC, no transitive deps). |
| gRPC | Not supported natively. Separate server. |
| Learning curve | 10 minutes. API is almost identical to stdlib. |
Option C: connectrpc/connect-go
gRPC-compatible RPC framework that also serves HTTP/JSON.
| Aspect | Assessment |
|---|---|
| Routing | Defined by Protobuf service definitions. |
| Middleware | Interceptors (unary + streaming). Equivalent to middleware. |
| WebSocket | Streaming RPCs provide bidirectional streaming natively. |
| Generated clients | Yes. buf generate produces Go, TypeScript, Python clients. |
| Dependencies | Protobuf toolchain (buf), connect-go runtime. Moderate dep tree. |
| gRPC | Native. connect-go handlers serve the gRPC, gRPC-Web, and Connect (HTTP/JSON) protocols on one port. |
| Learning curve | Moderate. Requires Protobuf fluency and buf toolchain setup. |
Analysis
What Phase 1 actually needs
- `POST /v1/sessions`: spawn a worker
- `GET /v1/sessions`: list active sessions
- `GET /v1/sessions/:id`: session detail
- `DELETE /v1/sessions/:id`: teardown
- `GET /v1/sessions/:id/logs`: streaming logs (WebSocket)
- `GET /v1/status`: health check

Plus auth middleware, structured error responses, and request logging.
This is a straightforward REST API. No complex RPC patterns. No bidirectional streaming beyond log tailing.
Generated clients: how valuable?
Connect-rpc’s main advantage is generated clients. But:
- The Go SDK is one file wrapping `net/http` calls, hand-written in 30 minutes.
- The TypeScript SDK (for the dashboard) can be generated from an OpenAPI spec.
- The Phase 1 API has 6 endpoints. Code generation saves minutes, not hours.
Generated clients become valuable at 30+ endpoints (Phase 4+). Not Phase 1.
gRPC: premature?
Agent-to-agent messaging (CN-07) and event bus (CN-08) are Phase 3. They might use gRPC or NATS; that decision is months away. Adding the Protobuf toolchain now for a future maybe is exactly the kind of speculative complexity CLAUDE.md warns against.
Middleware: chi wins on convenience
Stdlib middleware is functional but manual. chi’s middleware package covers the common cases (request logging, panic recovery, request ID, timeout) without external deps. CORS needs the companion go-chi/cors module, and structured (slog) logging needs a small custom logger or an add-on such as go-chi/httplog. This saves ~200 lines of boilerplate.
The “start simple, upgrade later” question
chi → connect-rpc migration is straightforward because:
- Route handlers are `http.Handler`, so they work in both
- Connect-rpc serves on `net/http`, so chi can mount connect handlers as subroutes
- Individual endpoints can be migrated incrementally (REST → connect-rpc)
The reverse (connect-rpc → chi) is harder because it means throwing away proto definitions.
Decision
chi, with connect-rpc as an incremental upgrade path for Phase 3+ when generated clients and gRPC become valuable.
Rationale
- Phase 1 is 6 REST endpoints. chi handles this with zero ceremony. Connect-rpc’s Protobuf toolchain is overhead for a 6-endpoint API.
- Built-in middleware package. Logging, recovery, and timeout are included; CORS comes from the companion go-chi/cors module. Saves ~200 lines vs stdlib.
- Near-zero learning curve. chi’s API is stdlib-compatible: `r.Get("/path", handler)` and that’s it.
- WebSocket via gorilla/websocket. Mount a WebSocket handler at `/v1/sessions/{id}/logs`. No framework friction.
- Incremental connect-rpc migration. When Phase 3 needs gRPC (agent messaging, internal services), mount connect-rpc handlers alongside chi routes. Both run on `net/http`.
- Minimal dependencies. chi has zero transitive deps. Aligns with arpi’s dependency philosophy.
API conventions
```go
// Standard error response
type APIError struct {
	Code    string `json:"code"`              // machine-readable: "not_found", "unauthorized"
	Message string `json:"message"`           // human-readable
	Details any    `json:"details,omitempty"` // optional structured data
}

// Standard list response
type ListResponse[T any] struct {
	Items []T `json:"items"`
	Total int `json:"total"`
}
```

- All endpoints return JSON with `Content-Type: application/json`
- Errors use HTTP status codes + `APIError` body
- List endpoints support `?limit=N&offset=M` pagination
- Versioned: `/v1/` prefix on all routes
Implementation plan
```go
// Imports assumed: github.com/go-chi/chi/v5, its middleware package,
// and github.com/go-chi/cors.
r := chi.NewRouter()
r.Use(middleware.RequestID)
r.Use(middleware.RealIP)
r.Use(middleware.Logger) // plain-text request log; swap in a slog-based logger for structured output
r.Use(middleware.Recoverer)
r.Use(cors.Handler(cors.Options{...}))
r.Use(authMiddleware) // JWT validation via Ory Hydra OIDC

r.Route("/v1", func(r chi.Router) {
	r.Get("/status", handleStatus)
	r.Route("/sessions", func(r chi.Router) {
		r.Post("/", handleCreateSession) // spawn
		r.Get("/", handleListSessions)
		r.Route("/{id}", func(r chi.Router) {
			r.Get("/", handleGetSession)
			r.Delete("/", handleDeleteSession) // stop
			r.Get("/logs", handleStreamLogs)   // WebSocket upgrade
		})
	})
})
```

When to revisit
Upgrade to connect-rpc when:
- API surface exceeds 20 endpoints (Phase 3-4)
- Agent-to-agent messaging needs gRPC streaming
- TypeScript SDK generation becomes a bottleneck
- Internal services need service-to-service RPC
Verification
- `POST /v1/sessions` with valid JWT creates a worker session
- `GET /v1/sessions` returns list with pagination
- `DELETE /v1/sessions/:id` tears down session and cleans up resources
- `GET /v1/sessions/:id/logs` upgrades to WebSocket and streams container logs
- `GET /v1/status` returns 200 with health info
- Invalid/missing JWT returns 401 with `APIError`
- Unknown routes return 404 with `APIError`
- Panic in handler returns 500 (middleware.Recoverer) without crashing server