# ADR-004: Network Egress Policy — Default-Deny with Per-Sandbox Allowlists
Status: Accepted
Date: 2026-03-26
Deciders: Alexandre Philippi
## Context

Sandbox pods currently have unrestricted network access. An agent running inside a Kata VM can reach other sandboxes on the same cluster network, the Kubernetes API server, cloud metadata endpoints (169.254.169.254), and the open internet. This is unacceptable for a platform running untrusted agent code.
Specific threats:
- Sandbox-to-sandbox lateral movement — a compromised agent could scan the pod CIDR and attack other sandboxes
- Kubernetes API abuse — the default service account token is mounted in every pod; an agent could enumerate secrets, create pods, or escalate privileges
- Cloud metadata SSRF — on Azure, the IMDS endpoint exposes VM identity tokens and subscription metadata
- Unrestricted exfiltration — an agent could send stolen data to any external endpoint
We need network egress controls that are enforced at the CNI level (not bypassable by the agent), configurable per sandbox, and compatible with k3s + Flannel.
## Decision

Implement default-deny egress using Kubernetes NetworkPolicies, with three configurable tiers applied per sandbox at creation time. k3s ships Flannel as its CNI, and Flannel itself does not enforce NetworkPolicy; enforcement comes from k3s's embedded kube-router network policy controller, which runs by default and is only absent if k3s is started with the `--disable-network-policy` flag.
OpenSandbox applies the appropriate NetworkPolicy when creating a sandbox pod, based on an egress-tier label.
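As an illustration, a sandbox pod selecting a tier might look like the sketch below. The pod name, image, and `runtimeClassName` are placeholder values; the only load-bearing piece is the `egress-tier` label, which the pre-deployed policies match on.

```yaml
# Sketch of a sandbox pod selecting the restricted tier.
# Name, image, and runtimeClassName are illustrative values.
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-example
  namespace: sandbox
  labels:
    egress-tier: restricted   # or: internet, isolated
spec:
  runtimeClassName: kata      # Kata VM isolation (illustrative name)
  containers:
    - name: agent
      image: registry.example.com/agent:latest  # placeholder
```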
## Egress Tiers

### Tier 1: Restricted (default)

The sandbox can only reach:
- vLLM (inference) — the agent needs LLM access to function
- OTel Collector (telemetry) — eBPF audit agent streams events here
- CoreDNS — required for service name resolution
Everything else is blocked: other sandboxes, the Kubernetes API, cloud metadata, and the internet.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-restricted
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: restricted
  policyTypes:
    - Egress
  egress:
    # DNS (CoreDNS)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # vLLM inference
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: inference
          podSelector:
            matchLabels:
              app: vllm
      ports:
        - protocol: TCP
          port: 8000
    # OTel Collector
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317  # OTLP gRPC
        - protocol: TCP
          port: 4318  # OTLP HTTP
```

### Tier 2: Internet-Allowed

For agents that need to browse the web, install packages, clone repos, or call external APIs. The sandbox can reach the internet but still cannot reach other sandboxes or the Kubernetes API.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-internet
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: internet
  policyTypes:
    - Egress
  egress:
    # DNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # vLLM inference
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: inference
          podSelector:
            matchLabels:
              app: vllm
      ports:
        - protocol: TCP
          port: 8000
    # OTel Collector
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317
        - protocol: TCP
          port: 4318
    # Internet (everything outside the cluster CIDR)
    # Block cluster-internal ranges, allow everything else
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.42.0.0/16        # k3s pod CIDR
              - 10.43.0.0/16        # k3s service CIDR
              - 169.254.169.254/32  # cloud metadata
```

### Tier 3: Fully Isolated

For evaluation, benchmarks, and testing where the agent should work entirely offline. The sandbox can only reach DNS (so that startup scripts that attempt resolution don't hang indefinitely). No inference, no telemetry, no internet.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-isolated
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: isolated
  policyTypes:
    - Egress
  egress:
    # DNS only
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

## How OpenSandbox Applies Policies

The three NetworkPolicy objects are deployed once as cluster manifests (in `k3s/manifests/`). They use `podSelector` with `egress-tier` labels, so they take effect automatically when a pod has the matching label.
OpenSandbox applies the tier by setting the egress-tier label on the sandbox pod at creation time:
- The `POST /sandboxes` request includes an optional `egress_tier` field (`restricted`, `internet`, `isolated`). Default: `restricted`.
- OpenSandbox's Kubernetes provider sets `metadata.labels.egress-tier` on the pod spec.
- The matching NetworkPolicy takes effect immediately via kube-router.
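For example, a creation request selecting the internet tier could look like this sketch of a `POST /sandboxes` body — only the `egress_tier` field is specified by this ADR; any other fields shown are placeholders:

```json
{
  "image": "registry.example.com/agent:latest",
  "egress_tier": "internet"
}
```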
No sidecar, no init container, no webhook — just labels and pre-deployed policies.
## Interaction with Audit

Per ADR-001, the eBPF audit agent captures network metadata (destination IP, port, bytes, timing) regardless of which egress tier is active. The audit trail records both allowed and denied connection attempts:
- Allowed connections — eBPF captures the full `tcp_sendmsg`/`tcp_recvmsg` flow
- Denied connections — the agent's `connect()` syscall fails with ETIMEDOUT or ECONNREFUSED; eBPF captures the attempt via `tracepoint/sys_enter_connect`
The Tier 2 audit layer (mitmproxy, from ADR-001) runs inside the sandbox and is unaffected by egress policies. mitmproxy intercepts traffic before it leaves the pod, so NetworkPolicy rules apply to mitmproxy’s outbound connections (which go to the same allowed destinations the agent would reach directly).
## Consequences

### What This Prevents

- Sandbox-to-sandbox attacks — the pod CIDR is blocked in all tiers; the only cluster-internal destinations in any allowlist are inference and monitoring pods, never other sandboxes
- Kubernetes API abuse — the API server (10.43.0.1) is in the service CIDR, blocked by all tiers
- Cloud metadata SSRF — 169.254.169.254 is explicitly excluded in Internet-Allowed; in Restricted and Isolated it’s blocked by default-deny
- Lateral movement to monitoring/infra — only specific pods (vLLM, OTel Collector) on specific ports are reachable
### What This Does Not Prevent

- Data exfiltration via allowed endpoints — a Restricted sandbox could encode stolen data in vLLM prompts; an Internet-Allowed sandbox could exfiltrate to any external server. Mitigation: eBPF audit captures all traffic metadata, mitmproxy (Tier 2 audit) captures full payloads.
- DNS exfiltration — all tiers allow DNS. An agent could encode data in DNS queries. Mitigation: DNS query logging via CoreDNS + OTel; rate limiting at CoreDNS if needed.
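As a sketch of the DNS-logging mitigation, CoreDNS's `log` plugin can be turned on in its Corefile; the surrounding server block below is a typical k3s-style default, not this cluster's actual configuration:

```
.:53 {
    errors
    log          # log every DNS query, for shipping to the audit pipeline
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```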
- Abuse of allowed external endpoints — an Internet-Allowed sandbox could attack external services. Mitigation: rate limiting at the node level (iptables), abuse reporting, sandbox attribution via audit trail.
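The node-level rate limiting could be sketched with iptables' `hashlimit` match — the source CIDR is the k3s pod CIDR used in the policies above, while the rate thresholds are illustrative, not tuned values:

```sh
# Sketch: per-source-IP rate limit on forwarded egress from sandbox pods.
# 10.42.0.0/16 is the k3s pod CIDR; 500/second and the burst are examples.
iptables -A FORWARD -s 10.42.0.0/16 ! -d 10.42.0.0/16 \
  -m hashlimit --hashlimit-name sandbox-egress \
  --hashlimit-mode srcip \
  --hashlimit-above 500/second --hashlimit-burst 1000 \
  -j DROP
```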
## Alternatives Considered

### A. Calico NetworkPolicy

Calico provides richer NetworkPolicy features (GlobalNetworkPolicy, DNS-based rules, application-layer policies). However, it replaces Flannel entirely, adding operational complexity. k3s's built-in kube-router support is sufficient for our pod-selector and CIDR-based rules. If we need DNS-based allowlists (e.g., "allow *.github.com only"), Calico becomes worth revisiting.
### B. Cilium

eBPF-native networking with the most powerful policy engine (L7 filtering, DNS-aware policies, transparent encryption). However, Cilium replaces both the CNI and kube-proxy, is significantly more complex to operate, and conflicts with our separate eBPF audit agent. If we outgrow kube-router's capabilities, Cilium is the natural upgrade path.
### C. OpenSandbox Egress Sidecar

Run a per-sandbox proxy sidecar that enforces egress rules at the application layer. Rejected: adds resource overhead per sandbox (memory, CPU), introduces a bypass risk (agent could reach the network interface directly), and duplicates what NetworkPolicy already provides at the CNI level. NetworkPolicy is the correct layer for network-level access control.
### D. iptables Rules Inside the Kata VM

Apply egress rules inside the guest VM via iptables. Rejected: the agent runs as a process inside the VM and could potentially modify iptables rules (even with reduced privileges, there are escape paths). NetworkPolicy enforcement happens at the host CNI level, outside the VM — the agent cannot bypass it.