
ADR-004: Network Egress Policy — Default-Deny with Per-Sandbox Allowlists

Status: Accepted
Date: 2026-03-26
Deciders: Alexandre Philippi

Sandbox pods currently have unrestricted network access. An agent running inside a Kata VM can reach other sandboxes on the same cluster network, the Kubernetes API server, cloud metadata endpoints (169.254.169.254), and the open internet. This is unacceptable for a platform running untrusted agent code.

Specific threats:

  • Sandbox-to-sandbox lateral movement — a compromised agent could scan the pod CIDR and attack other sandboxes
  • Kubernetes API abuse — the default service account token is mounted in every pod; an agent could enumerate secrets, create pods, or escalate privileges
  • Cloud metadata SSRF — on Azure, the IMDS endpoint exposes VM identity tokens and subscription metadata
  • Unrestricted exfiltration — an agent could send stolen data to any external endpoint

We need network egress controls that are enforced at the CNI level (not bypassable by the agent), configurable per sandbox, and compatible with k3s + Flannel.

Implement default-deny egress using Kubernetes NetworkPolicies, with three configurable tiers applied per sandbox at creation time. k3s ships Flannel as its CNI, and Flannel itself does not enforce NetworkPolicy; enforcement comes from k3s's embedded network policy controller (based on kube-router), which is enabled by default unless the server is started with --disable-network-policy.

OpenSandbox applies the appropriate NetworkPolicy when creating a sandbox pod, based on an egress-tier label.

In the Restricted tier (the default), the sandbox can only reach:

  • vLLM (inference) — the agent needs LLM access to function
  • OTel Collector (telemetry) — eBPF audit agent streams events here
  • CoreDNS — required for service name resolution

Everything else is blocked: other sandboxes, the Kubernetes API, cloud metadata, and the internet.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-restricted
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: restricted
  policyTypes:
    - Egress
  egress:
    # DNS (CoreDNS)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # vLLM inference
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: inference
          podSelector:
            matchLabels:
              app: vllm
      ports:
        - protocol: TCP
          port: 8000
    # OTel Collector
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317 # OTLP gRPC
        - protocol: TCP
          port: 4318 # OTLP HTTP
```
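NetworkPolicy podSelector matching is a subset test: a policy selects a pod when every matchLabels entry appears on the pod with the same value. A minimal Python sketch of that semantics, using the tier labels from the manifests in this ADR (the helper itself is illustrative, not part of OpenSandbox):

```python
# Illustrative sketch of NetworkPolicy podSelector semantics:
# a policy selects a pod iff its matchLabels are a subset of the pod's labels.

POLICY_SELECTORS = {
    "sandbox-egress-restricted": {"egress-tier": "restricted"},
    "sandbox-egress-internet": {"egress-tier": "internet"},
    "sandbox-egress-isolated": {"egress-tier": "isolated"},
}

def matching_policies(pod_labels: dict) -> list:
    """Return the names of policies whose matchLabels all appear on the pod."""
    return [
        name
        for name, match_labels in POLICY_SELECTORS.items()
        if all(pod_labels.get(k) == v for k, v in match_labels.items())
    ]

print(matching_policies({"app": "sandbox", "egress-tier": "restricted"}))
# -> ['sandbox-egress-restricted']

# An unlabeled pod matches no policy -- and a pod selected by no
# NetworkPolicy has UNRESTRICTED egress, so the label must be set
# at creation time, never added after the fact.
print(matching_policies({"app": "sandbox"}))
# -> []
```

This is why the default tier is applied by the provider rather than left optional: a sandbox pod that misses the label silently falls outside all three policies.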

The Internet-Allowed tier is for agents that need to browse the web, install packages, clone repos, or call external APIs. The sandbox can reach the internet but still cannot reach other sandboxes or the Kubernetes API.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-internet
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: internet
  policyTypes:
    - Egress
  egress:
    # DNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # vLLM inference
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: inference
          podSelector:
            matchLabels:
              app: vllm
      ports:
        - protocol: TCP
          port: 8000
    # OTel Collector
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317
        - protocol: TCP
          port: 4318
    # Internet (everything outside the cluster CIDR)
    # Block cluster-internal ranges, allow everything else
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.42.0.0/16        # k3s pod CIDR
              - 10.43.0.0/16        # k3s service CIDR
              - 169.254.169.254/32  # cloud metadata
```
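The effect of the ipBlock rule can be checked offline with Python's standard ipaddress module. A hypothetical helper (not part of OpenSandbox) that mirrors the except list above:

```python
from ipaddress import ip_address, ip_network

# The except-list from the internet tier's ipBlock rule.
BLOCKED_RANGES = [
    ip_network("10.42.0.0/16"),        # k3s pod CIDR
    ip_network("10.43.0.0/16"),        # k3s service CIDR
    ip_network("169.254.169.254/32"),  # cloud metadata (IMDS)
]

def egress_allowed(dest: str) -> bool:
    """True if the internet tier's ipBlock rule would permit this destination."""
    addr = ip_address(dest)
    return not any(addr in net for net in BLOCKED_RANGES)

print(egress_allowed("203.0.113.10"))     # external address -> True
print(egress_allowed("10.43.0.1"))        # Kubernetes API service -> False
print(egress_allowed("169.254.169.254"))  # cloud metadata endpoint -> False
```

Note that ipBlock filters by destination IP only; the per-pod allowlists for vLLM and the OTel Collector still come from the selector-based rules, not from this CIDR rule.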

The Isolated tier is for evaluation, benchmarks, and testing where the agent should work entirely offline. The sandbox can only reach DNS (so startup scripts that attempt resolution don't hang indefinitely). No inference, no telemetry, no internet.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-isolated
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: isolated
  policyTypes:
    - Egress
  egress:
    # DNS only
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

The three NetworkPolicy objects are deployed once as cluster manifests (in k3s/manifests/). They use podSelector with egress-tier labels, so they take effect automatically when a pod has the matching label.

OpenSandbox applies the tier by setting the egress-tier label on the sandbox pod at creation time:

  1. The POST /sandboxes request includes an optional egress_tier field (restricted, internet, isolated). Default: restricted.
  2. OpenSandbox’s Kubernetes provider sets metadata.labels.egress-tier on the pod spec.
  3. The matching NetworkPolicy takes effect immediately via kube-router.

No sidecar, no init container, no webhook — just labels and pre-deployed policies.
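The label-only flow above can be sketched as a manifest-building step. Field names follow the Kubernetes Pod API; the function, its defaulting logic, and the `kata` runtime class name are illustrative assumptions, not OpenSandbox's actual provider code:

```python
VALID_TIERS = {"restricted", "internet", "isolated"}

def sandbox_pod_manifest(sandbox_id: str, egress_tier: str = "restricted") -> dict:
    """Build the minimal pod manifest: the egress-tier label is all the
    pre-deployed NetworkPolicies need in order to take effect."""
    if egress_tier not in VALID_TIERS:
        raise ValueError(f"unknown egress tier: {egress_tier}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"sandbox-{sandbox_id}",
            "namespace": "sandbox",
            "labels": {"egress-tier": egress_tier},
        },
        # "kata" as the RuntimeClass name is an assumption for this sketch;
        # the rest of the pod spec is elided.
        "spec": {"runtimeClassName": "kata"},
    }

manifest = sandbox_pod_manifest("abc123")
print(manifest["metadata"]["labels"])  # -> {'egress-tier': 'restricted'}
```

Rejecting unknown tiers at the API boundary matters: a typo in `egress_tier` that slipped through as a label would match no policy, and an unmatched pod gets unrestricted egress.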

Per ADR-001, the eBPF audit agent captures network metadata (destination IP, port, bytes, timing) regardless of which egress tier is active. The audit trail records both allowed and denied connection attempts:

  • Allowed connections — eBPF captures the full tcp_sendmsg/tcp_recvmsg flow
  • Denied connections — the agent’s connect() syscall fails with ETIMEDOUT or ECONNREFUSED; eBPF captures the attempt via tracepoint/sys_enter_connect
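From the agent's side, a denied connection surfaces as a failed connect() call. A toy sketch (the outcome strings and the event-tagging idea are hypothetical, not the audit agent's actual schema) of mapping the observed errno to an audit outcome:

```python
import errno

# Hypothetical audit-event tagging: map the errno observed on a connect()
# attempt to the outcome recorded alongside the eBPF-captured metadata.
def connect_outcome(err: int) -> str:
    if err == 0:
        return "allowed"
    if err == errno.ETIMEDOUT:       # packets silently dropped by the CNI
        return "denied (dropped)"
    if err == errno.ECONNREFUSED:    # destination answered with a reset
        return "denied (refused)"
    return f"failed (errno {err})"

print(connect_outcome(errno.ETIMEDOUT))  # -> denied (dropped)
print(connect_outcome(0))                # -> allowed
```

Since kube-router enforcement drops packets rather than rejecting them, ETIMEDOUT is the common signature of a policy denial; ECONNREFUSED usually means the destination was reachable but nothing was listening.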

The Tier 2 audit layer (mitmproxy, from ADR-001) runs inside the sandbox and is unaffected by egress policies. mitmproxy intercepts traffic before it leaves the pod, so NetworkPolicy rules apply to mitmproxy’s outbound connections (which go to the same allowed destinations the agent would reach directly).

  • Sandbox-to-sandbox attacks — the pod CIDR is blocked in every tier; the only pod-to-pod traffic permitted is the explicit allowlist of inference and monitoring pods, never other sandboxes
  • Kubernetes API abuse — the API server (10.43.0.1) is in the service CIDR, blocked by all tiers
  • Cloud metadata SSRF — 169.254.169.254 is explicitly excluded in Internet-Allowed; in Restricted and Isolated it’s blocked by default-deny
  • Lateral movement to monitoring/infra — only specific pods (vLLM, OTel Collector) on specific ports are reachable
  • Data exfiltration via allowed endpoints — a Restricted sandbox could encode stolen data in vLLM prompts; an Internet-Allowed sandbox could exfiltrate to any external server. Mitigation: eBPF audit captures all traffic metadata, mitmproxy (Tier 2 audit) captures full payloads.
  • DNS exfiltration — all tiers allow DNS. An agent could encode data in DNS queries. Mitigation: DNS query logging via CoreDNS + OTel; rate limiting at CoreDNS if needed.
  • Abuse of allowed external endpoints — an Internet-Allowed sandbox could attack external services. Mitigation: rate limiting at the node level (iptables), abuse reporting, sandbox attribution via audit trail.
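The DNS-exfiltration mitigation can start simple. A toy heuristic (entirely illustrative — real detection would live in the CoreDNS/OTel log pipeline, and the thresholds here are guesses) that flags long, high-entropy query labels:

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of a single DNS label."""
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_exfil(qname: str, max_label: int = 40, entropy_bits: float = 3.5) -> bool:
    """Flag queries whose leftmost label is unusually long and random-looking,
    the typical shape of data encoded into DNS queries."""
    first = qname.split(".")[0]
    return len(first) > max_label and label_entropy(first) > entropy_bits

print(looks_like_exfil("vllm.inference.svc.cluster.local"))
# -> False
print(looks_like_exfil(
    "4e6a8f0c1b9d7e2a5c3f8b1d6e9a0c4f7b2e5d8a1c6f9b3e7d0a4c8f2b5e9d1a.evil.example"))
# -> True
```

A heuristic like this only raises alerts; because every tier allows DNS, the hard enforcement options are rate limiting at CoreDNS and after-the-fact attribution via the audit trail.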

Calico provides richer NetworkPolicy features (GlobalNetworkPolicy, DNS-based rules, application-layer policies). However, it replaces Flannel entirely, adding operational complexity. k3s’s built-in kube-router support is sufficient for our pod-selector and CIDR-based rules. If we need DNS-based allowlists (e.g., “allow *.github.com only”), Calico becomes worth revisiting.

Cilium offers eBPF-native networking with the most powerful policy engine (L7 filtering, DNS-aware policies, transparent encryption). However, it replaces both the CNI and kube-proxy, is significantly more complex to operate, and conflicts with our separate eBPF audit agent. If we outgrow kube-router's capabilities, Cilium is the natural upgrade path.

Run a per-sandbox proxy sidecar that enforces egress rules at the application layer. Rejected: adds resource overhead per sandbox (memory, CPU), introduces a bypass risk (agent could reach the network interface directly), and duplicates what NetworkPolicy already provides at the CNI level. NetworkPolicy is the correct layer for network-level access control.

Apply egress rules inside the guest VM via iptables. Rejected: the agent runs as a process inside the VM and could potentially modify iptables rules (even with reduced privileges, there are escape paths). NetworkPolicy enforcement happens at the host CNI level, outside the VM — the agent cannot bypass it.