
ADR-004: Network Egress Policy — Default-Deny with Per-Sandbox Allowlists

Status: Accepted
Date: 2026-03-26
Deciders: Alexandre Philippi

Sandbox pods currently have unrestricted network access. An agent running inside a Kata VM can reach other sandboxes on the same cluster network, the Kubernetes API server, cloud metadata endpoints (169.254.169.254), and the open internet. This is unacceptable for a platform running untrusted agent code.

Specific threats:

  • Sandbox-to-sandbox lateral movement — a compromised agent could scan the pod CIDR and attack other sandboxes
  • Kubernetes API abuse — the default service account token is mounted in every pod; an agent could enumerate secrets, create pods, or escalate privileges
  • Cloud metadata SSRF — on Azure, the IMDS endpoint exposes VM identity tokens and subscription metadata
  • Unrestricted exfiltration — an agent could send stolen data to any external endpoint

We need network egress controls that are enforced at the CNI level (not bypassable by the agent), configurable per sandbox, and compatible with k3s + Flannel.

Implement default-deny egress using Kubernetes NetworkPolicies, with three configurable tiers applied per sandbox at creation time. k3s ships Flannel as its CNI, and Flannel itself does not enforce NetworkPolicy; enforcement comes from k3s's embedded network policy controller (based on kube-router), which is enabled by default unless the server is started with --disable-network-policy.

OpenSandbox applies the appropriate NetworkPolicy when creating a sandbox pod, based on an egress-tier label.

In the Restricted tier (the default), the sandbox can only reach:

  • vLLM (inference) — the agent needs LLM access to function
  • OTel Collector (telemetry) — eBPF audit agent streams events here
  • CoreDNS — required for service name resolution

Everything else is blocked: other sandboxes, the Kubernetes API, cloud metadata, and the internet.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-restricted
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: restricted
  policyTypes:
    - Egress
  egress:
    # DNS (CoreDNS)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # vLLM inference
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: inference
          podSelector:
            matchLabels:
              app: vllm
      ports:
        - protocol: TCP
          port: 8000
    # OTel Collector
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317 # OTLP gRPC
        - protocol: TCP
          port: 4318 # OTLP HTTP
```
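NetworkPolicy podSelector matching is a subset test: a policy selects a pod when every matchLabels entry appears on the pod with the same value. A minimal Python sketch of that semantics, using the tier labels from the manifests in this ADR (the helper itself is illustrative, not part of OpenSandbox):

```python
# Illustrative sketch of NetworkPolicy podSelector semantics:
# a policy selects a pod iff its matchLabels are a subset of the pod's labels.

POLICY_SELECTORS = {
    "sandbox-egress-restricted": {"egress-tier": "restricted"},
    "sandbox-egress-internet": {"egress-tier": "internet"},
    "sandbox-egress-isolated": {"egress-tier": "isolated"},
}

def matching_policies(pod_labels: dict) -> list:
    """Return the names of policies whose matchLabels all appear on the pod."""
    return [
        name
        for name, match_labels in POLICY_SELECTORS.items()
        if all(pod_labels.get(k) == v for k, v in match_labels.items())
    ]

print(matching_policies({"app": "sandbox", "egress-tier": "restricted"}))
# -> ['sandbox-egress-restricted']

# An unlabeled pod matches no policy -- and a pod selected by no
# NetworkPolicy has UNRESTRICTED egress, so the label must be set
# at creation time, never added after the fact.
print(matching_policies({"app": "sandbox"}))
# -> []
```

This is why the default tier is applied by the provider rather than left optional: a sandbox pod that misses the label silently falls outside all three policies.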

The Internet-Allowed tier is for agents that need to browse the web, install packages, clone repos, or call external APIs. The sandbox can reach the internet but still cannot reach other sandboxes or the Kubernetes API.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-internet
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: internet
  policyTypes:
    - Egress
  egress:
    # DNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # vLLM inference
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: inference
          podSelector:
            matchLabels:
              app: vllm
      ports:
        - protocol: TCP
          port: 8000
    # OTel Collector
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317
        - protocol: TCP
          port: 4318
    # Internet (everything outside the cluster CIDR)
    # Block cluster-internal ranges, allow everything else
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.42.0.0/16        # k3s pod CIDR
              - 10.43.0.0/16        # k3s service CIDR
              - 169.254.169.254/32  # cloud metadata
```
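The effect of the ipBlock rule can be checked offline with Python's standard ipaddress module. A hypothetical helper (not part of OpenSandbox) that mirrors the except list above:

```python
from ipaddress import ip_address, ip_network

# The except-list from the internet tier's ipBlock rule.
BLOCKED_RANGES = [
    ip_network("10.42.0.0/16"),        # k3s pod CIDR
    ip_network("10.43.0.0/16"),        # k3s service CIDR
    ip_network("169.254.169.254/32"),  # cloud metadata (IMDS)
]

def egress_allowed(dest: str) -> bool:
    """True if the internet tier's ipBlock rule would permit this destination."""
    addr = ip_address(dest)
    return not any(addr in net for net in BLOCKED_RANGES)

print(egress_allowed("203.0.113.10"))     # external address -> True
print(egress_allowed("10.43.0.1"))        # Kubernetes API service -> False
print(egress_allowed("169.254.169.254"))  # cloud metadata endpoint -> False
```

Note that ipBlock filters by destination IP only; the per-pod allowlists for vLLM and the OTel Collector still come from the selector-based rules, not from this CIDR rule.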

The Isolated tier is for evaluation, benchmarks, and testing where the agent should work entirely offline. The sandbox can only reach DNS (so startup scripts that attempt resolution don't hang indefinitely). No inference, no telemetry, no internet.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-isolated
  namespace: sandbox
spec:
  podSelector:
    matchLabels:
      egress-tier: isolated
  policyTypes:
    - Egress
  egress:
    # DNS only
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

The three NetworkPolicy objects are deployed once as cluster manifests (in k3s/manifests/). They use podSelector with egress-tier labels, so they take effect automatically when a pod has the matching label.

OpenSandbox applies the tier by setting the egress-tier label on the sandbox pod at creation time:

  1. The POST /sandboxes request includes an optional egress_tier field (restricted, internet, isolated). Default: restricted.
  2. OpenSandbox’s Kubernetes provider sets metadata.labels.egress-tier on the pod spec.
  3. The matching NetworkPolicy takes effect immediately via kube-router.

No sidecar, no init container, no webhook — just labels and pre-deployed policies.
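The label-only flow above can be sketched as a manifest-building step. Field names follow the Kubernetes Pod API; the function, its defaulting logic, and the `kata` runtime class name are illustrative assumptions, not OpenSandbox's actual provider code:

```python
VALID_TIERS = {"restricted", "internet", "isolated"}

def sandbox_pod_manifest(sandbox_id: str, egress_tier: str = "restricted") -> dict:
    """Build the minimal pod manifest: the egress-tier label is all the
    pre-deployed NetworkPolicies need in order to take effect."""
    if egress_tier not in VALID_TIERS:
        raise ValueError(f"unknown egress tier: {egress_tier}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"sandbox-{sandbox_id}",
            "namespace": "sandbox",
            "labels": {"egress-tier": egress_tier},
        },
        # "kata" as the RuntimeClass name is an assumption for this sketch;
        # the rest of the pod spec is elided.
        "spec": {"runtimeClassName": "kata"},
    }

manifest = sandbox_pod_manifest("abc123")
print(manifest["metadata"]["labels"])  # -> {'egress-tier': 'restricted'}
```

Rejecting unknown tiers at the API boundary matters: a typo in `egress_tier` that slipped through as a label would match no policy, and an unmatched pod gets unrestricted egress.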

Per ADR-001, the eBPF audit agent captures network metadata (destination IP, port, bytes, timing) regardless of which egress tier is active. The audit trail records both allowed and denied connection attempts:

  • Allowed connections — eBPF captures the full tcp_sendmsg/tcp_recvmsg flow
  • Denied connections — the agent’s connect() syscall fails with ETIMEDOUT or ECONNREFUSED; eBPF captures the attempt via tracepoint/sys_enter_connect
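From the agent's side, a denied connection surfaces as a failed connect() call. A toy sketch (the outcome strings and the event-tagging idea are hypothetical, not the audit agent's actual schema) of mapping the observed errno to an audit outcome:

```python
import errno

# Hypothetical audit-event tagging: map the errno observed on a connect()
# attempt to the outcome recorded alongside the eBPF-captured metadata.
def connect_outcome(err: int) -> str:
    if err == 0:
        return "allowed"
    if err == errno.ETIMEDOUT:       # packets silently dropped by the CNI
        return "denied (dropped)"
    if err == errno.ECONNREFUSED:    # destination answered with a reset
        return "denied (refused)"
    return f"failed (errno {err})"

print(connect_outcome(errno.ETIMEDOUT))  # -> denied (dropped)
print(connect_outcome(0))                # -> allowed
```

Since kube-router enforcement drops packets rather than rejecting them, ETIMEDOUT is the common signature of a policy denial; ECONNREFUSED usually means the destination was reachable but nothing was listening.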

The Tier 2 audit layer (mitmproxy, from ADR-001) runs inside the sandbox and is unaffected by egress policies. mitmproxy intercepts traffic before it leaves the pod, so NetworkPolicy rules apply to mitmproxy’s outbound connections (which go to the same allowed destinations the agent would reach directly).

  • Sandbox-to-sandbox attacks — the pod CIDR is blocked in every tier; the only pod-to-pod traffic permitted is the explicit allowlist of inference and monitoring pods, never other sandboxes
  • Kubernetes API abuse — the API server (10.43.0.1) is in the service CIDR, blocked by all tiers
  • Cloud metadata SSRF — 169.254.169.254 is explicitly excluded in Internet-Allowed; in Restricted and Isolated it’s blocked by default-deny
  • Lateral movement to monitoring/infra — only specific pods (vLLM, OTel Collector) on specific ports are reachable
  • Data exfiltration via allowed endpoints — a Restricted sandbox could encode stolen data in vLLM prompts; an Internet-Allowed sandbox could exfiltrate to any external server. Mitigation: eBPF audit captures all traffic metadata, mitmproxy (Tier 2 audit) captures full payloads.
  • DNS exfiltration — all tiers allow DNS. An agent could encode data in DNS queries. Mitigation: DNS query logging via CoreDNS + OTel; rate limiting at CoreDNS if needed.
  • Abuse of allowed external endpoints — an Internet-Allowed sandbox could attack external services. Mitigation: rate limiting at the node level (iptables), abuse reporting, sandbox attribution via audit trail.
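The DNS-exfiltration mitigation can start simple. A toy heuristic (entirely illustrative — real detection would live in the CoreDNS/OTel log pipeline, and the thresholds here are guesses) that flags long, high-entropy query labels:

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of a single DNS label."""
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_exfil(qname: str, max_label: int = 40, entropy_bits: float = 3.5) -> bool:
    """Flag queries whose leftmost label is unusually long and random-looking,
    the typical shape of data encoded into DNS queries."""
    first = qname.split(".")[0]
    return len(first) > max_label and label_entropy(first) > entropy_bits

print(looks_like_exfil("vllm.inference.svc.cluster.local"))
# -> False
print(looks_like_exfil(
    "4e6a8f0c1b9d7e2a5c3f8b1d6e9a0c4f7b2e5d8a1c6f9b3e7d0a4c8f2b5e9d1a.evil.example"))
# -> True
```

A heuristic like this only raises alerts; because every tier allows DNS, the hard enforcement options are rate limiting at CoreDNS and after-the-fact attribution via the audit trail.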

Calico provides richer NetworkPolicy features (GlobalNetworkPolicy, DNS-based rules, application-layer policies). However, it replaces Flannel entirely, adding operational complexity. k3s’s built-in kube-router support is sufficient for our pod-selector and CIDR-based rules. If we need DNS-based allowlists (e.g., “allow *.github.com only”), Calico becomes worth revisiting.

Cilium offers eBPF-native networking with the most powerful policy engine (L7 filtering, DNS-aware policies, transparent encryption). However, it replaces both the CNI and kube-proxy, is significantly more complex to operate, and conflicts with our separate eBPF audit agent. If we outgrow kube-router's capabilities, Cilium is the natural upgrade path.

Run a per-sandbox proxy sidecar that enforces egress rules at the application layer. Rejected: adds resource overhead per sandbox (memory, CPU), introduces a bypass risk (agent could reach the network interface directly), and duplicates what NetworkPolicy already provides at the CNI level. NetworkPolicy is the correct layer for network-level access control.

Apply egress rules inside the guest VM via iptables. Rejected: the agent runs as a process inside the VM and could potentially modify iptables rules (even with reduced privileges, there are escape paths). NetworkPolicy enforcement happens at the host CNI level, outside the VM — the agent cannot bypass it.