Code Reviewer

A code review worker that uses MCP capabilities to review pull requests, with an eval suite that scores the quality of its output.

Template

Save as code-reviewer.toml:

template_version = "1.0"

[worker]
name = "code-reviewer"

[workstation]
model = "claude-sonnet-4-20250514"
task = "Review the latest PR for code quality, security issues, and test coverage."

[workstation.capabilities]
mcps = ["github"]
skills = ["code-review"]

[compute]
mode = "sandbox"

[eval]
suite = "code-review-quality"

Capabilities and credential inference

This template declares two capabilities:

mcps = ["github"] — The GitHub MCP server, giving the worker access to PRs, issues, and code
skills = ["code-review"] — The code-review skill, providing review guidelines and patterns

When arpi processes the template, it resolves credentials automatically:

The registry looks up github MCP and determines it needs a GitHub token
The token is provisioned from Infisical
The credential is injected into the sandbox at spawn time

You never specify credentials in templates — just capabilities.
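
The three steps above can be pictured with a small Python sketch. This is an illustration of the idea only, not arpi's implementation: the MCP_REGISTRY mapping, the GITHUB_TOKEN key, the Sandbox class, and both function names are assumptions, and a plain dictionary stands in for the Infisical secret store.

```python
from dataclasses import dataclass

# Hypothetical registry: MCP name -> name of the secret it requires.
# The text only says the github MCP needs a GitHub token; the key name is assumed.
MCP_REGISTRY = {"github": "GITHUB_TOKEN"}


@dataclass
class Sandbox:
    """Stand-in for a spawned sandbox; holds only the injected environment."""
    env: dict


def resolve_credentials(mcps: list, secret_store: dict) -> dict:
    """Step 1 and 2: look up each MCP in the registry, then fetch its secret."""
    resolved = {}
    for mcp in mcps:
        secret_name = MCP_REGISTRY.get(mcp)
        if secret_name is None:
            raise KeyError(f"unknown MCP: {mcp}")
        # secret_store is a dict standing in for the real secret manager.
        resolved[secret_name] = secret_store[secret_name]
    return resolved


def spawn_sandbox(mcps: list, secret_store: dict) -> Sandbox:
    """Step 3: inject the resolved credentials at spawn time."""
    return Sandbox(env=resolve_credentials(mcps, secret_store))


if __name__ == "__main__":
    fake_store = {"GITHUB_TOKEN": "ghp_example"}
    print(spawn_sandbox(["github"], fake_store).env)
```

The point of the design is visible in the call site: the caller passes capability names only, and the mapping from capability to secret lives in the registry rather than in the template.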

Eval suite

The [eval] section attaches the code-review-quality eval suite. After the worker completes its task, arpi runs the suite against the output using three grader types:

Grader	How it works	Example
Code	Deterministic checks — regex, assertions, test execution	“Review mentions at least 3 files”
Model	LLM-as-judge scoring against a rubric	“Rate security analysis depth 1-5”
Human	Manual review queue for subjective assessment	“Was the review actionable?”
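
To make the Code grader row concrete, here is a rough Python sketch of a deterministic check for "Review mentions at least 3 files". How arpi actually defines graders is not shown on this page, so the function name, the regex, and the list of file extensions are all assumptions chosen for illustration.

```python
import re

# Assumed heuristic: a "file mention" is a path-like token ending in one of
# these extensions, e.g. src/auth.py or README.md. The list is illustrative.
FILE_PATTERN = re.compile(r"\b[\w./-]+\.(?:py|js|ts|go|rs|java|md|toml)\b")


def mentions_at_least_n_files(review: str, n: int = 3) -> bool:
    """Return True if the review text names at least n distinct files."""
    files = set(FILE_PATTERN.findall(review))  # set() ignores repeat mentions
    return len(files) >= n


if __name__ == "__main__":
    review = (
        "In src/auth.py the token is logged. "
        "tests/test_auth.py lacks a negative case. "
        "README.md is outdated."
    )
    print(mentions_at_least_n_files(review))
```

Because the check is a pure function of the review text, it gives the same result on every run, which is what separates a Code grader from the Model and Human graders in the table.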

Usage

arpi spawn code-reviewer.toml

After the worker completes:

arpi eval list --worker code-reviewer

The output lists eval scores broken down by grader type, along with an aggregate quality score.

Combining capabilities

Templates can declare multiple MCPs and skills. Each capability triggers its own credential resolution:

[workstation.capabilities]
mcps = ["github", "slack", "linear"]
skills = ["code-review", "testing"]

With this configuration, arpi provisions credentials for each of the three MCP servers and injects them into the sandbox independently.