Code Reviewer
A code review worker that uses MCP capabilities and an eval suite to score output quality.
Template
Save as code-reviewer.toml:

    template_version = "1.0"

    [worker]
    name = "code-reviewer"

    [workstation]
    model = "claude-sonnet-4-20250514"
    task = "Review the latest PR for code quality, security issues, and test coverage."

    [workstation.capabilities]
    mcps = ["github"]
    skills = ["code-review"]

    [compute]
    mode = "sandbox"

    [eval]
    suite = "code-review-quality"

Capabilities and credential inference
This template declares two capabilities:
- `mcps = ["github"]`: the GitHub MCP server, giving the worker access to PRs, issues, and code
- `skills = ["code-review"]`: the code-review skill, providing review guidelines and patterns
When arpi processes the template, it resolves credentials automatically:
- The registry looks up the github MCP and determines it needs a GitHub token
- The token is provisioned from Infisical
- The credential is injected into the sandbox at spawn time
You never specify credentials in templates — just capabilities.
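The three resolution steps can be pictured with a small Python sketch. Everything in it is hypothetical: the `MCP_REGISTRY` mapping, the `GITHUB_TOKEN` key name, the `SandboxSpec` class, and the plain dictionary standing in for Infisical are illustration only, not arpi's actual registry or API.

```python
from dataclasses import dataclass

# Hypothetical registry: maps an MCP name to the secret it needs.
# The key name "GITHUB_TOKEN" is an assumption for illustration.
MCP_REGISTRY = {"github": "GITHUB_TOKEN"}


@dataclass
class SandboxSpec:
    """Stand-in for whatever is handed to the sandbox at spawn time."""
    env: dict


def resolve_credentials(mcps, registry, secret_store):
    """Look up each MCP's required secret and fetch it from the store."""
    env = {}
    for mcp in mcps:
        if mcp not in registry:
            raise KeyError(f"unknown MCP: {mcp}")
        secret_name = registry[mcp]  # step 1: registry lookup
        env[secret_name] = secret_store[secret_name]  # step 2: provision
    return env


def spawn_sandbox(mcps, registry, secret_store):
    """Step 3: inject the resolved credentials into the sandbox environment."""
    return SandboxSpec(env=resolve_credentials(mcps, registry, secret_store))


# A plain dict stands in for the secret manager (Infisical in the text).
fake_store = {"GITHUB_TOKEN": "dummy-token-value"}
sandbox = spawn_sandbox(["github"], MCP_REGISTRY, fake_store)
print(sorted(sandbox.env))
```

The point of the design is visible in the call site: the caller passes only capability names, and the secret names and values never appear in the template.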
Eval suite
The [eval] section attaches the code-review-quality eval suite. After the worker completes its task, arpi runs the suite against the output using three grader types:
| Grader | How it works | Example |
|---|---|---|
| Code | Deterministic checks: regex, assertions, test execution | “Review mentions at least 3 files” |
| Model | LLM-as-judge scoring against a rubric | “Rate security analysis depth 1-5” |
| Human | Manual review queue for subjective assessment | “Was the review actionable?” |
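As an illustration of the Code grader row, a deterministic check such as “Review mentions at least 3 files” could be written as a regex test. This is a minimal sketch, not the grader shipped with the code-review-quality suite: the function name, the sample review, and the definition of a file mention (a path-like token ending in a short extension) are all assumptions.

```python
import re

# Assumption: a "file mention" is a path-like token ending in a short extension.
FILE_PATTERN = re.compile(r"\b[\w./-]+\.[A-Za-z]{1,4}\b")


def mentions_at_least_n_files(review_text: str, n: int = 3) -> bool:
    """Deterministic check: does the review name at least n distinct files?"""
    distinct_files = set(FILE_PATTERN.findall(review_text))
    return len(distinct_files) >= n


# Made-up review text for the example.
review = (
    "src/auth.py stores the token in plain text. "
    "tests/test_auth.py has no negative cases. "
    "README.md still documents the old login flow."
)
print(mentions_at_least_n_files(review))
```

Counting distinct matches rather than raw matches keeps a review that repeats one filename several times from passing the check.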
Spawn the worker from the template:

    arpi spawn code-reviewer.toml

After the worker completes:

    arpi eval list --worker code-reviewer

This shows eval scores broken down by grader type, with an aggregate quality score.
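One way to picture a per-grader breakdown with an aggregate is the sketch below. The aggregation rule is an assumption: the text does not say how arpi combines scores, so this example normalizes every score to the range 0 to 1 and takes the unweighted mean of the per-type means. The sample results are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical results: (grader type, score normalized to 0..1).
results = [
    ("code", 1.0),
    ("code", 0.0),
    ("model", 0.8),
    ("human", 0.5),
]


def summarize(results):
    """Group scores by grader type, then average the per-type means."""
    by_type = defaultdict(list)
    for grader_type, score in results:
        by_type[grader_type].append(score)
    per_type = {t: mean(scores) for t, scores in by_type.items()}
    # Assumed rule: every grader type carries equal weight in the aggregate.
    aggregate = mean(list(per_type.values()))
    return per_type, aggregate


per_type, aggregate = summarize(results)
print(per_type)
print(round(aggregate, 2))
```

Averaging per type first means a suite with many code checks and a single human rating does not let the code checks dominate the aggregate; a weighted scheme would be an equally plausible choice.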
Combining capabilities
Templates can declare multiple MCPs and skills. Each capability triggers its own credential resolution:

    [workstation.capabilities]
    mcps = ["github", "slack", "linear"]
    skills = ["code-review", "testing"]

All three MCP servers would have their credentials provisioned and injected into the sandbox independently.