
Code Reviewer

A code review worker that uses MCP capabilities to do its work and an eval suite to score the quality of its output.

Save as code-reviewer.toml:

```toml
template_version = "1.0"

[worker]
name = "code-reviewer"

[workstation]
model = "claude-sonnet-4-20250514"
task = "Review the latest PR for code quality, security issues, and test coverage."

[workstation.capabilities]
mcps = ["github"]
skills = ["code-review"]

[compute]
mode = "sandbox"

[eval]
suite = "code-review-quality"
```

This template declares two capabilities:

  • mcps = ["github"] — The GitHub MCP server, giving the worker access to PRs, issues, and code
  • skills = ["code-review"] — The code-review skill, providing review guidelines and patterns

When arpi processes the template, it resolves credentials automatically:

  1. The registry looks up the github MCP and determines that it needs a GitHub token
  2. The token is provisioned from Infisical
  3. The credential is injected into the sandbox at spawn time

You never specify credentials in templates — just capabilities.
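
The three resolution steps above can be modeled as a small pipeline. This is a toy sketch, not arpi's implementation: the `REGISTRY` table, the `GITHUB_TOKEN` secret name, and the in-memory `SecretStore` standing in for Infisical are all hypothetical. Note that the template only ever supplies the MCP name; the secret value comes from the store.

```python
from dataclasses import dataclass, field

# Hypothetical registry: maps an MCP name to the secret it needs.
REGISTRY = {
    "github": "GITHUB_TOKEN",
}


@dataclass
class SecretStore:
    """In-memory stand-in for Infisical (hypothetical)."""
    secrets: dict[str, str] = field(default_factory=dict)

    def fetch(self, key: str) -> str:
        if key not in self.secrets:
            raise KeyError(f"secret {key!r} not provisioned")
        return self.secrets[key]


def resolve_credentials(mcps: list[str], store: SecretStore) -> dict[str, str]:
    """Return the environment that would be injected into the sandbox."""
    env: dict[str, str] = {}
    for mcp in mcps:
        secret_name = REGISTRY[mcp]                  # step 1: registry lookup
        env[secret_name] = store.fetch(secret_name)  # step 2: provision from the store
    return env                                       # step 3: caller injects env at spawn time


store = SecretStore({"GITHUB_TOKEN": "ghp_placeholder"})
sandbox_env = resolve_credentials(["github"], store)
```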

The [eval] section attaches the code-review-quality eval suite. After the worker completes its task, arpi runs the suite against the output using three grader types:

| Grader | How it works | Example |
|--------|--------------|---------|
| Code | Deterministic checks (regex, assertions, test execution) | "Review mentions at least 3 files" |
| Model | LLM-as-judge scoring against a rubric | "Rate security analysis depth 1-5" |
| Human | Manual review queue for subjective assessment | "Was the review actionable?" |

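
A code grader is a plain deterministic check. As a sketch of the first example in the table, the function below counts distinct file paths mentioned in a review. The path regex, the function name, and the `n` threshold parameter are assumptions for illustration; they are not the checks the code-review-quality suite actually runs.

```python
import re

# Assumed heuristic: a "file" is a token such as src/app.py or README.md,
# i.e. optional directories, then a name with a short extension that starts with a letter.
FILE_PATTERN = re.compile(r"\b(?:[\w.-]+/)*[\w-]+\.[A-Za-z][A-Za-z0-9]{0,4}\b")


def mentions_at_least_n_files(review: str, n: int = 3) -> bool:
    """Pass if the review references at least `n` distinct file paths."""
    files = set(FILE_PATTERN.findall(review))  # dedupe repeated mentions
    return len(files) >= n
```

Because the check is a regex, it is cheap and repeatable, but it can miscount (for example, it would treat "e.g" as a file name). Judgments like depth or actionability are left to the model and human graders.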
Spawn the worker:

```shell
arpi spawn code-reviewer.toml
```

After the worker completes, view its eval scores:

```shell
arpi eval list --worker code-reviewer
```

This shows eval scores broken down by grader type, with an aggregate quality score.
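
This page does not say how the aggregate is computed. One plausible scheme, shown purely as an assumption, scales each raw score by its maximum, averages within each grader type, then takes a weighted mean across types. The `GraderResult` type, the `breakdown` and `aggregate` helpers, and the weights are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class GraderResult:
    grader_type: str  # "code", "model", or "human"
    score: float      # raw score
    max_score: float  # e.g. 1 for a pass/fail code check, 5 for a 1-5 rubric


def breakdown(results: list[GraderResult]) -> dict[str, float]:
    """Mean scaled score (score / max_score) for each grader type."""
    by_type: dict[str, list[float]] = {}
    for r in results:
        by_type.setdefault(r.grader_type, []).append(r.score / r.max_score)
    return {t: mean(scores) for t, scores in by_type.items()}


def aggregate(per_type: dict[str, float], weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean across grader types; equal weights unless given."""
    weights = weights or {t: 1.0 for t in per_type}
    total = sum(weights[t] for t in per_type)
    return sum(per_type[t] * weights[t] for t in per_type) / total


results = [
    GraderResult("code", 1, 1),   # passed a deterministic check
    GraderResult("code", 0, 1),   # failed another
    GraderResult("model", 4, 5),  # rubric score of 4 out of 5
]
per_type = breakdown(results)
overall = aggregate(per_type)
```

Averaging within a type first keeps a grader type with many checks from dominating the overall score; the optional weights let one type count more if desired.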

Templates can declare multiple MCPs and skills. Each capability triggers its own credential resolution:

```toml
[workstation.capabilities]
mcps = ["github", "slack", "linear"]
skills = ["code-review", "testing"]
```

Each of the three MCP servers has its credentials provisioned and injected into the sandbox independently.