Level 3 · Session 4

⏱ 90 min · ⚡ 4 builds

Evaluate Your Agents

Build a file-native eval system, including a rubric and skills, then demo what you built to the cohort.

By the end of this session

✓ A rubric at evals/rubrics/researcher.md calibrated against good and bad examples

✓ A /run-test-cases skill that runs prompts and writes input/output/trace files

✓ An /eval-output skill that scores against the rubric and routes failures to a Review Queue

✓ A skillified fix written into your subagent's instructions

✓ A weekly eval routine scheduled in Claude Code

✓ Level 3 certificate earned and shared
