Level 3 · Session 4
⏱ 90 min · ⚡ 4 builds
Evaluate Your Agents
Build a file-native eval system, including a rubric and skills, then demo what you built to the cohort.
By the end of this session
✓ A rubric at evals/rubrics/researcher.md calibrated against good and bad examples
✓ A /run-test-cases skill that runs prompts and writes input/output/trace files
✓ An /eval-output skill that scores against the rubric and routes failures to a Review Queue
✓ A skillified fix written into your subagent's instructions
✓ A weekly eval routine scheduled in Claude Code
✓ Level 3 certificate earned and shared
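
To make the score-and-route idea concrete, here is a minimal Python sketch. It is not the session's /eval-output skill: it assumes a rubric can be reduced to simple pass/fail checks, and the names Criterion, eval_output, route, the two example criteria, and the in-memory review_queue list are all hypothetical stand-ins for the markdown rubric and Review Queue named above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str                     # hypothetical rubric item, not from the session's rubric
    check: Callable[[str], bool]  # returns True when the output passes this item


def eval_output(output: str, rubric: list[Criterion]) -> dict:
    """Score one output: pass/fail per criterion plus an overall score in [0, 1]."""
    results = {c.name: c.check(output) for c in rubric}
    score = sum(results.values()) / len(results) if results else 0.0
    return {"results": results, "score": score}


def route(case_id: str, evaluation: dict, review_queue: list, threshold: float = 1.0) -> bool:
    """Append a failing case to the review queue; return True if the case passed."""
    passed = evaluation["score"] >= threshold
    if not passed:
        failed = [name for name, ok in evaluation["results"].items() if not ok]
        review_queue.append({"case": case_id, "failed": failed})
    return passed


# Assumed example criteria; a real rubric would be calibrated on good and bad examples.
rubric = [
    Criterion("cites_sources", lambda text: "http" in text),
    Criterion("non_empty", lambda text: bool(text.strip())),
]

review_queue: list = []
good = eval_output("See https://example.com for details.", rubric)
bad = eval_output("No links here.", rubric)
route("case-1", good, review_queue)
route("case-2", bad, review_queue)
```

In this sketch only case-2 lands in review_queue, with cites_sources recorded as the failed criterion. Keeping the failed criterion names alongside the case makes it easier to turn a recurring failure into a change to the subagent's instructions.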