Colibri AI Academy
Level 3 · Session 4
90 min · 4 builds

Evaluate Your Agents

Build a file-native eval system, made up of a rubric plus skills that run and score your agent, then demo it to the cohort.

By the end of this session
A rubric at evals/rubrics/researcher.md calibrated against good and bad examples
A /run-test-cases skill that runs prompts and writes input/output/trace files
An /eval-output skill that scores against the rubric and routes failures to a Review Queue
A skillified fix written into your subagent's instructions
A weekly eval routine scheduled in Claude Code
Your Level 3 certificate, earned and shared
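To make the run → score → route loop concrete, here is a minimal Python sketch of the file-native idea: every run leaves input, output and trace files on disk, and scoring reads those files back. It is not how the /run-test-cases or /eval-output skills are built (those are Claude Code skills). The `evals/runs/` and `evals/review-queue.md` paths, the `- name: phrase` rubric format, the keyword-matching scorer (a toy stand-in for judging an output against a rubric) and `fake_agent` are all hypothetical; only `evals/rubrics/researcher.md` comes from this session.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for the subagent under test; in the course the
# agent runs inside Claude Code, not as a Python function.
def fake_agent(prompt: str) -> tuple[str, list[str]]:
    trace = [f"received prompt: {prompt}", "drafted answer"]
    output = f"Answer to: {prompt}. Sources: example.com"
    return output, trace


def run_test_cases(root: Path, cases: dict[str, str],
                   agent: Callable[[str], tuple[str, list[str]]]) -> list[Path]:
    """Run each prompt and write input/output/trace files, one folder per case."""
    run_dirs = []
    for case_id, prompt in cases.items():
        case_dir = root / "evals" / "runs" / case_id  # hypothetical layout
        case_dir.mkdir(parents=True, exist_ok=True)
        output, trace = agent(prompt)
        (case_dir / "input.md").write_text(prompt + "\n")
        (case_dir / "output.md").write_text(output + "\n")
        (case_dir / "trace.md").write_text("\n".join(trace) + "\n")
        run_dirs.append(case_dir)
    return run_dirs


def load_rubric(path: Path) -> dict[str, str]:
    """Parse '- name: required phrase' bullets (a hypothetical rubric format)."""
    criteria = {}
    for line in path.read_text().splitlines():
        if line.startswith("- ") and ":" in line:
            name, phrase = line[2:].split(":", 1)
            criteria[name.strip()] = phrase.strip().lower()
    return criteria


def eval_output(root: Path, rubric_path: Path, threshold: float = 1.0) -> dict[str, float]:
    """Score each run against the rubric and append failures to the review queue."""
    criteria = load_rubric(rubric_path)
    queue = root / "evals" / "review-queue.md"  # hypothetical Review Queue file
    scores = {}
    for case_dir in sorted((root / "evals" / "runs").iterdir()):
        output = (case_dir / "output.md").read_text().lower()
        # Toy scoring: a criterion passes if its phrase appears in the output.
        passed = [name for name, phrase in criteria.items() if phrase in output]
        score = len(passed) / len(criteria) if criteria else 0.0
        scores[case_dir.name] = score
        if score < threshold:
            missing = sorted(set(criteria) - set(passed))
            with queue.open("a") as f:
                f.write(f"- {case_dir.name}: score {score:.2f}, missing {', '.join(missing)}\n")
    return scores


if __name__ == "__main__":
    # Work in a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        rubric = root / "evals" / "rubrics" / "researcher.md"
        rubric.parent.mkdir(parents=True)
        rubric.write_text("- cites sources: sources:\n- states limits: limitation\n")
        run_test_cases(root, {"case-01": "Summarize recent eval methods"}, fake_agent)
        print(eval_output(root, rubric))
        print((root / "evals" / "review-queue.md").read_text())
```

Running it prints a score of 0.5 for `case-01` and queues it with "missing states limits", because the fake output cites a source but never mentions limitations. Keeping everything as plain files is what lets a human reviewer, or a later skill, open the exact input, output and trace behind any failure in the queue.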
