Pair Rubric
Two model answers, one shared yes/no checklist. Atomic per-cell checks, virtualized to a thousand rubrics. The cleanest signal for SFT.
One workspace for the complete flow: task publishing, visual form design, labeler workbench, AI review, human acceptance, and reproducible exports.
Create schema-driven tasks, import datasets, assign rows, and publish from one cockpit.
Claim items, autosave drafts, use field-level AI help, and see revision feedback.
Review queues include AI scores, diff history, batch actions, and audit trails.
LMSYS-style head-to-head comparison with multi-dimension 1–5 scoring. A GSB (good/same/bad) verdict per dimension, plus required reasoning. Verdicts feed Bradley-Terry or Elo rankings.
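As a rough illustration of how head-to-head verdicts could feed an Elo ranking, here is a minimal sketch. It assumes GSB means good/same/bad from model A's point of view and maps those to scores of 1, 0.5, and 0. The names `GSB_SCORE`, `expected_score`, and `elo_update`, the K-factor of 32, and the starting rating of 1000 are hypothetical choices for this example, not the product's actual implementation.

```python
# Sketch only: standard Elo update driven by GSB verdicts.
# All names and constants here are illustrative assumptions.

# Assumed mapping of a verdict (from model A's perspective) to a match score.
GSB_SCORE = {"good": 1.0, "same": 0.5, "bad": 0.0}


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, verdict: str, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise verdict."""
    score_a = GSB_SCORE[verdict]
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Usage: replay a short list of verdicts for one pair of models.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
for verdict in ["good", "same", "good"]:
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"], verdict
    )
```

With per-dimension verdicts, one option is to keep a separate rating table per dimension and run the same update on each; a Bradley-Terry fit over the full verdict set is the order-independent alternative.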
Score every step of an agent trajectory: tool calls, results, and the final answer. A per-step rubric plus a per-trajectory verdict. The flagship mode for agent eval.
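One way to picture a trajectory review is as a list of per-step yes/no checks plus a single overall verdict. The sketch below is an assumed data shape, not the product's export schema: `StepScore`, `TrajectoryReview`, the step kinds, and the check names are all hypothetical.

```python
# Sketch only: an assumed shape for per-step rubric results
# plus a per-trajectory verdict. Names are illustrative.
from dataclasses import dataclass


@dataclass
class StepScore:
    kind: str                 # e.g. "tool_call", "tool_result", "final_answer"
    checks: dict[str, bool]   # rubric item -> passed?

    def pass_rate(self) -> float:
        """Fraction of rubric items passed on this step (0.0 if none)."""
        if not self.checks:
            return 0.0
        return sum(self.checks.values()) / len(self.checks)


@dataclass
class TrajectoryReview:
    steps: list[StepScore]
    verdict: str              # reviewer's overall call, e.g. "pass" or "fail"

    def summary(self) -> dict:
        """Combine the per-step rates with the overall verdict."""
        rates = [step.pass_rate() for step in self.steps]
        if not rates:
            return {"verdict": self.verdict, "mean_pass_rate": 0.0, "weakest_step": None}
        weakest = min(range(len(rates)), key=rates.__getitem__)
        return {
            "verdict": self.verdict,
            "mean_pass_rate": sum(rates) / len(rates),
            "weakest_step": weakest,  # index of the lowest-scoring step
        }


# Usage: two scored steps and an overall verdict.
review = TrajectoryReview(
    steps=[
        StepScore("tool_call", {"right_tool": True, "valid_args": False}),
        StepScore("final_answer", {"answers_question": True}),
    ],
    verdict="pass",
)
result = review.summary()
```

Keeping the verdict separate from the step scores lets a reviewer pass a trajectory that recovered from a bad intermediate step, while the weakest step still shows up in the summary.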
Switch modes per task. Mix them per workspace. The rubric, the trust score, and the audit trail travel with you.