Parity: Auto-evals for harness changes

Parity helps agent teams verify that prompt and harness changes actually change behavior. It monitors PRs for behavior-defining changes, identifies what changed, checks existing eval coverage, and generates targeted probe evals that test whether the new behavior shows up and where it stops holding. It is built for teams that want something faster and more reliable than manual spot checks and vibe testing.
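
The workflow described above (spot behavior-defining changes in a PR, check existing eval coverage, plan probes for the gaps) can be illustrated with a minimal sketch. This is not Parity's implementation or API. The path patterns, the `Eval` type, and the function names `behavior_changes`, `uncovered`, and `plan_probes` are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical patterns for files that define agent behavior (an assumption,
# not something Parity documents).
BEHAVIOR_PATTERNS = ["prompts/*", "harness/*", "*.prompt.md"]


@dataclass
class Eval:
    """Hypothetical record of an existing eval and the paths it exercises."""
    name: str
    covers: set[str]


def behavior_changes(changed_paths):
    """Keep only the changed paths that match a behavior-defining pattern."""
    return [
        path
        for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in BEHAVIOR_PATTERNS)
    ]


def uncovered(changes, evals):
    """Return the behavior changes that no existing eval claims to cover."""
    covered = set().union(*(e.covers for e in evals))
    return [path for path in changes if path not in covered]


def plan_probes(changed_paths, evals):
    """Plan two probe kinds per uncovered change: does the new behavior
    show up, and where does it stop holding."""
    gaps = uncovered(behavior_changes(changed_paths), evals)
    return [
        {"target": path, "kinds": ["behavior_present", "boundary"]}
        for path in gaps
    ]


if __name__ == "__main__":
    changed = ["prompts/system.md", "harness/tools.py", "README.md"]
    existing = [Eval(name="tone", covers={"prompts/system.md"})]
    for probe in plan_probes(changed, existing):
        print(probe)
```

In this sketch `README.md` is ignored because it matches no behavior pattern, and `prompts/system.md` is skipped because an existing eval already covers it, so only `harness/tools.py` gets a probe plan. A real system would need a much richer notion of coverage than path matching.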
