V11 Leaderboard Published Across 80 Complete CLI Tasks
The homepage leaderboard now reflects V11 pass@1/pass@3 results: GPT-5.5 leads at 61.7% pass@1, followed by GPT-5.3-codex and Opus 4.6.
V11 ranks CLI agents by pass@1, restricted to the 80 tasks on which all seven evaluated models have complete three-run coverage.
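The ranking rule above can be sketched in a few lines. This is a minimal illustration, not the leaderboard's actual code. It assumes the following:

- Results are stored as a hypothetical `results[model][task]` mapping to per-run pass/fail booleans.
- pass@k uses the unbiased estimator 1 - C(n-c, k) / C(n, k) from the HumanEval/Codex evaluation work.
- Per-model scores are plain averages over the shared, fully covered task set.

The names `pass_at_k`, `complete_tasks` and `leaderboard` are illustrative only.

```python
from math import comb

# Hypothetical layout: results[model][task] = list of per-run pass/fail booleans.
Results = dict[str, dict[str, list[bool]]]


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for n runs with c passes: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k sample of runs contains at least one pass
    return 1.0 - comb(n - c, k) / comb(n, k)


def complete_tasks(results: Results, runs: int = 3) -> set[str]:
    """Tasks on which every model has exactly `runs` recorded runs."""
    per_model = [
        {task for task, outcomes in per_task.items() if len(outcomes) == runs}
        for per_task in results.values()
    ]
    return set.intersection(*per_model) if per_model else set()


def leaderboard(results: Results, runs: int = 3) -> list[tuple[str, float, float]]:
    """Return (model, pass@1, pass@3) rows averaged over fully covered tasks, best pass@1 first."""
    tasks = complete_tasks(results, runs)
    if not tasks:
        return []
    rows = []
    for model, per_task in results.items():
        p1 = sum(pass_at_k(runs, sum(per_task[t]), 1) for t in tasks) / len(tasks)
        p3 = sum(pass_at_k(runs, sum(per_task[t]), 3) for t in tasks) / len(tasks)
        rows.append((model, p1, p3))
    # Rank by pass@1, highest first (ties keep insertion order).
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

With three runs per task, pass@1 reduces to the fraction of passing runs, and pass@3 becomes 1 if any run passed. A task missing a run for any model (such as a toy `"t3"` entry with a single run) drops out of the shared set for everyone. This keeps every model scored on identical tasks.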