benchmark.space
Head to head: Claude Opus 4.6 vs GPT-5.3 Codex

5 shared benchmarks: Claude Opus 4.6 wins 4, ties 0, GPT-5.3 Codex wins 1.

Benchmark               Claude Opus 4.6    GPT-5.3 Codex
GPQA Diamond            91.3%              91.5%
Humanity's Last Exam    53.1%              39.9%
OSWorld                 72.7%              64.0%
SWE-bench Verified      81.42%             80.0%
Terminal-Bench 2.0      81.8%              77.3%
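
The win and tie counts above can be rechecked from the per-benchmark scores. The following is a minimal sketch, assuming that a higher score is better on every benchmark and that the first score listed for each benchmark belongs to Claude Opus 4.6; the names `SCORES` and `tally` are hypothetical and are not part of the site.

```python
# Scores transcribed from the comparison above, in percent.
# Assumption: each tuple is (Claude Opus 4.6, GPT-5.3 Codex).
SCORES = {
    "GPQA Diamond": (91.3, 91.5),
    "Humanity's Last Exam": (53.1, 39.9),
    "OSWorld": (72.7, 64.0),
    "SWE-bench Verified": (81.42, 80.0),
    "Terminal-Bench 2.0": (81.8, 77.3),
}


def tally(scores):
    """Return (left wins, ties, right wins), assuming higher is better."""
    left = ties = right = 0
    for left_score, right_score in scores.values():
        if left_score > right_score:
            left += 1
        elif right_score > left_score:
            right += 1
        else:
            ties += 1
    return left, ties, right


if __name__ == "__main__":
    left, ties, right = tally(SCORES)
    print(f"Claude Opus 4.6 wins: {left}, ties: {ties}, GPT-5.3 Codex wins: {right}")
```

A raw win count treats the 0.2-point gap on GPQA Diamond the same as the 13.2-point gap on Humanity's Last Exam, so it is best read alongside the individual scores.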