OpenAI
GPT-5.3 Codex
6 benchmarks
Benchmark             Rank        Score
GPQA Diamond          #7 of 49    91.5%
SWE-bench Verified    #7 of 40    80.0%
Terminal-Bench 2.0    #5 of 14    77.3%
OSWorld               #8 of 16    64.0%
SWE-bench Pro         #4 of 13    56.8%
Humanity's Last Exam  #13 of 24   39.9%