benchmark.space
Head to head: Claude Mythos Preview vs GPT-5.3 Codex
6 shared benchmarks
Claude Mythos Preview: 6 wins, ties: 0, GPT-5.3 Codex: 0 wins
Benchmark               Claude Mythos Preview   GPT-5.3 Codex
GPQA Diamond            94.6%                   91.5%
Humanity's Last Exam    56.8%                   39.9%
OSWorld                 79.6%                   64.0%
SWE-bench Pro           77.8%                   56.8%
SWE-bench Verified      93.9%                   80.0%
Terminal-Bench 2.0      82.0%                   77.3%
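
The win/tie tally reported above can be reproduced from the six score pairs. The following is a minimal sketch, assuming a higher percentage is better on every benchmark and that the first score listed for each benchmark belongs to Claude Mythos Preview; the names `SCORES` and `tally` are illustrative and do not come from the site.

```python
# Scores copied from the comparison above, as
# (Claude Mythos Preview, GPT-5.3 Codex) in percent.
# Assumption: higher is better on every benchmark.
SCORES = {
    "GPQA Diamond": (94.6, 91.5),
    "Humanity's Last Exam": (56.8, 39.9),
    "OSWorld": (79.6, 64.0),
    "SWE-bench Pro": (77.8, 56.8),
    "SWE-bench Verified": (93.9, 80.0),
    "Terminal-Bench 2.0": (82.0, 77.3),
}


def tally(scores):
    """Return (wins for first model, ties, wins for second model)."""
    first_wins = sum(1 for a, b in scores.values() if a > b)
    ties = sum(1 for a, b in scores.values() if a == b)
    second_wins = sum(1 for a, b in scores.values() if a < b)
    return first_wins, ties, second_wins


if __name__ == "__main__":
    print(tally(SCORES))  # prints (6, 0, 0)
```

The counts match the 6 / 0 / 0 summary shown for the six shared benchmarks.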