benchmark.space
Head to head: Claude Sonnet 4.6 vs GPT-5.3 Codex
4 shared benchmarks
Claude Sonnet 4.6: 1 win
Ties: 0
GPT-5.3 Codex: 3 wins
Benchmark            Claude Sonnet 4.6   GPT-5.3 Codex
GPQA Diamond         89.9%               91.5%
OSWorld              72.5%               64.0%
SWE-bench Verified   79.6%               80.0%
Terminal-Bench 2.0   59.1%               77.3%
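
The win/tie tally can be reproduced from the four score pairs. The sketch below is illustrative only: it assumes the first score in each pair belongs to Claude Sonnet 4.6 and the second to GPT-5.3 Codex (following the order in the heading), and that a higher percentage counts as the win. The names `SCORES` and `tally` are hypothetical and not part of the site.

```python
# Score pairs copied from the comparison: (Claude Sonnet 4.6, GPT-5.3 Codex), in percent.
# Assumption: the first value in each pair is Claude Sonnet 4.6, the second is GPT-5.3 Codex.
SCORES = {
    "GPQA Diamond": (89.9, 91.5),
    "OSWorld": (72.5, 64.0),
    "SWE-bench Verified": (79.6, 80.0),
    "Terminal-Bench 2.0": (59.1, 77.3),
}


def tally(scores):
    """Return (first_model_wins, ties, second_model_wins).

    Assumption: on every benchmark the higher score wins.
    """
    first_wins = ties = second_wins = 0
    for first, second in scores.values():
        if first > second:
            first_wins += 1
        elif second > first:
            second_wins += 1
        else:
            ties += 1
    return first_wins, ties, second_wins


if __name__ == "__main__":
    first_wins, ties, second_wins = tally(SCORES)
    print(f"Claude Sonnet 4.6 wins: {first_wins}")
    print(f"Ties: {ties}")
    print(f"GPT-5.3 Codex wins: {second_wins}")
```

Under these assumptions the only benchmark where the first score is higher is OSWorld, which matches the 1 / 0 / 3 split reported above.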