GPT-5.3 Codex vs GPT-5.4
6 shared benchmarks: GPT-5.3 Codex wins 2, ties 0, GPT-5.4 wins 4
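| Benchmark | GPT-5.3 Codex | GPT-5.4 |
|---|---|---|
| GPQA Diamond | 91.5% | 92.8% |
| Humanity's Last Exam | 39.9% | 36.24% |
| OSWorld | 64.0% | 75.0% |
| SWE-bench Pro | 56.8% | 57.7% |
| SWE-bench Verified | 80.0% | 77.2% |
| Terminal-Bench 2.0 | 77.3% | 81.8% |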