benchmark.space
OpenAI o3
2 benchmarks
GPQA Diamond: #17 of 49, 87.7%
SWE-bench Verified: #30 of 40, 71.7%