benchmark.space
OpenAI o4-mini
4 benchmarks
Benchmark            Rank        Score
AIME                 #3 of 39    99.5%
AIME                 #3 of 39    92.7%
GPQA Diamond         #36 of 49   81.4%
SWE-bench Verified   #33 of 40   68.1%