benchmark.space
Qwen 3 Max
3 benchmarks
HumanEval: #9 of 18, 85.0%
AIME: #63 of 105, 58.3%
Humanity's Last Exam: #38 of 39, 14.0%