benchmark.space
Qwen3 Max
3 benchmarks
HumanEval: #8 of 17, 85.0%
AIME: #61 of 103, 58.3%
Humanity's Last Exam: #34 of 36, 14.0%