Moonshot AI
Kimi K2.5
14 benchmarks
Benchmark               Rank         Score
HumanEval               #1 of 2      99.0%
AIME                    #11 of 39    96.1%
GPQA Diamond            #18 of 49    87.6%
MMLU Pro                #6 of 29     87.1%
LiveCodeBench           #8 of 28     85.0%
MMMU Pro                #6 of 15     78.5%
BrowseComp              #7 of 17     78.4%
SWE-bench Verified      #16 of 40    76.8%
OSWorld                 #9 of 16     63.3%
BrowseComp              #7 of 17     60.6%
WebArena                #3 of 3      58.9%
Terminal-Bench 2.0      #13 of 14    50.8%
SWE-bench Pro           #10 of 13    50.7%
Humanity's Last Exam    #19 of 24    24.37%