benchmark.space
HumanEval leaderboard
2 models tested · Updated 2026-01-27 · Verified sources only
Kimi K2.5 leads at 99.0%
1 · Kimi K2.5 · Moonshot AI · NVIDIA NIM / Moonshot AI · 2026-01-27 · 99.0%
Reasoning variant. Near-perfect score. Top of the HumanEval leaderboard as of March 2026.
2 · JoyAI-LLM Flash · JD.com · Paper/JD.com (arXiv) · 2026-04-03 · 87.5%
48B-parameter MoE with only 2.7B active parameters. Remarkably efficient: competitive scores at a fraction of the compute.
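
To put the efficiency note in numbers, the sketch below works out what share of JoyAI-LLM Flash's parameters is active and how far its score sits behind the leader. It uses only the figures listed above; the function names are illustrative, and reading "active params" as "parameters used per token" is an assumption about how the entry counts them. The active share is only a rough proxy for compute, since it ignores routing overhead and memory cost.

```python
# Figures quoted in the leaderboard entries above.
TOTAL_PARAMS_B = 48.0   # JoyAI-LLM Flash total parameters, in billions
ACTIVE_PARAMS_B = 2.7   # active parameters, in billions (assumed: per token)
LEADER_SCORE = 99.0     # Kimi K2.5 HumanEval score, percent
FLASH_SCORE = 87.5      # JoyAI-LLM Flash HumanEval score, percent


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of a mixture-of-experts model's parameters that is active."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


def score_gap(leader_pct: float, other_pct: float) -> float:
    """Difference between two scores, in percentage points."""
    return round(leader_pct - other_pct, 1)


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active parameter share: {share:.1%}")
    print(f"Gap to leader: {score_gap(LEADER_SCORE, FLASH_SCORE)} points")
```

On these figures, roughly one parameter in eighteen is active, and the model trails the leader by 11.5 percentage points.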