HumanEval
2 models tested · Updated 2026-01-27 · Verified sources only
Kimi K2.5 leads at 99.0%
1
Moonshot AI · NVIDIA NIM / Moonshot AI · 2026-01-27
Reasoning variant. Near-perfect score; top of the HumanEval leaderboard as of March 2026.
99.0%
2
JD.com · Paper/JD.com (arXiv) · 2026-04-03
48B MoE with only 2.7B active parameters (about 5.6% of the total). Competitive score at a fraction of the compute.
87.5%