benchmark.space
MMMLU leaderboard
5 models tested · Updated 2026-04-07 · Verified sources only
Claude Opus 4.6 leads at 91.1%
1 · Claude Opus 4.6 · 91.1%
Anthropic · arxiv/Mythos-System-Card · 2026-04-07
Multilingual MMLU. Slightly below Gemini 3.1 Pro (92.6-93.6%).

2 · Gemma 4 31B · 88.4%
Google · HuggingFace/Google DeepMind · 2026-04-02
Massive multilingual MMLU. 30.7B dense.

3 · Gemma 4 26B A4B · 86.3%
Google · HuggingFace/Google DeepMind · 2026-04-02
MoE 25.2B total, 3.8B active.

4 · Gemma 4 E4B · 76.6%
Google · HuggingFace/Google DeepMind · 2026-04-02
4.5B effective params.

5 · Gemma 4 E2B · 67.4%
Google · HuggingFace/Google DeepMind · 2026-04-02
2.3B effective params.