MMMU Pro leaderboard
15 models tested · Updated 2026-03-05 · Verified sources only
GPT-5.4 leads at 81.2%
| # | Model | Score | Org | Source | Date | Notes |
|---|-------|-------|-----|--------|------|-------|
| 1 | GPT-5.4 | 81.2% | OpenAI | Blog/OpenAI | 2026-03-05 | Visual understanding and reasoning. Without tool use; reasoning effort xhigh. |
| 2 | Gemini 3 Flash | 81.0% | Google | Blog/Google | 2025-12-17 | Multimodal reasoning at Flash-tier cost. Competitive with Gemini 3 Pro. |
| 3 | Gemini 3.1 Pro | 80.5% | Google | Official/Google DeepMind | 2026-02-19 | Multimodal understanding without tools. Strong visual reasoning. |
| 4 | Muse Spark | 80.5% | Meta | Blog/Meta AI | 2026-04-08 | Strong multimodal showing for Meta Superintelligence Labs debut model. |
| 5 | Qwen 3.5 397B | 79.0% | Alibaba | HuggingFace/Qwen | 2026-02-16 | Native vision-language model. Strong multimodal reasoning. |
| 6 | Kimi K2.5 | 78.5% | Moonshot AI | Blog/Kimi | 2026-01-27 | Strong multimodal reasoning for an open-weight model. |
| 7 | Gemma 4 31B | 76.9% | Google | Model Card/Google | 2026-04-02 | Multimodal vision benchmark. Up from 49.7% on Gemma 3 27B. |
| 8 | Gemini 3.1 Flash-Lite | 76.8% | Google | Model Card/Google | 2026-03-03 | Budget tier ($0.25/1M input) yet competitive on multimodal benchmarks. |
| 9 | Qwen 3.5 27B | 75.0% | Alibaba | HuggingFace/Qwen | 2026-02-16 | Multimodal understanding. Strong for a small model; competitive with some late-2025 frontier scores. |
| 10 | Gemma 4 26B A4B | 73.8% | Google | Model Card/Google | 2026-04-02 | MoE multimodal. The 31B dense model reaches 76.9%. |
| 11 | Qwen 3.5 9B | 70.1% | Alibaba | HuggingFace/Qwen | 2026-03-02 | 9B params. Outperforms Gemini 2.5 Flash-Lite (59.7%) on visual reasoning. Strong for its size class. |
| 12 | Llama 4 Maverick | 59.6% | Meta | HuggingFace/Meta | 2026-04-05 | Official model card score. 17B active params, 128 experts. |
| 13 | Gemma 4 4B | 52.6% | Google | Model Card/Google | 2026-04-02 | Multimodal understanding from a 4B model. Strong for edge deployment. |
| 14 | Llama 4 Scout | 52.2% | Meta | HuggingFace/Meta | 2026-04-05 | Official model card score. 17B active params, 16 experts, 10M context. |
| 15 | Gemma 4 E2B | 44.2% | Google | Model Card/Google | 2026-04-02 | Multimodal reasoning at 2.3B active params. Natively multimodal from pretraining. |
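For readers who want to work with these rankings programmatically, here is a minimal sketch of how the leaderboard data could be modeled and re-ranked. The `Entry` dataclass and the subset of rows shown are illustrative assumptions, not an official data format from this site; only the model names, orgs, and scores come from the table above.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    model: str
    org: str
    score: float  # MMMU Pro accuracy, in percent

# A few rows from the leaderboard above (scores as published).
entries = [
    Entry("GPT-5.4", "OpenAI", 81.2),
    Entry("Gemini 3 Flash", "Google", 81.0),
    Entry("Muse Spark", "Meta", 80.5),
    Entry("Gemini 3.1 Pro", "Google", 80.5),
]

# Rank by score, descending. Python's sort is stable, so models that
# tie on score (e.g. the two 80.5% entries) keep their input order.
ranked = sorted(entries, key=lambda e: e.score, reverse=True)
for rank, e in enumerate(ranked, start=1):
    print(f"{rank}. {e.model} ({e.org}): {e.score}%")
```

Sorting by score alone reproduces the published order; a tiebreaker (date, model name) could be added to the key if deterministic ordering across re-shuffled inputs matters.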