MMMU Leaderboard 2026 — Results Across 27 Real AI Models

27 models tested · Updated 2026-04-09 · Verified sources only

Qwen 3 VL 30B A3B + Soft Routing leads at 89.42%

Qwen 3 VL 30B A3B + Soft Routing

arxiv · arxiv/2604.08541 · 2026-04-09

Identifies a "Seeing but Not Thinking" phenomenon in multimodal MoE models: vision tokens routed to reasoning experts cause distraction. A soft routing intervention improves the Qwen3-VL-30B-A3B average from 59

89.42%

Qwen 3.6 Plus

Alibaba · Blog/Alibaba · 2026-03-31

Second only to Gemini 3 Pro (87.2) on multimodal understanding.

86.0%

Qwen 3.5 397B

Alibaba · HuggingFace/Qwen · 2026-02-16

Qwen3.5 flagship MoE model (397B total, 17B active). Unified vision-language foundation model.

85.0%

GPT-5

OpenAI · Blog/OpenAI · 2025-08-07

College-level visual reasoning. Led prior OpenAI models at launch.

84.2%

Qwen 3.6 27B

Alibaba · HuggingFace/Qwen · 2026-04-22

Beats Qwen 3.5 27B and Gemma 4 31B on MMMU. Best 27B on multimodal reasoning.

82.9%

Gemini 2.5 Pro

arxiv · arxiv/2604.08539 · 2026-04-09

Score reported in the source paper, which describes an 8B open-weight multimodal model trained with GRPO+GDPO. That model is described as competitive with Gemini 2.5 Pro on DocVQA and chart understanding, and as a new SOTA for open-weight VLMs on MMMU.

81.7%

Qwen 3.6 35B-A3B

Alibaba · HuggingFace/Qwen · 2026-04-16

Slight improvement over Qwen 3.5 35B-A3B (81.4). Beats Claude Sonnet 4.5 (79.6).

81.7%

STEP3-VL 10B

StepFun · arxiv/2601.09668 · 2026-01-14

10B open-source model rivals models 10-20x larger on MMMU. Trained with 1.4k RL iterations and PaCoRe.

80.11%

Llama4-Scout-109B-A17B + Soft Routing

arxiv · arxiv/2604.08541 · 2026-04-09

79.2%

EXAONE 4.5 33B

LG AI Research · HuggingFace/LGAI-EXAONE · 2026-04-14

Open-weight VLM competitive with GPT-5 mini (79.0) on multimodal understanding.

78.7%

Qwen 3.5 9B

Alibaba · HuggingFace/Qwen · 2026-04-07

Strong vision-language at 9B. Beats GPT-5-Nano (75.8) and Gemini-2.5-Flash-Lite (73.4).

78.4%

Llama 4 Maverick

Meta · HuggingFace/Meta · 2026-04-05

Instruction-tuned multimodal score.

73.4%

OpenVLThinkerV2 8B

Research · arxiv/2604.08539 · 2026-04-09

8B open-source model trained with G2RPO surpasses GPT-4o on MMMU and 5 other VLM benchmarks.

71.6%

OpenVLThinkerV2

arxiv · arxiv/2604.08539 · 2026-04-09

Introduces Gaussian GRPO (G2RPO), replacing standard linear scaling in GRPO with non-linear distributional matching. OpenVLThinkerV2-7B achieves new SOTA for open-source 7B models on MMMU (71.6%), Mat

71.6%

GPT-4o

arxiv · arxiv/2604.08539 · 2026-04-09

70.7%

Qwen 3 VL 32B Instruct

arxiv · arxiv/2603.03975 · 2026-03-04

70.6%

Qwen 3 VL GRPO

arxiv · arxiv/2604.08539 · 2026-04-09

69.7%

RLSD (Qwen3-VL-8B)

arxiv · arxiv/2604.03128 · 2026-04-06

Proposes RLSD combining self-distillation magnitude with RLVR direction. On Qwen3-VL-8B, achieves best avg accuracy across 5 multimodal reasoning benchmarks, outperforming GRPO by 2.32% on average.

67.22%

Insight-V++ Self-Evolving

arxiv · arxiv/2603.13398 · 2026-03-11

4B end-to-end OCR model that ranks #1 on OmniDocBench among end-to-end models. Introduces "Layout-as-Thought" for structured layout representations. Outperforms Qwen3-VL-4B on ChartQA (+4.8) and Chart

64.8%

Qwen 3 VL 8B Instruct

arxiv · arxiv/2603.03975 · 2026-03-04

60.7%

Qwen 3 VL 8B Instruct

arxiv · arxiv/2604.08539 · 2026-04-09

60.2%

InternVL-U

arxiv · arxiv/2603.13398 · 2026-03-11

54.7%

Kimi-VL-16B-A3B + Soft Routing

arxiv · arxiv/2604.08541 · 2026-04-09

54.54%

Phi-4-reasoning-vision-15B

arxiv · arxiv/2603.03975 · 2026-03-04

Compact 15B open-weight multimodal reasoning model from Microsoft. Achieves competitive VLM performance with much less compute via careful data curation and dynamic-resolution encoders.

54.3%

Kimi-VL-A3B-Instruct

arxiv · arxiv/2603.03975 · 2026-03-04

52.0%

Gemma 3 12B IT

arxiv · arxiv/2603.03975 · 2026-03-04

50.0%

Firebolt-VL 0.8B

arxiv · arxiv/2604.04579 · 2026-04-06

0.8B parameter VLM using Mamba-based SSM with cross-modality modulation instead of cross-attention. Outperforms MobileVLM V2 (1.7B) and MoE-LLaVA (2.2B) on several benchmarks despite being much smalle

26.4%
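
For readers who want to compare entries programmatically, the following is a minimal sketch of ranking scores and computing each model's gap to the leader. The `scores` dictionary and the `rank` helper are illustrative names, not part of the leaderboard itself, and only the top five scores listed above are included.

```python
# MMMU accuracy (percent) copied from the top five entries above.
# The dictionary layout and the rank() helper are assumptions for illustration.
scores = {
    "Qwen 3 VL 30B A3B + Soft Routing": 89.42,
    "Qwen 3.6 Plus": 86.0,
    "Qwen 3.5 397B": 85.0,
    "GPT-5": 84.2,
    "Qwen 3.6 27B": 82.9,
}


def rank(scores):
    """Return (position, name, score, gap_to_leader) tuples, best first."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = ordered[0][1]
    return [
        (i + 1, name, score, round(top - score, 2))
        for i, (name, score) in enumerate(ordered)
    ]


if __name__ == "__main__":
    for position, name, score, gap in rank(scores):
        print(f"{position}. {name}: {score}% (gap to leader: {gap} pts)")
```

The gap is a plain difference in percentage points. It does not account for differences in evaluation setup between sources, which vary across the entries listed here.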