MATH-Vision
9 models tested · Updated 2026-04-02 · Verified sources only
Gemma 4 31B leads at 85.6%
| # | Org | Source | Date | Notes | MATH-Vision |
|---|-----|--------|------|-------|-------------|
| 1 | Google | HuggingFace / Google DeepMind | 2026-04-02 | Multimodal math reasoning from images. | 85.6% |
| 2 | Google | HuggingFace / Google DeepMind | 2026-04-02 | MoE, 25.2B total params, 3.8B active. | 82.4% |
| 3 | StepFun | arXiv:2601.09668 | 2026-01-14 | 10B model; competitive with Gemma 4 31B (85.6%) at roughly 1/3 the parameters. | 75.95% |
| 4 | Vero Team | arXiv:2604.04917 | 2026-04-06 | Highest MATH-Vision score among all Vero variants; thinking backbone excels on visual math. | 63.5% |
| 5 | Google | HuggingFace / Google DeepMind | 2026-04-02 | 4.5B effective params. | 59.5% |
| 6 | Vero Team | arXiv:2604.04917 | 2026-04-06 | +2.6 over the MiMoVL base; surpasses MiMoVL-7B-RL, which uses a proprietary recipe. | 59.4% |
| 7 | Vero Team | arXiv:2604.04917 | 2026-04-06 | +5.1 over base; open RL matches proprietary VLM training pipelines on visual math. | 59.0% |
| 8 | Research | arXiv:2604.08539 | 2026-04-09 | 8B model trained with G2RPO; reaches new open-source SOTA on MATH-Vision, beating larger models. | 53.4% |
| 9 | Google | HuggingFace / Google DeepMind | 2026-04-02 | 2.3B effective params. | 52.4% |
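The Vero rows report only deltas over an unstated base score. A minimal sketch of the implied arithmetic, assuming the "+2.6" and "+5.1" notes are absolute percentage-point gains over the same reported metric (the base scores below are derived, not stated in the source):

```python
# Recover the base MATH-Vision scores implied by the Vero delta notes.
# Keyed by leaderboard rank; values are the reported score and the
# stated improvement over the (unreported) base model.
vero_variants = {
    6: {"score": 59.4, "delta_over_base": 2.6},
    7: {"score": 59.0, "delta_over_base": 5.1},
}

for rank, v in vero_variants.items():
    # Subtracting the delta from the reported score yields the implied base.
    base = round(v["score"] - v["delta_over_base"], 1)
    print(f"rank {rank}: implied base score = {base}%")
```

This puts the implied bases at 56.8% and 53.9%, consistent with both RL variants starting below the 59-point tier they land in.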