InfoVQA
4 models tested · Updated 2026-04-09 · Verified sources only
OpenVLThinkerV2 8B leads at 86.4%
1 · OpenVLThinkerV2 8B · 86.4%
Shanghai AI Lab · arxiv/2604.08539 · 2026-04-09
A multimodal reasoner trained with Gaussian GRPO, achieving strong InfoVQA performance. Uses distributional matching for inter-task gradient equity across diverse visual tasks.
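The entry above builds on GRPO, which scores each sampled response relative to the other responses drawn for the same prompt. A minimal sketch of that group-relative baseline follows; all names are illustrative, and the paper's Gaussian distributional-matching variant is not reproduced here, only the standard group-normalized advantage it extends.

```python
def grpo_advantages(rewards):
    """Group-relative advantages: each rollout's reward normalized
    against the mean and std of its own sampling group.
    (Illustrative sketch only, not the paper's exact formulation.)"""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # Small epsilon guards against a zero-variance group.
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: four rollouts for one InfoVQA question, rewarded 0/1 for correctness.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct rollouts get positive advantages and incorrect ones negative, without needing a learned value model as a baseline.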
2 · 78.7%
Q-Mask Team · arxiv/2604.00161 · 2026-03-31
+1.6 over the Qwen2.5-VL-3B baseline. Spatial priors help on info-heavy documents.
3 · 78.6%
Shanghai Jiao Tong University · arxiv/2603.07494 · 2026-03-08
Layout-aware reasoning with a Visual-Semantic Chain. Achieves 78.6 on InfoVQA, outperforming Qwen3-VL-8B by 2.9 points.
4 · 74.4%
Q-Mask Team · arxiv/2604.00161 · 2026-03-31
+2.0 over the Qwen3-VL-2B baseline. The smaller model benefits most from spatial priors.
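Three of the entries quote a delta over a named baseline rather than the baseline's own score. The implied baseline scores follow directly by subtraction:

```python
# Recover the implied baseline scores from the deltas quoted above.
# (score, delta) pairs come straight from the leaderboard entries.
entries = [
    ("Qwen2.5-VL-3B", 78.7, 1.6),  # rank 2: +1.6 over baseline
    ("Qwen3-VL-8B",   78.6, 2.9),  # rank 3: outperformed by 2.9 points
    ("Qwen3-VL-2B",   74.4, 2.0),  # rank 4: +2.0 over baseline
]
for baseline, score, delta in entries:
    print(f"{baseline}: {score - delta:.1f}")
# → Qwen2.5-VL-3B: 77.1
# → Qwen3-VL-8B: 75.7
# → Qwen3-VL-2B: 72.4
```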