VisualWebArena leaderboard
5 models tested · Updated 2026-04-09 · Verified sources only
Gemini 3 Pro leads at 49.0%.
1. Gemini 3 Pro · 49.0%
   Google DeepMind · arxiv/2604.07776 · 2026-04-09
   Top score on VisualWebArena. Tested as web agent.

2. Gemini 3.1 Pro · 47.9%
   Google DeepMind · arxiv/2604.07776 · 2026-04-09
   Tested as web agent. Second-best after Gemini 3 Pro on VWA.

3. Qwen3.5 27B · 37.4%
   Qwen · arxiv/2604.07776 · 2026-04-09
   Tested as web agent in structured distillation paper.

4. A3-Qwen3.5-9B · 33.9%
   Ai2 · arxiv/2604.07776 · 2026-04-09
   9B open-weight model trained via structured distillation from Gemini 3 Pro. First strong open-weight VWA result.

5. A3-Qwen3.5-4B · 30.1%
   Ai2 · arxiv/2604.07776 · 2026-04-09
   4B model trained via structured distillation. Strong generalization to visual web tasks.