WebArena Leaderboard 2026 — Results Across 18 Real AI Models

WebNavigator reframes web navigation as deterministic retrieval over Interaction Graphs, achieving 72.9% on WebArena multi-site tasks and 39.7% on OnlineMind2Web, more than doubling enterprise-level a

57.1%

Gemini 3.1 Pro

Google DeepMind · arxiv/2604.07776 · 2026-04-09

From the arXiv paper on structured distillation. Evaluated with the Gemini 3.1 Pro API as the web agent.

53.8%

Gemini 3 Pro

Google DeepMind · arxiv/2604.07776 · 2026-04-09

From the arXiv paper on structured distillation. Evaluated with Gemini 3 Pro as the web agent.

51.2%

WebNavigator + GPT-4o

arxiv · arxiv/2603.20366 · 2026-03-20

49.9%

GUI-Owl-1.5-32B-Thinking

Alibaba · arxiv/2602.16855 · 2026-02-15

Best open-weight WebArena result for 32B class. Thinking variant for web navigation.

48.4%

WebNavigator + Qwen3-VL-32B

arxiv · arxiv/2603.20366 · 2026-03-20

47.8%

GUI-Owl-1.5-8B-Thinking

Alibaba · arxiv/2602.16855 · 2026-02-15

Strongest result among the sub-10B open-weight models listed here. Beats the proprietary GPT-4o (31.5%) by a wide margin for its size.

46.7%

A3-Qwen3.5-9B

McGill-NLP · arxiv/2604.07776 · 2026-04-09

Distilled from Gemini 3 Pro using structured trajectory synthesis. Matches 3x larger Qwen3.5-27B and nearly doubles previous best open-weight SFT result (21.7%).

41.5%
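The "nearly doubles" claim for A3-Qwen3.5-9B can be checked with simple arithmetic. The sketch below assumes the 41.5% success rate reported for the 9B student and the 21.7% prior open-weight SFT figure quoted in the entry; the function name is illustrative and not taken from the paper.

```python
def improvement_ratio(new_score: float, old_score: float) -> float:
    """Return how many times larger new_score is than old_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return new_score / old_score


# Figures quoted in the entry (percent success rate on WebArena).
a3_9b = 41.5              # A3-Qwen3.5-9B
previous_best_sft = 21.7  # previous best open-weight SFT result

ratio = improvement_ratio(a3_9b, previous_best_sft)
gain_pp = a3_9b - previous_best_sft  # absolute gain in percentage points

print(f"ratio: {ratio:.2f}x, gain: {gain_pp:.1f} pp")
# prints "ratio: 1.91x, gain: 19.8 pp"
```

A ratio of about 1.91 is consistent with "nearly doubles" rather than a full 2x.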

Qwen3.5 27B

Alibaba · arxiv/2604.07776 · 2026-04-09

Larger same-family baseline in the A3 paper; the distillation teacher is Gemini 3 Pro. Matched by the 9B student on WebArena.

41.5%

Qwen 3.5 27B

arxiv · arxiv/2604.07776 · 2026-04-09

Agent-as-Annotators framework distills web agent capabilities from Gemini 3 Pro into a 9B-param student. A3-Qwen3.5-9B matches Qwen3.5-27B on WebArena (41.5% SR), beating GPT-4o (31.5%).

41.5%

Claude 3.5 Sonnet

arxiv · arxiv/2604.07776 · 2026-04-09

36.0%

A3-Qwen3.5-4B

A3 Team · arxiv/2604.07776 · 2026-04-09

4B distilled model beats GPT-4o (31.5%). +11.1pp over base Qwen3.5-4B via structured distillation.

35.2%
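The "+11.1pp" gain quoted for A3-Qwen3.5-4B is an absolute difference in percentage points, which is not the same as a relative improvement. As a rough sketch, assuming the 35.2% figure is the A3-Qwen3.5-4B score, the snippet below derives the base score implied by the quoted gain and the corresponding relative gain. The implied base is a derived value, not a number stated in the source.

```python
# Figures quoted in the entry (percent success rate on WebArena).
a3_4b = 35.2    # assumed: the score listed for A3-Qwen3.5-4B
gain_pp = 11.1  # quoted gain over base Qwen3.5-4B, in percentage points
gpt_4o = 31.5   # quoted GPT-4o score

implied_base = a3_4b - gain_pp          # base score implied by the quoted gain
relative_gain = gain_pp / implied_base  # gain as a fraction of the base score
margin_over_gpt4o = a3_4b - gpt_4o      # lead over GPT-4o in percentage points

print(f"implied base: {implied_base:.1f}%")
print(f"relative gain: {relative_gain:.1%}")
print(f"margin over GPT-4o: {margin_over_gpt4o:.1f} pp")
```

Under these assumptions the 11.1 pp gain corresponds to a relative improvement of roughly 46% over the implied base, while the lead over GPT-4o is about 3.7 pp.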

GPT-4o

arxiv · arxiv/2604.07776 · 2026-04-09

31.5%

Qwen3.5-9B (base)

arxiv · arxiv/2604.07776 · 2026-04-09

31.0%