Online-Mind2Web Leaderboard 2026 — Results Across 5 Real AI Models

5 models tested · Updated 2026-03-05 · Verified sources only

GPT-5.4 leads at 92.8%

OpenAI · Blog/OpenAI · 2026-03-05

Screenshot-based observations only. 21.9 percentage points ahead of ChatGPT Atlas Agent Mode at 70.9%.

92.8%

Google DeepMind · arxiv/2604.08516 · 2026-04-09

Google computer-use-preview on live web tasks. Second only to OpenAI computer-use on this benchmark.

57.3%

Alibaba · arxiv/2602.16855 · 2026-02-15

Online web task completion. Best among open-source models.

48.6%

Ai2 · arxiv/2604.08516 · 2026-04-09

Open-weight 8B visual web agent. Its score on live web tasks matches its offline Mind2Web score.

35.3%

Ai2 · arxiv/2604.08516 · 2026-04-09

4B open-weight visual web agent from Ai2. Strong for its size on online tasks.

31.3%