Mind2Web
2 models tested · Updated 2026-03-05 · Verified sources only
GPT-5.4 leads at 92.8%
1
OpenAI · Blog/OpenAI · 2026-03-05
Evaluated on Online-Mind2Web using screenshot observations alone. Scores 21.9 points above ChatGPT Atlas Agent Mode (70.9%). The benchmark tests browser navigation on real-world webpages.
92.8%
2
Allen AI · Blog/Allen AI · 2026-03-24
Evaluated on Online-Mind2Web with a single rollout. With test-time scaling (pass@4), the score rises to 60.5%. 8B parameters, Apache 2.0 license.
35.3%
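
The gap between the single-rollout score and the pass@4 figure reflects how pass@k is counted: a task counts as solved if any of k sampled rollouts succeeds. The source does not say how the pass@4 number was computed, so the following is only a minimal sketch, assuming the commonly used unbiased pass@k estimator. The function names and the per-task counts are hypothetical and are not leaderboard data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n sampled rollouts, c of which succeeded.

    Assumption: the standard unbiased estimator 1 - C(n - c, k) / C(n, k),
    i.e. one minus the chance that a random k-subset has no success.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; results holds one (n, c) pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for four tasks, four rollouts each (illustration only).
    tasks = [(4, 1), (4, 0), (4, 4), (4, 2)]
    print(f"pass@1: {benchmark_pass_at_k(tasks, 1):.4f}")
    print(f"pass@4: {benchmark_pass_at_k(tasks, 4):.4f}")
```

On these made-up counts, pass@1 is the mean per-rollout success rate, while pass@4 credits every task with at least one success, so it can only be equal or higher.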