benchmark.space
WebVoyager leaderboard
4 models tested · Updated 2026-03-26 · Verified sources only
Alumnium MCP + Claude Code leads at 98.5%
1. Alumnium MCP + Claude Code · 98.5%
Alumnium · Blog/Alumnium · 2026-03-26
New WebVoyager SOTA. MCP server + Claude Code general-purpose agent. ~$5 total cost for 610 tasks. Reproducible code available.

2. Surfer 2 · 97.1%
H Company · Blog/H Company · 2025-10-17
Cross-platform computer-use agent. SOTA on WebVoyager at time of release. Separates planning from execution with orchestrator + sub-agents.

3. GLM-5V-Turbo · 88.5%
Zhipu AI · X/@VaibhavSisinty + Z.ai launch · 2026-04-01
Vision-language model with native multimodal input. Marginal lead over Claude Opus 4.6 (88.0%) on browser navigation tasks.

4. MolmoWeb 8B · 78.2%
Allen AI · Blog/Allen AI · 2026-03-24
Open-weight 8B browser agent. With test-time scaling (pass@4): 94.7%. Outperforms Fara-7B and GPT-4o-based agents.
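
The MolmoWeb 8B entry reports two figures, a headline score and a pass@4 score with test-time scaling. As a rough illustration of how such figures can differ, the sketch below assumes the common reading of pass@k (a task counts as solved if at least one of k attempts succeeds) and that the headline score is a single-attempt rate. Both are assumptions, not details confirmed by the leaderboard. The function names and the outcome data are hypothetical placeholders, not MolmoWeb results.

```python
from typing import Sequence


def pass_at_1(attempts: Sequence[Sequence[bool]]) -> float:
    """Fraction of tasks solved on the first attempt."""
    return sum(1 for task in attempts if task[0]) / len(attempts)


def pass_at_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks where any of the first k attempts succeeded."""
    return sum(1 for task in attempts if any(task[:k])) / len(attempts)


# Hypothetical outcomes: one row per task, one boolean per attempt.
# These values are made up for illustration only.
outcomes = [
    [True, True, False, True],
    [False, False, True, False],
    [False, False, False, False],
    [True, False, True, True],
]

print(pass_at_1(outcomes))     # 0.5
print(pass_at_k(outcomes, 4))  # 0.75
```

Under this reading, pass@k can never be lower than the single-attempt rate, so single-attempt and pass@k scores on the leaderboard are not directly comparable.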