AndroidWorld
3 models tested · Updated 2025-11-25 · Verified sources only
Surfer 2 leads at 87.1%
1
H Company · Blog/H Company · 2025-11-25
State of the art on mobile agent tasks, surpassing the human baseline of 80%.
87.1%
2
Zhipu AI · X/@VaibhavSisinty + Z.ai launch · 2026-04-01
First vision-language model to lead AndroidWorld. Beats Claude Opus 4.6 (62.0%) by 13.7 percentage points on Android GUI tasks.
75.7%
3
Alibaba · HuggingFace/Qwen · 2026-02-16
Mobile agent benchmark. Competitive with specialized agent frameworks.
64.2%
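The margins quoted in the entries above (for example, 13.7 points over Claude Opus 4.6) can be re-derived from the listed scores. The following is a minimal sketch, not part of the leaderboard itself: the dictionary keys are simply the organization names used as labels, and `margin` is a hypothetical helper introduced for illustration.

```python
# Scores (percent) copied from the leaderboard entries above.
# Keys are organization names used as labels, not official model names.
scores = {"H Company": 87.1, "Zhipu AI": 75.7, "Alibaba": 64.2}

# Reference points quoted in the entry descriptions.
HUMAN_BASELINE = 80.0
CLAUDE_OPUS_4_6 = 62.0


def margin(score: float, reference: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(score - reference, 1)


# Sort entries from highest to lowest score to reproduce the ranking.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for rank, (org, score) in enumerate(ranking, start=1):
    delta = margin(score, HUMAN_BASELINE)
    print(f"{rank}. {org}: {score:.1f}% ({delta:+.1f} points vs. human baseline)")
```

Rounding to one decimal place avoids floating-point noise when subtracting scores that are reported to a single decimal.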