AndroidWorld Leaderboard 2026 — Results Across 8 Real AI Models

8 models tested · Updated 2025-11-25 · Verified sources only

Surfer 2 leads at 87.1%

Surfer 2

H Company · Blog/H Company · 2025-11-25

State of the art on mobile agent tasks, surpassing the human baseline of 80%.

87.1%

GLM-5V-Turbo

Zhipu AI · X/@VaibhavSisinty + Z.ai launch · 2026-04-01

First vision-language model to lead AndroidWorld. Beats Claude Opus 4.6 (62.0) by 13.7 points on Android GUI tasks.

75.7%

GUI-Owl-1.5-8B-Thinking

Alibaba · arxiv/2602.16855 · 2026-02-15

Mobile GUI agent. Within 1.7 points of UI-TARS-2 (73.3) at only 8B parameters.

71.6%

Seed1.8

arxiv · arxiv/2603.20633 · 2026-03-21

Seed1.8 achieves 61.9% on OSWorld and 85.9% on OnlineMind2Web, surpassing Claude Sonnet 4.5 and GPT-O3-CUA on computer use tasks. Also achieves 70.7% on AndroidWorld.

70.7%

Qwen 3.6 27B

Alibaba · HuggingFace/Qwen · 2026-04-22

New SOTA for open models on AndroidWorld. Best sub-30B agent model.

70.3%

GUI-Owl-1.5-32B-Instruct

Alibaba · arxiv/2602.16855 · 2026-02-15

Strong mobile performance. Multi-platform fundamental GUI agent.

69.8%

Qwen3.5 27B

Alibaba · HuggingFace/Qwen · 2026-02-16

Mobile agent benchmark. Competitive with specialized agent frameworks.

64.2%

VeriGUI-7B

arxiv · arxiv/2604.05477 · 2026-04-07

VeriGUI-7B achieves the best open-source results on AndroidControl High (74.2% TM, 65.5% GR, 51.1% SR) and is competitive with GPT-5.1 and Gemini-3-flash. Introduces TVAE framework (Thinking-Verification-Act

25.1%
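
The entries above quote several point gaps, such as 75.7 vs 62.0 and the 80% human baseline. As a minimal sketch, the Python below reproduces that arithmetic from the listed scores. The names SCORES, HUMAN_BASELINE, ranked, gap, and above_baseline are illustrative assumptions and not part of any AndroidWorld tooling. The numbers are copied from this page and were not re-measured.

```python
# Scores copied from the leaderboard entries above (percent on AndroidWorld).
# All names in this sketch are hypothetical helpers, not an official API.
SCORES = {
    "Surfer 2": 87.1,
    "GLM-5V-Turbo": 75.7,
    "GUI-Owl-1.5-8B-Thinking": 71.6,
    "Seed1.8": 70.7,
    "Qwen 3.6 27B": 70.3,
    "GUI-Owl-1.5-32B-Instruct": 69.8,
    "Qwen3.5 27B": 64.2,
    "VeriGUI-7B": 25.1,
}

# Human baseline as quoted in the Surfer 2 entry.
HUMAN_BASELINE = 80.0


def ranked(scores):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def gap(a, b):
    """Point difference a - b, rounded to one decimal to avoid float noise."""
    return round(a - b, 1)


def above_baseline(scores, baseline=HUMAN_BASELINE):
    """Names of models whose score exceeds the baseline, best first."""
    return [name for name, score in ranked(scores) if score > baseline]


if __name__ == "__main__":
    for rank, (name, score) in enumerate(ranked(SCORES), start=1):
        delta = gap(score, HUMAN_BASELINE)
        print(f"{rank}. {name}: {score:.1f}% ({delta:+.1f} vs human baseline)")
```

Calling gap(75.7, 62.0) checks the 13.7-point margin quoted in the GLM-5V-Turbo entry, and above_baseline(SCORES) lists which entries exceed the 80% human baseline.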