BrowseComp leaderboard
17 models tested · Updated 2026-03-05 · Verified sources only
GPT-5.4 Pro leads at 89.3%
1. GPT-5.4 Pro
OpenAI · Blog/OpenAI · 2026-03-05
Highest BrowseComp score reported. Pro variant with maximum reasoning.
Score: 89.3%
2. Claude Mythos Preview
Anthropic · Blog/Anthropic · 2026-04-07
New SOTA. Uses 4.9x fewer tokens than Opus 4.6 while scoring higher.
Score: 86.9%
3. Gemini 3.1 Pro
Google · Blog/Google · 2026-02-19
Up from 59.2% on Gemini 3 Pro. Strong autonomous web research capability.
Score: 85.9%
4. Claude Opus 4.6
Anthropic · Anthropic — Introducing Claude Opus 4.6 · 2026-02-05
Multi-agent web search coordination across hours-long sessions. 86.8% with harness.
Score: 84.0%
5. GPT-5.4
OpenAI · Blog/OpenAI · 2026-03-05
Up from 65.8% on GPT-5.2. Native computer-use model with 1M context.
Score: 82.7%
6. Qwen 3.5 397B
Alibaba · HuggingFace/Qwen · 2026-02-16
With aggressive context-folding strategy. Outperforms every US frontier model on web browsing.
Score: 78.6%
7. Kimi K2.5
Moonshot AI · Blog/Kimi · 2026-01-27
With Agent Swarm. Base score 60.6, with context management 74.9. Competitive with GPT-5.4 (82.7).
Score: 78.4%
8. Kimi K2.5 Agent Swarm
Moonshot AI · Paper/Moonshot AI · 2026-02-01
Multi-agent swarm configuration. 72.7% on WideSearch.
Score: 78.4%
9. MiniMax M2.5
MiniMax · Blog/MiniMax · 2026-02-12
Strong browsing agent performance. Competitive with Claude Opus 4.6 (84.0%).
Score: 76.3%
10. GLM-5
Zhipu AI · Paper/Zhipu AI (arxiv:2602.15763) · 2026-02-11
With context management strategy. Baseline without CM: 62.0%. Highest open-model BrowseComp score reported.
Score: 75.9%
11. Claude Sonnet 4.6
Anthropic · Blog/Anthropic · 2026-02-17
Single-agent config. Corrected after cheating detection pipeline update (was 74.72%, adjusted to 74.01%). Multi-agent variant scores 82.07%.
Score: 74.0%
12. Step-3.5-Flash
StepFun · Blog/StepFun · 2026-02-12
Score with Context Manager agent framework. Base score 51.6. Strong browsing for an open model.
Score: 69.0%
13. GLM-5.1
Zhipu AI · Blog/Z.AI · 2026-04-07
Top open-model score on BrowseComp. 79.3 with context management variant.
Score: 68.0%
14. Qwen3.5 27B
Alibaba · HuggingFace/Qwen · 2026-02-16
Small model (27B params) with competitive browsing ability. Below frontier but strong for open-weight.
Score: 61.0%
15. Kimi K2.5
Moonshot AI · Paper/Moonshot AI · 2026-02-01
Single-agent mode. Agent Swarm achieves 78.4%.
Score: 60.6%
16. Sarvam 105B
Sarvam AI · HuggingFace/sarvamai · 2026-03-06
India's first domestically-trained 105B model. MoE with 10.3B active params. Apache 2.0.
Score: 49.5%
17. GLM-4.7-Flash
Zhipu AI · HuggingFace/Zhipu · 2026-01-15
Browsing score for the lightweight Flash variant.
Score: 42.8%
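Several notes above quote both a base score and a score with an agent scaffold (context management, Context Manager, Agent Swarm). The sketch below shows one way to compare those gains in percentage points. The numbers are copied from the notes; the dictionary, labels, and function names are illustrative and not part of the leaderboard, and the notes do not confirm that the paired runs used otherwise identical settings.

```python
# Scores copied from the leaderboard notes: (base %, with scaffold %).
# Labels and structure are assumptions made for this illustration.
SCAFFOLD_SCORES = {
    "GLM-5 (context management)": (62.0, 75.9),
    "Step-3.5-Flash (Context Manager)": (51.6, 69.0),
    "Kimi K2.5 (context management)": (60.6, 74.9),
    "Kimi K2.5 (Agent Swarm)": (60.6, 78.4),
}


def uplift(base, scaffolded):
    """Return the absolute gain in percentage points, rounded to one decimal."""
    return round(scaffolded - base, 1)


def rank_by_uplift(scores):
    """Return (label, gain) pairs sorted from largest to smallest gain."""
    rows = [(name, uplift(base, scaffolded))
            for name, (base, scaffolded) in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, gain in rank_by_uplift(SCAFFOLD_SCORES):
        print(f"{name}: +{gain:.1f} points")
```

Read this way, the scaffolded entries gain roughly 14 to 18 points over their base runs, which is worth keeping in mind when comparing them against entries that report a single-agent score only.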