BrowseComp Leaderboard 2026 — Results Across 17 Real AI Models

17 models tested · Updated 2026-03-05 · Verified sources only

GPT-5.4 Pro leads at 89.3%

GPT-5.4 Pro

OpenAI · Blog/OpenAI · 2026-03-05

Highest BrowseComp score reported. Pro variant with maximum reasoning.

89.3%

Claude Mythos Preview

Anthropic · Blog/Anthropic · 2026-04-07

New SOTA. Uses 4.9x fewer tokens than Opus 4.6 while scoring higher.

86.9%

Gemini 3.1 Pro

Google · Blog/Google · 2026-02-19

Up from 59.2% on Gemini 3 Pro. Strong autonomous web research capability.

85.9%

Claude Opus 4.6

Anthropic · Anthropic — Introducing Claude Opus 4.6 · 2026-02-05

Multi-agent web search coordination across hours-long sessions. 86.8% with harness.

84.0%

GPT-5.4

OpenAI · Blog/OpenAI · 2026-03-05

Up from 65.8% on GPT-5.2. Native computer-use model with 1M context.

82.7%

Qwen 3.5 397B

Alibaba · HuggingFace/Qwen · 2026-02-16

With aggressive context-folding strategy. Outperforms every US frontier model on web browsing.

78.6%

Kimi K2.5

Moonshot AI · Blog/Kimi · 2026-01-27

With Agent Swarm. Base score 60.6%; 74.9% with context management. Competitive with GPT-5.4 (82.7%).

78.4%

Kimi K2.5 Agent Swarm

Moonshot AI · Paper/Moonshot AI · 2026-02-01

Multi-agent swarm configuration. 72.7% on WideSearch.

78.4%

MiniMax M2.5

MiniMax · Blog/MiniMax · 2026-02-12

Strong browsing agent performance. Competitive with Claude Opus 4.6 (84.0%).

76.3%

GLM-5

Zhipu AI · Paper/Zhipu AI (arxiv:2602.15763) · 2026-02-11

With a context management (CM) strategy. Baseline without CM: 62.0%. Highest open-model BrowseComp score reported.

75.9%

Claude Sonnet 4.6

Anthropic · Blog/Anthropic · 2026-02-17

Single-agent config. Corrected after cheating detection pipeline update (was 74.72%, adjusted to 74.01%). Multi-agent variant scores 82.07%.

74.0%

Step-3.5-Flash

StepFun · Blog/StepFun · 2026-02-12

Score with the Context Manager agent framework. Base score 51.6%. Strong browsing for an open model.

69.0%

GLM-5.1

Zhipu AI · Blog/Z.AI · 2026-04-07

Top open-model score on BrowseComp. 79.3% with the context management variant.

68.0%

Qwen3.5 27B

Alibaba · HuggingFace/Qwen · 2026-02-16

Small model (27B params) with competitive browsing ability. Below the frontier but strong for an open-weight model.

61.0%

Kimi K2.5

Moonshot AI · Paper/Moonshot AI · 2026-02-01

Single-agent mode. Agent Swarm achieves 78.4%.

60.6%

Sarvam 105B

Sarvam AI · HuggingFace/sarvamai · 2026-03-06

India's first domestically-trained 105B model. MoE with 10.3B active params. Apache 2.0.

49.5%

GLM-4.7-Flash

Zhipu AI · HuggingFace/Zhipu · 2026-01-15

Browsing score for the lightweight Flash variant.

42.8%
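
Several entries above quote both a baseline score and a score obtained with extra scaffolding (context management, an agent swarm, or a multi-agent setup). As a rough illustration of how much that scaffolding adds, the sketch below computes the gain in percentage points for each such pair. The figures are copied from the entry notes; the grouping into (baseline, scaffolded) pairs, the labels, and the function names are assumptions made for this example, not part of any published methodology.

```python
# Baseline vs. scaffolded BrowseComp scores (percent), copied from the entry notes.
# The pairing and the labels are illustrative assumptions, not an official grouping.
SCORES = {
    "Kimi K2.5 (Agent Swarm)": (60.6, 78.4),
    "Kimi K2.5 (context management)": (60.6, 74.9),
    "GLM-5 (context management)": (62.0, 75.9),
    "Step-3.5-Flash (Context Manager)": (51.6, 69.0),
    "Claude Sonnet 4.6 (multi-agent)": (74.01, 82.07),
}


def uplift(baseline, scaffolded):
    """Return the gain in percentage points, rounded to two decimals."""
    return round(scaffolded - baseline, 2)


def ranked_uplifts(scores):
    """Return (label, gain) pairs sorted from largest to smallest gain."""
    rows = [(name, uplift(base, scaf)) for name, (base, scaf) in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, gain in ranked_uplifts(SCORES):
        print(f"{name}: +{gain:.2f} pts")
```

These gains are absolute differences in percentage points, not relative improvements, and they compare configurations reported by different sources, so they are best read as indicative only.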