benchmark.space
power rankings
Who's winning
Composite score: the average rank percentile across benchmarks, plus a breadth bonus. Only models with results on 3 or more benchmarks are included. A higher score means the model consistently ranks near the top across more benchmarks.
Rank  Model                   Developer    Benchmarks  Score
1     Claude Mythos Preview   Anthropic    13          92.2
2     GPT-5.4 Pro             OpenAI       5           80.2
3     Qwen 3.6 Plus           Alibaba      7           78.0
4     GPT-5.2                 OpenAI       5           77.9
5     Gemini 3.1 Pro          Google       13          77.6
6     Gemini 3 Flash          Google       3           76.9
7     GPT-5                   OpenAI       4           74.3
8     GPT-5.3 Codex           OpenAI       6           71.2
9     Surfer 2                H Company    4           69.8
10    Qwen 3.5 397B           Alibaba      7           69.5
11    Seed 2.0 Pro            ByteDance    6           69.4
12    DeepSeek V3.2 Speciale  DeepSeek     3           66.3
13    GLM-5.1                 Zhipu AI     7           65.3
14    GPT-5.4                 OpenAI       16          63.7
15    Claude Opus 4.6         Anthropic    18          60.8
16    Claude Sonnet 4.6       Anthropic    7           60.2
17    GLM-4.7                 Zhipu AI     5           58.2
18    GPT-5.4 Mini            OpenAI       4           58.1
19    Kimi K2.5               Moonshot AI  14          57.8
20    Step-3.5-Flash          StepFun      7           57.3
21    Arcee Trinity           Arcee AI     6           56.3
22    GLM-5                   Zhipu AI     9           54.2
23    gpt-oss-120b            OpenAI       4           52.2
24    Seed 2.0 Lite           ByteDance    6           52.0
25    Gemini 3.1 Flash-Lite   Google       3           48.9
26    Muse Spark              Meta         9           48.8
27    DeepSeek V3.2           DeepSeek     5           48.6
28    Grok 4                  xAI          6           48.4
29    o4-mini                 OpenAI       4           47.3
30    Qwen3.5 27B             Alibaba      11          47.1