benchmark.space
ARC-AGI 2 leaderboard
6 models tested · Updated 2026-02-05 · Verified sources only
Claude Opus 4.6 leads at 85.0%
1
Claude Opus 4.6
Anthropic · Blog/Anthropic · 2026-02-05
Max effort config with 120k thinking budget. Standard config scored 68.8%.
85.0%
2
Gemini 3 Deep Think
Google · Blog/Google · 2026-02-12
ARC Prize Foundation verified. Specialized reasoning mode, reported as the highest ARC-AGI-2 score by a wide margin (previous best ~54%). Also scored a Codeforces Elo of 3455.
84.6%
3
GPT-5.4 Pro
OpenAI · Blog/OpenAI · 2026-03-05
Pro variant. 1.3 points below Gemini 3 Deep Think (84.6%). Highest commercial model score.
83.3%
4
Gemini 3.1 Pro
Google · Blog/Google · 2026-02-19
Verified from Google blog. More than 2x the reasoning performance of Gemini 3 Pro.
77.1%
5
GPT-5.4
OpenAI · Blog/OpenAI · 2026-03-05
Up 20.4 points from 52.9% on GPT-5.2, a large jump in abstract reasoning.
73.3%
6
Muse Spark
Meta · Blog/Meta · 2026-04-08
Below frontier models (Claude Opus 4.6 at 85.0%, GPT-5.4 at 73.3%).
42.5%
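
The notes above compare entries by point gaps (for example, GPT-5.4 Pro against Gemini 3 Deep Think, or GPT-5.4 against GPT-5.2). As a minimal sketch of that arithmetic, the snippet below copies the six listed scores into a dictionary, ranks them, and computes each model's gap to the leader. The names `SCORES`, `ranked`, and `gaps_to_leader` are illustrative assumptions, not part of any benchmark.space API, and the rounding to one decimal place is only there to match the precision shown on the page.

```python
# Scores (percent) copied from the leaderboard entries above.
# The variable and function names are hypothetical, chosen for illustration.
SCORES = {
    "Claude Opus 4.6": 85.0,
    "Gemini 3 Deep Think": 84.6,
    "GPT-5.4 Pro": 83.3,
    "Gemini 3.1 Pro": 77.1,
    "GPT-5.4": 73.3,
    "Muse Spark": 42.5,
}


def ranked(scores):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gaps_to_leader(scores):
    """Return each model's gap to the top score, in percentage points."""
    top = max(scores.values())
    # Round to one decimal to avoid floating-point noise such as 1.7000000000000028.
    return {model: round(top - score, 1) for model, score in scores.items()}


if __name__ == "__main__":
    gaps = gaps_to_leader(SCORES)
    for rank, (model, score) in enumerate(ranked(SCORES), start=1):
        print(f"{rank}. {model}: {score:.1f}% (gap to leader: {gaps[model]:.1f} pts)")
```

The same subtraction reproduces the generation-over-generation figure quoted for GPT-5.4: `round(73.3 - 52.9, 1)` gives the 20.4-point gain over GPT-5.2.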