ARC-AGI-2
6 models tested · Updated 2026-02-05 · Verified sources only
Claude Opus 4.6 leads at 85.0%
1
Anthropic · Blog/Anthropic · 2026-02-05
Max effort config with 120k thinking budget. Standard config scored 68.8%.
85.0%
2
Google · Blog/Google · 2026-02-12
ARC Prize Foundation verified. Specialized reasoning mode. Reported as the highest ARC-AGI-2 score by a wide margin (previous best ~54%). Also scored a Codeforces Elo of 3455.
84.6%
3
OpenAI · Blog/OpenAI · 2026-03-05
Pro variant. Near Gemini 3 Deep Think (84.6%). Highest commercial model score.
83.3%
4
Google · Blog/Google · 2026-02-19
Verified from the Google blog. More than 2x the reasoning performance of Gemini 3 Pro.
77.1%
5
OpenAI · Blog/OpenAI · 2026-03-05
Up from 52.9% on GPT-5.2. Large jump in abstract reasoning.
73.3%
6
Meta · Blog/Meta · 2026-04-08
Below frontier models (Claude Opus 4.6 at 85.0%, GPT-5.4 at 73.3%).
42.5%