ARC-AGI 2 Leaderboard 2026 — Results Across 11 Real AI Models

ARC-AGI 2

11 models tested · Updated 2026-02-05 · Verified sources only

Claude Opus 4.6 leads at 85.0%

Claude Opus 4.6

Anthropic · Blog/Anthropic · 2026-02-05

Max effort config with 120k thinking budget. Standard config scored 68.8%.

85.0%

GPT-5.5

OpenAI · OpenAI Blog · 2026-04-23

Roughly 12-point jump over GPT-5.4 (73.3). Ahead of Gemini 3.1 Pro (77.1).

85.0%

Gemini 3 Deep Think

Google · Blog/Google · 2026-02-12

ARC Prize Foundation verified. Specialized reasoning mode, reported at release as the highest ARC-AGI-2 score by a wide margin (previous best ~54%). Also scored a Codeforces Elo of 3455.

84.6%

GPT-5.4 Pro

OpenAI · Blog/OpenAI · 2026-03-05

Pro variant. Near Gemini 3 Deep Think (84.6%). Highest commercial model score.

83.3%

Gemini 3.1 Pro

Google · Blog/Google · 2026-02-19

Verified from Google blog. More than 2x Gemini 3 Pro reasoning performance.

77.1%

Claude Opus 4.7

Anthropic · Blog/OpenAI · 2026-04-16

OpenAI-tested. Trails GPT-5.5 (85.0) by about 9 points on ARC-AGI-2.

75.8%

GPT-5.4

OpenAI · Blog/OpenAI · 2026-03-05

Up from 52.9% on GPT-5.2, a gain of 20.4 points in abstract reasoning.

73.3%

Gemini 3.5 Flash

Google DeepMind · Blog/Google DeepMind · 2026-06-10

Below GPT-5.5 (85.0) and Gemini 3.1 Pro (77.1). Up from Gemini 3 Flash (33.6).

72.1%

Muse Spark

Meta · Blog/Meta · 2026-04-08

Below frontier models (Claude Opus 4.6 at 85%, GPT-5.4 at 73.3%).

42.5%

Gemini 3 Flash

Google DeepMind · Blog/Google DeepMind · 2026-06-10

Below Gemini 3.5 Flash (72.1), which gains 38.5 points over it.

33.6%

Grok 4

xAI · Blog/xAI · 2025-07-09

Nearly double Opus's ~8.6%. State of the art for closed models on ARC-AGI-2 at release.

15.9%
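
The point gaps quoted in the notes above can be recomputed directly from the listed scores. The following is a minimal Python sketch, not part of the leaderboard itself. The names `SCORES`, `gap`, and `ranked` are hypothetical helpers introduced only for illustration, and the values are copied from the entries on this page.

```python
# Scores (percent) copied from the leaderboard entries on this page.
# SCORES, gap, and ranked are illustrative names, not an official API.
SCORES = {
    "Claude Opus 4.6": 85.0,
    "GPT-5.5": 85.0,
    "Gemini 3 Deep Think": 84.6,
    "GPT-5.4 Pro": 83.3,
    "Gemini 3.1 Pro": 77.1,
    "Claude Opus 4.7": 75.8,
    "GPT-5.4": 73.3,
    "Gemini 3.5 Flash": 72.1,
    "Muse Spark": 42.5,
    "Gemini 3 Flash": 33.6,
    "Grok 4": 15.9,
}


def gap(a: str, b: str) -> float:
    """Percentage-point difference between model a and model b (a minus b)."""
    return round(SCORES[a] - SCORES[b], 1)


def ranked() -> list[tuple[str, float]]:
    """Models sorted by score, highest first; ties keep their listing order."""
    return sorted(SCORES.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    print(gap("GPT-5.5", "GPT-5.4"))          # 11.7
    print(gap("GPT-5.5", "Claude Opus 4.7"))  # 9.2
    for name, score in ranked():
        print(f"{score:5.1f}%  {name}")
```

Because Python's `sorted` is stable, the two models tied at 85.0% stay in the order in which they are listed.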