SWE-bench Pro
13 models tested · Updated 2026-04-07 · Verified sources only
Claude Mythos Preview leads at 77.8%
| Rank | Org | Source | Date | Score | Notes |
|---|---|---|---|---|---|
| 1 | Anthropic | Blog/Anthropic | 2026-04-07 | 77.8% | 24.4pp above Opus 4.6 (53.4%). A major jump on the harder coding eval. |
| 2 | Z.ai | X/@Zai_org | 2026-04-07 | 58.4% | #1 open source, #3 globally. Beats GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). First Chinese model to top SWE-bench Pro. No Nvidia hardware. |
| 3 | OpenAI | Blog/OpenAI | 2026-03-05 | 57.7% | Harder coding benchmark variant. GPT-5.4 unifies coding, reasoning, and computer use in a single model. |
| 4 | OpenAI | Web/OpenAI | 2026-03-05 | 56.8% | Led SWE-bench Pro at release. 25% faster than GPT-5.2 Codex. Covers 1,865 multi-language tasks. |
| 5 | MiniMax | Blog/MiniMax | 2026-03-18 | 56.2% | Matches GPT-5.3 Codex level. Approaches Claude's best on SWE-bench Pro. |
| 6 | OpenAI | Blog/OpenAI | 2026-03-17 | 54.4% | Smallest GPT-5.4 variant. 2x faster than GPT-5 mini at similar coding quality. |
| 7 | Google | Blog/Google DeepMind | 2026-02-19 | 54.2% | Competitive with GPT-5.4 (57.7%); behind Claude Mythos Preview (77.8%). |
| 8 | OpenAI | Blog/OpenAI | 2026-03-17 | 52.4% | Fastest, cheapest GPT-5.4 variant. Designed for classification, data extraction, and coding subagents. |
| 9 | Meta | Blog/Meta | 2026-04-08 | 52.0% | Trails Claude Mythos (77.8%) significantly but is competitive with other frontier models. |
| 10 | Moonshot AI | HuggingFace/Moonshot | 2026-01-29 | 50.7% | Scored via an internal evaluation framework with a minimal tool set. 1T-parameter MoE, 32B active. |
| 11 | ByteDance | Blog/ByteDance | 2026-02-14 | 46.9% | Flagship coding model from ByteDance's Seed team. |
| 12 | ByteDance | Blog/ByteDance | 2026-03-10 | 46.0% | Nearly matches the Pro variant (46.9%) on the harder SWE subset. |
| 13 | Alibaba | Blog/Alibaba | 2026-02-03 | 44.3% | Competitive with much larger models on the harder SWE subset. |