LiveCodeBench
28 models tested · Updated 2026-04-03 · Verified sources only
Arcee Trinity leads at 98.2%
1
Arcee AI · Blog/Arcee AI · 2026-04-03
399B sparse MoE (13B active), Apache 2.0. Impressive score, but one user reports a 60% failure rate on production constraint-following tasks. Score self-reported by Arcee.
98.2%
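The "13B active" figure in the entry above refers to sparse MoE routing: only the routed experts run for each token, so per-token compute tracks the active parameter count rather than the 399B total. A minimal sketch of that arithmetic, using the numbers from the entry above:

```python
# Sparse MoE: only routed experts execute per token, so per-token
# compute scales with active params, not total params.
total_params = 399e9   # 399B total parameters (from the entry above)
active_params = 13e9   # 13B active per token

fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")  # prints "3.3% ..."
```

This is why a 399B MoE can serve at roughly the inference cost of a ~13B dense model, at the price of holding all 399B parameters in memory.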
2
Google · Leaderboard/LiveCodeBench · 2025-11-18
Current #1 on the LiveCodeBench leaderboard, ahead of DeepSeek V3.2 Speciale (88.7%).
91.7%
3
DeepSeek · Paper/DeepSeek · 2025-12-01
Pass@1 with chain-of-thought. Temporary, API-only reasoning variant. Gold-level results at IMO, ICPC World Finals, and IOI 2025.
88.7%
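Several entries above and below report Pass@1, the standard LiveCodeBench metric. As a hedged sketch (the function name is my own), scores like these are typically computed with the unbiased pass@k estimator from Chen et al.'s HumanEval paper, averaged over problems:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions drawn from n total samples (c of them correct) passes.
    For k=1 this reduces to the raw success rate c/n."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(5, 4, 1))  # 0.8 -> matches "averaged over 5 runs"-style reporting
```

With k=1 and a single sample per problem this is just the fraction of problems solved, which is what most of the scores on this page represent.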
4
DeepSeek · Paper/DeepSeek-V3.2 · 2025-12-01
27k thinking tokens. Extended reasoning variant.
88.7%
5
ByteDance · Blog/ByteDance · 2026-02-14
Led LiveCodeBench v6 at the time of release. ByteDance frontier model with ICPC/IMO/CMO gold medals.
87.8%
6
Alibaba · Blog/Qwen · 2026-04-02
Top-4 on LiveCodeBench leaderboard. Strong agentic coding model with 1M context window.
87.1%
7
StepFun · Blog/StepFun · 2026-02-12
Reaches 88.9% with PaCoRe. Open-weight model competitive with frontier closed models on live coding.
86.4%
8
Moonshot AI · HuggingFace/moonshotai · 2026-01-27
LiveCodeBench v6, 64k token budget. Highest open-source score on this benchmark. Averaged over 5 runs.
85.0%
9
Zhipu AI · NVIDIA/Zhipu Official · 2025-12-22
LiveCodeBench v6. Highest open-weight LiveCodeBench score, beats most proprietary models.
84.9%
10
Alibaba · HuggingFace/Qwen · 2026-02-16
LiveCodeBench v6. Outperforms GPT-5.2 and Claude Opus 4.5 on 80% of evaluated categories.
83.6%
11
DeepSeek · Paper/DeepSeek-V3.2 · 2025-12-01
Pass@1 with chain-of-thought. Open-weight.
83.3%
12
ByteDance · Blog/ByteDance · 2026-03-10
Strong competitive-coding performance despite being the mid-tier variant.
81.7%
13
Alibaba · Web/Alibaba · 2026-02-16
Dense 27B model, all params active. Matches GPT-5-mini on SWE-bench. Best coding score in Qwen 3.5 medium lineup.
80.7%
14
Google · Model Card/Google · 2026-04-02
Official Google model card. 31B dense model on LiveCodeBench v6.
80.0%
15
Google · Model Card/Google · 2026-04-02
MoE variant on LiveCodeBench v6. The dense 31B variant scores 80.0%.
77.1%
16
Anthropic · Blog/Anthropic · 2026-02-05
Competitive coding on real-world problems. Behind Gemini 3 Pro Preview (91.7%) and DeepSeek V3.2 Speciale (88.7%).
76.0%
17
DeepSeek · HuggingFace/DeepSeek · 2025-12-01
Standard (non-thinking) mode. Thinking mode reaches 83.3%. Speciale variant hits 88.7%.
74.1%
18
Google · Blog/Google · 2026-03-03
Budget-tier model at $0.25/1M input tokens. Competitive coding despite being Google's fastest and cheapest model.
72.0%
19
Sarvam AI · HuggingFace/sarvamai · 2026-03-06
India's first domestically trained 105B model. MoE with 10.3B active params. Apache 2.0.
71.7%
20
NVIDIA · HuggingFace/NVIDIA · 2025-12-15
LiveCodeBench v6. Outperforms Qwen3 (66.0%) and GPT-OSS (61.0%) in the same size class.
68.3%
21
Alibaba · HuggingFace/Qwen · 2026-03-02
9B params. Competitive coding for a small model, though it lags behind larger models (GPT-OSS-120B: 82.7%).
65.6%
22
JD.com · X/NoPeople404 · 2026-04-06
48B MoE, 2.7B active params. Surpasses GLM-4.7-Flash-Thinking by 2.4% with 85% fewer tokens. Uses the FiberPO RL algorithm.
65.6%
23
Zhipu AI · HuggingFace/Zhipu · 2026-01-15
LiveCodeBench v6. Decent coding for a model with 3.6B active params.
64.0%
24
Alibaba · HuggingFace/Alibaba · 2026-03-02
Competitive coding ability for 4B params. Open-weight, Apache 2.0.
55.8%
25
Google · Google Model Card · 2026-04-02
Competitive coding for a 4B model. Runs on consumer edge hardware.
52.0%
26
Google · Model Card/Google · 2026-04-02
LiveCodeBench v6. Effective 2B params. Surpasses Gemma 3 27B on coding despite 10x fewer active parameters.
44.0%
27
Meta · HuggingFace/Meta · 2026-04-05
Pass@1 on LiveCodeBench (10/01/2024-02/01/2025). Far behind frontier models like Gemma 4 31B (80.0%) on coding.
43.4%
28
Meta · HuggingFace/Meta · 2026-04-05
Pass@1. Significantly behind frontier coding models.
32.8%