LiveCodeBench leaderboard
28 models tested · Updated 2026-04-03 · Verified sources only
Arcee Trinity leads at 98.2%
1. Arcee Trinity · Arcee AI · Blog/Arcee AI · 2026-04-03 · 98.2%
399B sparse MoE (13B active), Apache 2.0. Impressive score, but one user reports a 60% failure rate on production constraint-following tasks. Score self-reported by Arcee.
2. Gemini 3 Pro Preview · Google · Leaderboard/LiveCodeBench · 2025-11-18 · 91.7%
#1 on the official LiveCodeBench leaderboard as of the source date. Ahead of DeepSeek V3.2 Speciale (88.7%).
3. DeepSeek V3.2 Speciale · DeepSeek · Paper/DeepSeek · 2025-12-01 · 88.7%
Pass@1 with chain-of-thought. Temporary API-only reasoning variant. Gold-level results at the IMO, ICPC World Finals, and IOI 2025.
4. DeepSeek-V3.2-Speciale · DeepSeek · Paper/DeepSeek-V3.2 · 2025-12-01 · 88.7%
Extended reasoning variant, 27k thinking tokens.
5. Seed 2.0 Pro · ByteDance · Blog/ByteDance · 2026-02-14 · 87.8%
Led LiveCodeBench v6 at time of release. ByteDance frontier model with ICPC/IMO/CMO gold medals.
6. Qwen 3.6 Plus · Alibaba · Blog/Qwen · 2026-04-02 · 87.1%
Top-4 on the LiveCodeBench leaderboard. Strong agentic coding model with a 1M-token context window.
7. Step-3.5-Flash · StepFun · Blog/StepFun · 2026-02-12 · 86.4%
Reaches 88.9% with PaCoRe. Open-weight model competitive with frontier closed models on live coding.
8. Kimi K2.5 · Moonshot AI · HuggingFace/moonshotai · 2026-01-27 · 85.0%
LiveCodeBench v6, 64k token budget, averaged over 5 runs. Highest open-source score on this benchmark at time of release.
9. GLM-4.7 · Zhipu AI · NVIDIA/Zhipu Official · 2025-12-22 · 84.9%
LiveCodeBench v6. Highest open-weight LiveCodeBench score at release, beating most proprietary models.
10. Qwen 3.5 397B · Alibaba · HuggingFace/Qwen · 2026-02-16 · 83.6%
LiveCodeBench v6. Outperforms GPT-5.2 and Claude Opus 4.5 on 80% of evaluated categories.
11. DeepSeek-V3.2 · DeepSeek · Paper/DeepSeek-V3.2 · 2025-12-01 · 83.3%
Pass@1 with chain-of-thought. Open-weight.
12. Seed 2.0 Lite · ByteDance · Blog/ByteDance · 2026-03-10 · 81.7%
Strong competitive coding despite being the mid-tier variant.
13. Qwen3.5 27B · Alibaba · Web/Alibaba · 2026-02-16 · 80.7%
Dense 27B model, all parameters active. Matches GPT-5-mini on SWE-bench. Best coding score in the Qwen 3.5 medium lineup.
14. Gemma 4 31B · Google · Model Card/Google · 2026-04-02 · 80.0%
Official Google model card. 31B dense model on LiveCodeBench v6.
15. Gemma 4 26B A4B · Google · Model Card/Google · 2026-04-02 · 77.1%
MoE variant, LiveCodeBench v6. The dense 31B model scores 80.0%.
16. Claude Opus 4.6 · Anthropic · Blog/Anthropic · 2026-02-05 · 76.0%
Competitive coding on real-world problems. Behind Gemini 3 Pro Preview (91.7%) and DeepSeek V3.2 Speciale (88.7%).
17. DeepSeek V3.2 · DeepSeek · HuggingFace/DeepSeek · 2025-12-01 · 74.1%
Standard (non-thinking) mode. Thinking mode reaches 83.3%; the Speciale variant hits 88.7%.
18. Gemini 3.1 Flash-Lite · Google · Blog/Google · 2026-03-03 · 72.0%
Budget-tier model at $0.25/1M input tokens. Competitive coding despite being Google's fastest, cheapest model.
19. Sarvam 105B · Sarvam AI · HuggingFace/sarvamai · 2026-03-06 · 71.7%
India's first domestically trained 105B model. MoE with 10.3B active params. Apache 2.0.
20. Nemotron 3 Nano 30B · NVIDIA · HuggingFace/NVIDIA · 2025-12-15 · 68.3%
LiveCodeBench v6. Outperforms Qwen3 (66.0%) and GPT-OSS (61.0%) in the same size class.
21. Qwen 3.5 9B · Alibaba · HuggingFace/Qwen · 2026-03-02 · 65.6%
9B params. Competitive coding for a small model, though it lags larger models (GPT-OSS-120B: 82.7%).
22. JoyAI-LLM Flash · JD.com · X/NoPeople404 · 2026-04-06 · 65.6%
48B MoE, 2.7B active params. Surpasses GLM-4.7-Flash-Thinking by 2.4% while using 85% fewer tokens. Uses the FiberPO RL algorithm.
23. GLM-4.7-Flash · Zhipu AI · HuggingFace/Zhipu · 2026-01-15 · 64.0%
LiveCodeBench v6. Decent coding for a model with 3.6B active params.
24. Qwen3.5-4B · Alibaba · HuggingFace/Alibaba · 2026-03-02 · 55.8%
Competitive coding ability for 4B params. Open-weight, Apache 2.0.
25. Gemma 4 4B · Google · Model Card/Google · 2026-04-02 · 52.0%
Competitive coding for a 4B model. Runs on consumer edge hardware.
26. Gemma 4 E2B · Google · Model Card/Google · 2026-04-02 · 44.0%
LiveCodeBench v6. Effective 2B params. Surpasses Gemma 3 27B on coding despite 10x fewer active parameters.
27. Llama 4 Maverick · Meta · HuggingFace/Meta · 2026-04-05 · 43.4%
Pass@1 on LiveCodeBench (problems from 10/01/2024 to 02/01/2025). Far behind leading models such as Gemma 4 31B (80.0%) on coding.
28. Llama 4 Scout · Meta · HuggingFace/Meta · 2026-04-05 · 32.8%
Pass@1. Significantly behind frontier coding models.
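Several entries above (the DeepSeek V3.2 family, both Llama 4 models) cite pass@1 specifically. For reference, pass@1 is the k=1 case of the standard unbiased pass@k estimator popularized by the HumanEval benchmark; a minimal Python sketch, with illustrative sample counts not drawn from any entry above:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, passes all tests. Formula: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# At k=1 this reduces to the raw success fraction c/n,
# e.g. 4 correct out of 5 generations:
print(pass_at_k(5, 4, 1))  # 0.8
```

Leaderboard entries that report "pass@1 averaged over 5 runs" are averaging this fraction across repeated evaluations rather than sampling multiple completions per problem.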