LiveCodeBench
28 models tested · Updated 2026-04-03 · Verified sources only
Arcee Trinity leads at 98.2%
1
Arcee AI · Blog/Arcee AI · 2026-04-03
399B sparse MoE (13B active), Apache 2.0. Impressive score, but one user reports a 60% failure rate on production constraint-following tasks. Score self-reported by Arcee.
98.2%
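The "13B active" figure in the entry above refers to sparse MoE routing: only the routed experts run for each token, so per-token compute tracks the active parameter count rather than the 399B total. A minimal sketch of that arithmetic, using the numbers from the entry above:

```python
# Sparse MoE: only routed experts execute per token, so per-token
# compute scales with active params, not total params.
total_params = 399e9   # 399B total parameters (from the entry above)
active_params = 13e9   # 13B active per token

fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")  # prints "3.3% ..."
```

This is why a 399B MoE can serve at roughly the inference cost of a ~13B dense model, at the price of holding all 399B parameters in memory.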
2
Google · Leaderboard/LiveCodeBench · 2025-11-18
Current #1 on the LiveCodeBench leaderboard, ahead of DeepSeek V3.2 Speciale (88.7%).
91.7%
3
DeepSeek · Paper/DeepSeek · 2025-12-01
Pass@1 with chain-of-thought. Temporary, API-only reasoning variant. Gold-level results at IMO, ICPC World Finals, and IOI 2025.
88.7%
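Several entries above and below report Pass@1, the standard LiveCodeBench metric. As a hedged sketch (the function name is my own), scores like these are typically computed with the unbiased pass@k estimator from Chen et al.'s HumanEval paper, averaged over problems:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions drawn from n total samples (c of them correct) passes.
    For k=1 this reduces to the raw success rate c/n."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(5, 4, 1))  # 0.8 -> matches "averaged over 5 runs"-style reporting
```

With k=1 and a single sample per problem this is just the fraction of problems solved, which is what most of the scores on this page represent.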
4
DeepSeek · Paper/DeepSeek-V3.2 · 2025-12-01
27k thinking tokens. Extended reasoning variant.
88.7%
5
ByteDance · Blog/ByteDance · 2026-02-14
Led LiveCodeBench v6 at the time of release. ByteDance frontier model with ICPC/IMO/CMO gold medals.
87.8%
6
Alibaba · Blog/Qwen · 2026-04-02
Top-4 on LiveCodeBench leaderboard. Strong agentic coding model with 1M context window.
87.1%
7
StepFun · Blog/StepFun · 2026-02-12
Reaches 88.9% with PaCoRe. Open-weight model competitive with frontier closed models on live coding.
86.4%
8
Moonshot AI · HuggingFace/moonshotai · 2026-01-27
LiveCodeBench v6, 64k token budget. Highest open-source score on this benchmark. Averaged over 5 runs.
85.0%
9
Zhipu AI · NVIDIA/Zhipu Official · 2025-12-22
LiveCodeBench v6. Highest open-weight LiveCodeBench score, beats most proprietary models.
84.9%
10
Alibaba · HuggingFace/Qwen · 2026-02-16
LiveCodeBench v6. Outperforms GPT-5.2 and Claude Opus 4.5 on 80% of evaluated categories.
83.6%
11
DeepSeek · Paper/DeepSeek-V3.2 · 2025-12-01
Pass@1 with chain-of-thought. Open-weight.
83.3%
12
ByteDance · Blog/ByteDance · 2026-03-10
Strong competitive-coding performance despite being the mid-tier variant.
81.7%
13
Alibaba · Web/Alibaba · 2026-02-16
Dense 27B model, all params active. Matches GPT-5-mini on SWE-bench. Best coding score in Qwen 3.5 medium lineup.
80.7%
14
Google · Model Card/Google · 2026-04-02
Official Google model card. 31B dense model on LiveCodeBench v6.
80.0%
15
Google · Model Card/Google · 2026-04-02
MoE variant on LiveCodeBench v6. The dense 31B variant scores 80.0%.
77.1%
16
Anthropic · Blog/Anthropic · 2026-02-05
Competitive coding on real-world problems. Behind Gemini 3 Pro Preview (91.7%) and DeepSeek V3.2 Speciale (88.7%).
76.0%
17
DeepSeek · HuggingFace/DeepSeek · 2025-12-01
Standard (non-thinking) mode. Thinking mode reaches 83.3%. Speciale variant hits 88.7%.
74.1%
18
Google · Blog/Google · 2026-03-03
Budget-tier model at $0.25/1M input tokens. Competitive coding despite being Google's fastest and cheapest model.
72.0%
19
Sarvam AI · HuggingFace/sarvamai · 2026-03-06
India's first domestically trained 105B model. MoE with 10.3B active params. Apache 2.0.
71.7%
20
NVIDIA · HuggingFace/NVIDIA · 2025-12-15
LiveCodeBench v6. Outperforms Qwen3 (66.0%) and GPT-OSS (61.0%) in the same size class.
68.3%
21
Alibaba · HuggingFace/Qwen · 2026-03-02
9B params. Competitive coding for a small model, though it lags behind larger models (GPT-OSS-120B: 82.7%).
65.6%
22
JD.com · X/NoPeople404 · 2026-04-06
48B MoE, 2.7B active params. Surpasses GLM-4.7-Flash-Thinking by 2.4% with 85% fewer tokens. Uses the FiberPO RL algorithm.
65.6%
23
Zhipu AI · HuggingFace/Zhipu · 2026-01-15
LiveCodeBench v6. Decent coding for a model with 3.6B active params.
64.0%
24
Alibaba · HuggingFace/Alibaba · 2026-03-02
Competitive coding ability for 4B params. Open-weight, Apache 2.0.
55.8%
25
Google · Google Model Card · 2026-04-02
Competitive coding for a 4B model. Runs on consumer edge hardware.
52.0%
26
Google · Model Card/Google · 2026-04-02
LiveCodeBench v6. Effective 2B params. Surpasses Gemma 3 27B on coding despite 10x fewer active parameters.
44.0%
27
Meta · HuggingFace/Meta · 2026-04-05
Pass@1 on LiveCodeBench (10/01/2024-02/01/2025). Far behind frontier models like Gemma 4 31B (80.0%) on coding.
43.4%
28
Meta · HuggingFace/Meta · 2026-04-05
Pass@1. Significantly behind frontier coding models.
32.8%