SWE-bench Multilingual
3 models tested · Updated 2026-04-07 · Verified sources only
Claude Mythos Preview leads at 87.3%
1
Anthropic · Blog/Anthropic · 2026-04-07
Measures how well models resolve software engineering tasks across multiple programming languages. 9.5 points ahead of Opus 4.6 (77.8%).
87.3%
2
Anthropic · Blog/Anthropic · 2026-04-07
Comparison score for Opus 4.6, as reported in the Mythos Glasswing announcement.
77.8%
3
MiniMax · Blog/MiniMax · 2026-03-18
Strong multilingual coding. Outperforms many larger models on real-world engineering tasks.
76.5%
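The point gaps between entries can be checked with a short calculation. The sketch below is illustrative only: it assumes the three scores listed above, uses `Decimal` to avoid floating-point rounding noise, and labels the third entry by vendor because the listing does not give its model name. The function name `point_gaps` is hypothetical.

```python
from decimal import Decimal

# Scores as listed above. Labels are for illustration only: the third
# entry's model name is not given in the listing, so only its vendor is used.
scores = {
    "Claude Mythos Preview": Decimal("87.3"),
    "Opus 4.6": Decimal("77.8"),
    "MiniMax (model name not listed)": Decimal("76.5"),
}


def point_gaps(scores):
    """Return each entry's gap to the top score, in percentage points."""
    leader = max(scores.values())
    return {name: leader - value for name, value in scores.items()}


gaps = point_gaps(scores)
for name, gap in gaps.items():
    print(f"{name}: {gap} pts behind the leader")
```

The gaps are differences in percentage points, not relative percentages, which matches how the 9.5-point lead over Opus 4.6 is stated.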