SWE-bench Multilingual leaderboard
3 models tested · Updated 2026-04-07 · Verified sources only
Claude Mythos Preview leads at 87.3%.
1. Claude Mythos Preview · 87.3%
   Anthropic · Blog/Anthropic · 2026-04-07
   Tests code across multiple programming languages. 9.5 pts ahead of Opus 4.6 (77.8%).

2. Claude Opus 4.6 · 77.8%
   Anthropic · Blog/Anthropic · 2026-04-07
   Comparison score from Mythos Glasswing announcement.

3. MiniMax M2.7 · 76.5%
   MiniMax · Blog/MiniMax · 2026-03-18
   Strong multilingual coding. Outperforms many larger models on real-world engineering tasks.