SWE-bench Multimodal
2 models tested · Updated 2026-04-07 · Verified sources only
Claude Mythos Preview leads at 59.0%
1 · Claude Mythos Preview · 59.0%
Anthropic · Blog/Anthropic · 2026-04-07
Multimodal software engineering tasks. Scores more than double Opus 4.6 (27.1%).
2 · Opus 4.6 · 27.1%
Anthropic · Blog/Anthropic · 2026-04-07
Comparison score taken from the Mythos Glasswing announcement.
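
The "more than doubles" claim can be checked directly from the two reported scores. The snippet below is a minimal sketch of that arithmetic; the function name `compare_scores` is hypothetical and used only for illustration, and the inputs are the 59.0% and 27.1% figures listed above.

```python
def compare_scores(leader: float, baseline: float) -> tuple[float, float]:
    """Return (ratio, gap in percentage points) between two benchmark scores."""
    ratio = leader / baseline   # how many times higher the leader scores
    gap = leader - baseline     # absolute difference in percentage points
    return round(ratio, 2), round(gap, 1)


# Scores as reported on this page: Claude Mythos Preview vs. Opus 4.6.
ratio, gap = compare_scores(59.0, 27.1)
print(f"ratio {ratio:.2f}x, gap {gap:.1f} points")  # prints: ratio 2.18x, gap 31.9 points
```

A ratio of about 2.18 is consistent with "more than doubles", and the absolute gap is 31.9 percentage points.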