SWE-bench Multimodal Leaderboard 2026 — Results Across 3 Real AI Models

3 models tested · Updated 2026-04-07 · Verified sources only

Claude Mythos Preview leads at 59.0%

Anthropic · Blog/Anthropic · 2026-04-07

Multimodal software engineering tasks. More than doubles the Opus 4.6 score (27.1%).

59.0%
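
The "more than doubles" claim can be checked from the two scores quoted in this entry. The snippet below is only an illustrative sketch: the variable names are hypothetical, and the only inputs are the 59.0% and 27.1% figures stated above.

```python
# Scores quoted in the entry above, in percent.
mythos_preview_score = 59.0  # Claude Mythos Preview
opus_4_6_score = 27.1        # Opus 4.6 comparison score

# Relative improvement: how many times larger the leading score is.
ratio = mythos_preview_score / opus_4_6_score

# Absolute improvement in percentage points.
point_gain = mythos_preview_score - opus_4_6_score

print(f"ratio: {ratio:.2f}x")
print(f"gain: {point_gain:.1f} percentage points")
print("more than doubles:", ratio > 2)
```

The ratio comes out above 2, which is consistent with the entry's wording.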

Anthropic · Blog/Anthropic · 2026-04-16

Up from Opus 4.6 (27.1%). Measured with an internal implementation, so not comparable to the public leaderboard.

35.0%

Anthropic · Blog/Anthropic · 2026-04-07

Comparison score taken from the Mythos Glasswing announcement.

27.1%