"SWE-bench Verified 93.9%, way above Claude Opus 4.6, Gemini 3.1 Pro. SWE-bench Pro, same thing, above GPT-5.4. For each one of these software engineering tasks, the leap is pretty massive."
Wes Roth
AI YouTube commentator
SWE-bench Verified
Claude Mythos Preview