YouTube · 2026-04-08
"Terminal-bench 2.0. Testing a model's ability to control the terminal. Opus 4.6, 54. Mythos preview, 82%. SWE-bench multimodal, 27 for Opus, 59 for Mythos preview. And then SWE-bench Verified, Opus 80, Mythos 94. Huge gains."
Matthew Berman
AI YouTube commentator