Matthew Berman on Claude Mythos Preview

YouTube · 2026-04-08

"Terminal-Bench 2.0, testing a model's ability to control the terminal: Opus 4.6, 54; Mythos Preview, 82%. SWE-bench Multimodal: 27 for Opus, 59 for Mythos Preview. And then SWE-bench Verified: Opus 80, Mythos 94. Huge gains."

Matthew Berman

AI YouTube commentator

SWE-bench Verified Claude Mythos Preview
