benchmark.space
YouTube · 2026-04-08
"On SWE-bench Verified, Mythos scores at 93.9%, and Claude Opus 4.6, the model you can actually use right now, 80%. On Terminal Bench, Mythos actually hits 82% up from Opus's 65.4%."
TheAIGRID
SWE-bench Verified
Claude Mythos Preview