benchmark.space
YouTube · 2026-04-09
"On SWE-bench Pro, Opus 4.6 scored 53.4%. Mythos preview got 77.8%. On Terminal Bench 2.0, Opus had 65.4% while Mythos has 82%. On SWE-bench Verified, the jump is from 80.8% to 93.9%."
AI Daily Brief Host
SWE-bench Verified
Claude Mythos Preview
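The size of each jump in the quote can be worked out directly. The sketch below is only an illustration: it takes the three score pairs exactly as the host stated them (they are not independently verified here) and computes the percentage-point gain for each benchmark. The names `quoted_scores` and `point_gains` are hypothetical, chosen for this example.

```python
# Scores (percent) exactly as quoted by the host: (Opus 4.6, Mythos Preview).
# These are the quoted figures only, not independently verified results.
quoted_scores = {
    "SWE-bench Pro": (53.4, 77.8),
    "Terminal Bench 2.0": (65.4, 82.0),
    "SWE-bench Verified": (80.8, 93.9),
}


def point_gains(scores):
    """Return the percentage-point gain (second score minus first) per benchmark."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


gains = point_gains(quoted_scores)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

On the quoted figures this gives gains of 24.4, 16.6, and 13.1 percentage points respectively, so the largest jump is on SWE-bench Pro and the smallest on SWE-bench Verified.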