benchmark.space
YouTube · 2026-04-09
"On Humanity's Last Exam, Opus got a 40% on a no-tools run compared to Mythos preview's 56.8%. With tools enabled, performance jumped to 64.7% compared to 53.1% for Opus."
AI Daily Brief Host
AI news commentator
Humanity's Last Exam
Claude Mythos Preview