benchmark.space
YouTube · 2026-04-09
"For science knowledge, Mythos scored 94.5% on GPQA Diamond compared to 91.3 for Opus. On Humanity's Last Exam with tools, 64.7% vs 53.1%. On OSWorld, Opus 4.6 got 72.7% which jumped to 79.6% for Mythos."
AI Daily Brief Host
GPQA Diamond
Claude Mythos Preview