"For science knowledge, Mythos scored 94.5% on GPQA Diamond compared to 91.3 for Opus. On Humanity's Last Exam with tools, 64.7% vs 53.1%. On OSWorld, Opus 4.6 got 72.7% which jumped to 79.6% for Mythos."
AI Daily Brief Host
GPQA Diamond · Claude Mythos Preview