"On Humanity's Last Exam, Opus got a 40% on a no-tools run compared to Mythos Preview's 56.8%. With tools enabled, performance jumped to 64.7% compared to 53.1% for Opus."
AI Daily Brief Host
AI news commentator
Humanity's Last Exam
Claude Mythos Preview