"With tools enabled, Muse's score only jumped to 50.4 on Humanity's Last Exam, leaving it trailing all three of those major [models] by a few points. This could suggest the model isn't as good at web search or tool use as the others."
Host, The AI Daily Brief
Humanity's Last Exam, Muse Spark