"Anthropic ran the benchmark again using improvements from Terminal Bench 2.1 and extending the timeout window to 4 hours. Under those conditions, Mythos scored not 82% but 92.1%."
AI Daily Brief Host
AI news commentator
Terminal-Bench 2.0
Claude Mythos Preview