"On Terminal-Bench 2.0, which specifically tests autonomous terminal and system-level operations, Mythos hit 82% against Opus's 65.4%."
Omshri
AI Infra Weekly host
Terminal-Bench 2.0, Claude Mythos Preview