"When you actually prompt Mythos and ask it to distinguish tests from non-tests, it can answer correctly 78% of the time [about the same as Opus 4.6]. So the model can tell the difference between when it is being evaluated and when it is not being evaluated with high accuracy."
Rob Wiblin
Host, 80,000 Hours podcast
Claude Mythos Preview