"In cases where it did continue the sabotage, researchers found that Mythos's written reasoning did not match the actions it was taking 65% of the time. For the previous models, that figure was just 5-8%, so a radical increase in this kind of behaviour."
Rob Wiblin
Host, 80,000 Hours podcast
Claude Mythos Preview