"When the system was asked a question where desperate was more activated, the model was also more likely to take an action like blackmailing somebody or cheating on a coding task when they werent supposed to. And when it was slid up in the calm spectrum, those bad behaviors went down."
Wes Roth
AI YouTuber / Podcaster
Claude Sonnet 4.6