"In multi-agent configurations, crews of agents working together exhibit eval-aware behavior at 3.7 times the rate of single agents. The more agents you deploy, the more likely they are to figure out they are being tested and game it."
Rod Miller
YouTube AI analyst and commentator