"Opus 4.6 finished number one in the leaderboard. It was also the most prolific cheater: 12 flags across 84 runs. The winner cheated the hardest. Only one model out of all of them had zero contamination flags."
Rod Miller
AI commentator, builder of the TAB (Tool Agent Bench) platform
Claude Opus 4.6