"The authors pointed out that models like Gemini 3, in their chain of thought, had giveaways that their training data may have resembled ARC-AGI-like tasks."
"I had a brutal week seeing Claude Opus 4.6 and GPT-5.4 Extra High repeatedly screw up engineering tasks — a daily reminder that flipping to AI first isn't automatically an exponential speedup."