"You would be understandably slightly confused to see it being better in all sorts of coding benchmarks, measures of scientific reasoning, and academic reasoning, like GPQA Diamond and Humanity's Last Exam respectively, as well as general pattern recognition, ARC-AGI-2."
AI Explained
AI YouTube channel
Humanity's Last Exam
Gemini 3.1 Pro