"We've consistently seen these systems have a very odd spikiness and it's actually possible to architect a system that is world-class on some math domain but then you could do some perturbations to the questions and actually degrade it substantially. So it's like a bad high school student."
Liam Fedus
Co-founder, Periodic Labs; ex-VP Post-Training, OpenAI