nopriors on AI benchmarks
8 quotes from AI researchers about benchmarks, models, and evaluation
"I think it's a great way to think about the world. It's like very principled, very like hard-nosed scientists, very careful. I don't know, I think it's just such an incredible field. You have such high leverage in computer science in AI."
Liam Fedus @nopriors · 2026-04-03
"We've consistently seen these systems have a very odd spikiness and it's actually possible to architect a system that is world-class on some math domain but then you could do some perturbations to the questions and actually degrade it substantially. So it's like a bad high school student."
Liam Fedus @nopriors · 2026-04-03
"One way I think about recursive self-improvement is really kind of akin to neural architecture search from roughly 10 years ago. And I think there's a very clear path for software engineering. These systems have become so incredibly impressive on this domain as a result of huge amounts of data, really cheap verifiable environments."
Liam Fedus @nopriors · 2026-04-03
"Everything is about scaling. It's given that predictability. It's allowed us to put huge amounts of capital into this field. And I think the physical sciences, physical engineering will have a very similar property where we establish these scaling properties and bring that mindset."
Liam Fedus @nopriors · 2026-04-03
"One of the engineers on our team was looking at a reported material property and it was just sort of extracted values from literature and it was really interesting to see the reported value spanned many orders of magnitude. And so you train an ML system on that and the best you can do is model this distribution but you're no closer to a ground truth."
Liam Fedus @nopriors · 2026-04-03
"The goal was we need to come up with some productionization of GPT-4. OpenAI had GPT-4, it was pre-trained and there were some rough post-trains on it, and there's questions about how do we turn this incredibly powerful model into products. Some of our least interesting ideas were a meeting bot. But John Schulman was very opinionated. He's like, 'We think we should keep it very general. Let's do a chatbot.'"
Liam Fedus @nopriors · 2026-04-03
"Over the next few years we saw ever improving models. We saw reasoning. Test time inference became really important. That led to more reliable error correction, more reliable tool use. And we see the rise of coding agents and other agents. And I think those were foundational technologies necessary to then connect these systems to the physical world."
Liam Fedus @nopriors · 2026-04-03
"I think sometimes there's like this mythology of AGI, ASI, RSI and I think we see increasingly powerful systems but they do become limited if they don't have access to the raw data to actually make informed decisions."
Liam Fedus @nopriors · 2026-04-03