📝 The paper is available here:
https://arxiv.org/abs/2602.10177
Source:
https://www.youtube.com/watch?v=6evUpgCHtOQ
I appeared on camera for an interview not
so long ago. And I was really surprised by how many of you Fellow Scholars said that
you would like to see more. So first of all, thank you so much to all
of you for the kind words. Second, I thought let's try this and hope that you
will enjoy it. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.
Look, it only took 1,000 episodes. Now, I have an amazing paper for you because scientists
at DeepMind did something pretty insane. Our question today is can an AI invent something that
is fundamentally new and pushes humanity forward? Well, they said that their new AI agent can
actually do research and even write research papers. Most of the core content anyway.
Is that insane? Well…it’s not. A lot of other people have tried it and the only insane thing
about it was how many poor papers they wrote.
But it turns out… there is levels to this game. You see, I visited the research group that is
behind this work last year. I flew to Mountain View into this crazy lab, and a grumpy
guard didn’t even want to let me in at first. Crazy town. So I was very surprised that they
are guarding these secrets and they take them very seriously. What is even more surprising
is that now they give some of those secrets away to all of us for free. Now that
is insane! More on that in a moment. So I talked to these scientists, this was the
research group of Quoc Le. They are brilliant. They wrote an AI that was able to do a gold
medal worthy performance on the mathematical olympiad. This is serious business. Then they
released this technique. Anyone who is made out of money bags and pays for Gemini Advanced
can use it. It is called Deep Think. And now, this AI is even better than that. They call
it Aletheia. Now that, once again, is insane.
Okay, so what does it do? Well, it
promises that it does research. It solves novel problems. This is something
that could push humanity forward. Now that is so much harder than the mathematical
olympiad. Why is that? Well, in these contests, you have a fairly small body of core
knowledge you are supposed to have, and every problem is guaranteed to
be solvable with that small set of tools. Every problem is nice, shiny, and polished.
Tough, but polished. You know what is not
polished at all? Real life problems. With these open problems, we don’t even know
if they are solvable at all. Maybe they are impossible, or maybe possible, but not with our
current tools. That’s the point: no one knows. When this technique is given a problem,
the generator starts working on it, creates a candidate solution, and now here is
one of the important parts of the paper. The
verifier. This takes a look, and says, okay bro
this is junk. Start again. This is essentially a filter. You know, that’s actually good life
advice. Sometimes it’s good to have a filter, so you don’t just shoot those hot takes out
there into the ether. Now every now and then, the solution looks pretty good, and could
maybe pass with a few modifications. Then, it gets polished for another round of reviews,
and so it goes. Sounds simple…maybe even trivial right? So what is so scientific about
this? Why doesn’t every system do that? Well, that’s easier said than done. In fact,
it is almost impossible to pull off. Why? One, when the AI is doing something
fundamentally new, unfortunately, hallucinations still happen. Yup.
It just makes stuff up. Fake papers, fictitious authors, you name
it. All kinds of junk comes out.
Two, when you want to compute
1+1 or other simple things, you have tons of training data about it out
there. You can verify that easily. But if you want to do frontier research? There is
no training data on what we don't even know yet. Of course there isn’t! You are trying
to invent things no one understands yet. These two factors make it extremely
difficult to get an AI to do something fundamentally new and useful. So how did
they pull it off? With three key steps. First, Aletheia does not use this formal rigid math
language to check its own proofs. It uses natural English language. That is notoriously hard,
because when the AI checks its own writing, it just blindly agrees with it.
We humans do that too! Now here, the researchers found a way to separate the
thinking part from the answer part. Because the
messy train of thought is hidden from the
verifier, it cannot trick itself into just blindly agreeing with itself. Brilliant. Our
brains would need something like that too. Then, two, they let the computer think
longer. That’s not new. However, they added some optimizations to this, so
much so that the model they have now is just as smart as the one from 6 months ago.
But hold on to your papers Fellow Scholars, because yes, same smarts, but it uses 100
times less compute. What! Crazy. They trained a much stronger base model which made it
more efficient at reasoning. So this one, even without internet access, beats the
mathematical olympiad gold AI easily. The score improved from about 65% to 95%. Wow. It went from
a bit better than a coin flip to destroying
the tasks made for some of the best human minds.
All this in just a few months. I am out of words. Now three, they gave the AI the
ability to search for stuff. We are talking about Google after all.
Once again, that is easy. However, getting the AI to read and combine techniques
from dozens and dozens of cutting-edge research papers without losing its mind. Now that is
hard. You saw it earlier, this really happens! They heavily trained this AI to be
able to use these tools and research works that are out there. That was what
finally stopped it from making up junk. Okay, so how good is it? First I saw that
it solved a few of these Erdős problems. It autonomously found the answer to 4 open math
puzzles left behind by a legendary Hungarian mathematician. Is that insane? I asked
a mathematician friend. He told me yeah, that’s pretty good, but there are
so many of these problems out there,
and not a ton of people work on them.
In other words, they are fairly easy, they were just ignored by experts for
years. So not nearly as good as I thought. But then, it stepped up its game and
wrote the core contents of a research paper. On something new. Note that the final
paper is written up by a human scientist. They had one paper on calculating constants
in arithmetic geometry. And then it helped human scientists write 4 other papers, like
finding new limits for interacting particles. So how good are these research works? Well, they are submitted for peer review
and that’s going to take quite a while. So, in the meantime, they had a
bunch of math experts look at it, many of them independent scientists. They
checked it for correctness and novelty, and it checks out, man. I think for the first
time ever, an AI created core parts of a research work that is new, it has impact, it is
useful. That is…wow. What a time to be alive!
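The generate, verify, and revise loop described earlier can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the paper's implementation: the names `Candidate`, `solve`, `generate`, `revise`, `verify`, and the three verdict labels are all hypothetical, and the "research problem" is just factoring a small number. What it does show is the one design idea from the video: the verifier only ever receives the final answer, never the messy train of thought.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    reasoning: str  # messy train of thought, never shown to the verifier
    answer: str     # the write-up that actually gets reviewed


# Hypothetical verdict labels, chosen for this sketch only.
ACCEPT, REVISE, REJECT = "accept", "revise", "reject"


def solve(problem, generate, revise, verify, max_rounds=10) -> Optional[str]:
    """Generate a candidate, have it reviewed, then polish or start over."""
    candidate = generate(problem)
    for _ in range(max_rounds):
        # The verifier sees the answer only, not candidate.reasoning.
        verdict, feedback = verify(problem, candidate.answer)
        if verdict == ACCEPT:
            return candidate.answer
        if verdict == REVISE:
            candidate = revise(problem, candidate, feedback)  # polish it
        else:
            candidate = generate(problem)  # junk, start again
    return None  # out of budget; the problem may not be solvable this way


# Toy stand-ins (assumptions): the "problem" is writing n as a product a*b.
def make_toy_generator(first_factors):
    guesses = iter(first_factors)

    def generate(n):
        a = next(guesses)
        return Candidate(reasoning=f"maybe {a} works?", answer=f"{a}*12")

    return generate


def toy_revise(n, candidate, feedback):
    a = int(candidate.answer.split("*")[0])
    return Candidate(candidate.reasoning + " fix: " + feedback, f"{a}*{n // a}")


def toy_verify(n, answer):
    a, b = (int(x) for x in answer.split("*"))
    if a * b == n:
        return ACCEPT, ""
    if n % a == 0:
        return REVISE, "second factor is wrong"
    return REJECT, "first factor does not divide n"


if __name__ == "__main__":
    print(solve(91, make_toy_generator([4, 7]), toy_revise, toy_verify))
```

In this toy run, the first guess is thrown away as junk, the second one is close enough to be polished, and the polished version passes review. A real system would replace the three toy functions with model calls, and the hard part is making the verifier trustworthy on problems nobody has solved yet.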
So I told you there is levels to this
game. So where are we now? Level 0 is negligible novelty work, it can do
that. Level 1 is somewhat novel work, it can do that too. But now, at Level 2, it can help a
person create publishable-level research. That is incredible. But wait, it can also do
that autonomously. An absolute game changer. Levels 3 and 4, those are groundbreaking
works, these are out of reach, but I ask you Fellow Scholars, given the pace
of progress, for how long? For 6 more months? And I think that is something that
needs to be talked about more. Research helping people
live a better life. Love it. And thank you so much to all of you Fellow
Scholars for watching us over the years. We can only exist because of you Fellow
Scholars. I really hope that you enjoyed this. It allows me to talk about papers
where there is not a lot of visual content, and I really wanted to share this with you. Let
me know in the comments if we should do more.