❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
📝 The #NVIDIA paper on Nemotron 3 Super is available here:
https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf
https://developer.nvidia.com/blog/introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning/
Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers
🙏 We would like to thank our generous Patreon supporters who
Remember how most AI systems are proprietary,
we have to pay a subscription for them, and no one knows how they work or
what data they were trained on? Well, now hold on to your papers Fellow
Scholars and check out this incredible work, and when I first saw it, my jaw hit the floor.
They absolutely knocked it out of the park. They spilled all the secrets. This is an AI
assistant that is free for all of us forever, but not just the model itself. They also
gave us a 51-page research paper which might be the holy bible of creating
such a system for now. Why is that? Well, they show us every step
of how it was done, and the dataset it was trained on as well. That is extraordinary. Usually
something is always missing. Not here. They call it Nemotron 3 Super and we are going
to find out whether it is indeed super or not.
Okay, so in goes 25 trillion
tokens as training data, and out comes a 120 billion parameter
AI assistant. And how smart is it exactly? It roughly matches the best closed frontier
models from about a year and a half ago. Note that those models cost billions
of dollars to train, and every detail about them was kept secret.
And now, we just get this kind of stuff for free. That is mind blowing. This is
amazing for us consumers and Fellow Scholars. So as you see, it is really smart. It is up there with some
of the best open models out there in most tests, but note that it is still a bit behind in some
areas. Here’s something that surprised me: in this result, they showcase
two versions of the new model, BF16 and NVFP4. They perform roughly the same in
terms of accuracy, so why the big fuss about this?
Well, look at this. Holy mother of papers.
Wow. Well, the NVFP4 version is about 3.5 times faster than the BF16 version, and it is up to
7 times faster than similarly smart open models. So the story is not just that it is similarly smart, the story is that it is 7 times faster
while it is similarly smart. Goodness. Okay, so how on Earth did they
do that? So here are 4 secrets they gave us from the paper, in very simple words. Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Okay, NVFP4. What is that? This is a way of
making the AI run a great deal faster by essentially compressing the mathematics it
uses. Imagine seeing a long number and rounding off a few digits. You get a smaller format.
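To get a feel for it, here is a toy sketch of this kind of rounding in Python. This is a simplified integer version with a made-up block size, not NVIDIA's actual NVFP4 floating-point format: each block of numbers keeps one full-precision scale, and every value is rounded to a tiny signed integer.

```python
def quantize_block(block, levels=7):
    """Toy block quantization: keep one full-precision scale per block,
    and round every value to a small signed integer in -levels..levels.
    A simplified sketch of the idea, not the real NVFP4 format."""
    scale = max(abs(x) for x in block) / levels
    if scale == 0:
        return [0] * len(block), 0.0
    q = [max(-levels, min(levels, round(x / scale))) for x in block]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate values from the tiny integers.
    return [v * scale for v in q]

weights = [0.91, -0.42, 0.07, 0.55]
q, s = quantize_block(weights)
approx = dequantize(q, s)
# Each value now needs only a few bits plus one shared scale,
# at the cost of a small rounding error per value.
```

Each reconstructed value lands close to the original, and that small per-value error is exactly the trade-off the next part of the story is about.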
Less work! What’s wrong with that? Well,
everything. Normally, if you do that,
you lose too much accuracy and the system will output nonsense. However,
here, scientists did it the smart way: they left the most sensitive calculations
alone, and did this rounding for the rest, where it does not cause trouble. The result
is that it runs up to 7 times faster than many other techniques. And we saw that it gives
us no meaningful loss in accuracy. Magic. But there is more magic. When other
AI techniques write their answer, they write it token by token. Let’s simplify
by saying word by word. Writing one word at a time. But not this one. This one
calculates several future words at once. A whole sentence! Almost. Specifically,
7 tokens. And then the system verifies the 7 tokens in one go. Another massive speed
up. They call it multi-token prediction.
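In code, this draft-and-verify idea looks roughly like the following. This is a toy sketch with made-up stand-in models, not Nemotron's actual multi-token prediction heads: a cheap drafter proposes 7 tokens, and a verifier checks all of them in a single pass, keeping the longest agreeing prefix.

```python
def generate_with_draft(draft_model, verifier, prompt, n_draft=7, max_len=30):
    """Toy draft-and-verify loop in the spirit of multi-token prediction.
    `draft_model` and `verifier` are hypothetical stand-ins, not the
    real Nemotron components."""
    out = list(prompt)
    while len(out) < max_len:
        draft = draft_model(out, n_draft)   # propose 7 tokens cheaply
        checked = verifier(out, draft)      # check all of them in one pass
        kept = 0
        for d, v in zip(draft, checked):
            if d != v:
                break                       # first mismatch: stop accepting
            kept += 1
        out.extend(draft[:kept])            # accept the matching prefix
        # Then take one corrected token from the verifier for free.
        out.append(checked[kept] if kept < len(checked) else checked[-1])
    return out[:max_len]

# Stand-in models that simply count upward, so every draft is accepted
# and the whole sequence is produced in just a few verification passes.
drafter = lambda ctx, n: [ctx[-1] + i + 1 for i in range(n)]
verify = lambda ctx, draft: [ctx[-1] + i + 1 for i in range(len(draft) + 1)]
tokens = generate_with_draft(drafter, verify, [0])
```

With agreeable stand-ins like these, 30 tokens come out of only 4 verification passes instead of 30, which is where the speedup lives.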
But why stop there? Let’s add even more magic! They showcased these weird things they call
the Mamba layers. What do these do? Well, traditional AI systems have a bit of a
memory problem. They work like a student who constantly re-reads the textbook over and
over again when they are given a question. Scientists at NVIDIA say that’s not the
way to go. Memory is precious. So instead, read the book only once, and take highly
compressed notes. So this kind of memory remembers important details about the
conversation. However, it is smart enough to throw away the filler words. Thus, this system
can process massive amounts of data efficiently. All this sounds glorious, but this still does
not give us a working system. Why is that? Well, this is why. Do you see that there is a lot
of addition here? That is the problem. The AI
generates your answer step by step, and because
we rounded off the numbers, there is a little error. That’s not a problem. Here’s the problem.
There are many steps, and the error is magnified through each step. Imagine trying to walk to your
car, which is 100 steps away, but you feel a bit sluggish today, and every single one of your steps
is a bit smaller than it used to be. What’s the result? Well, of course, after 100 steps,
you are still really far away from your car! So what is the solution? Well, scientists solved
this by adding back some random noise in the system. But wait, this noise is carefully
crafted in a way that it averages to zero. So your new steps are sometimes smaller,
and sometimes bigger than they used to be, but if you average them out, over 100
steps, you will end up right at your car.
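The walking analogy maps directly onto a few lines of code. Below is a minimal sketch of the idea, not NVIDIA's actual implementation: a deterministic rounder that always shortens each step, versus a random rounder whose up-or-down choice is weighted so that the average comes out exact.

```python
import random

def round_down(x, step=0.1):
    # Deterministic rounding: every "step" gets shortened, so the
    # errors all point the same way and pile up.
    return step * int(x / step)

def round_stochastic(x, step=0.1):
    # Round up or down at random, with the probability chosen so the
    # *expected* value equals x exactly -- errors cancel on average.
    lo = step * int(x / step)
    p_up = (x - lo) / step
    return lo + step if random.random() < p_up else lo

random.seed(0)
stride = 0.9999  # each step toward the car is just under a grid point
biased = sum(round_down(stride) for _ in range(10000))
unbiased = sum(round_stochastic(stride) for _ in range(10000))
exact = stride * 10000
# `biased` falls roughly 10 percent short of `exact`;
# `unbiased` lands within a fraction of a step of it.
```

The deterministic version loses about a tenth of the total distance over 10,000 steps, while the randomized version ends up essentially on target.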
So good! They call this stochastic
rounding and it is a genius idea. Now, not even this technique is perfect.
For instance, when I give it my favorite question about assembling robotic cows,
with lots of math, it thinks for almost an hour to get me an answer for that one.
I like this guy a lot, but that is a lot of thinking. So if I have workloads like that, I like
to run it on a much faster Lambda instance. But still, I think the AI game has suddenly
changed. Closed systems used to dominate. Now, not anymore. It seems to me that Jensen
at NVIDIA is not playing games here. It’s in the news that they are going to invest tens
of billions of dollars into fully open systems like this. I am not a money person, I don’t
know how that works exactly, but if we get to own more amazing free AI systems, well, sign
me up for this one! What a time to be alive!
And there is just so much more in the paper,
I would definitely love to come back for at least another video on it. Let me know
in the comments if you would like that, and if you enjoyed this,
subscribe, and hit the bell.