Two Minute Papers
NVIDIA’s New AI Just Changed Everything
Channel: Two Minute Papers
Date: 2026-04-07
Duration: 8min
Views: 165,163
URL: https://www.youtube.com/watch?v=ZQAz_HrUq68

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The #NVIDIA paper on Nemotron 3 Super is available here:

https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf

https://developer.nvidia.com/blog/introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible.

Remember how most AI systems are proprietary, how we have to pay a subscription for them, and how no one knows how they work or what data they were trained on? Well, now hold on to your papers, Fellow Scholars, and check out this incredible work. When I first saw it, my jaw hit the floor.

They absolutely knocked it out of the park. They spilled all the secrets. This is an AI

assistant that is free for all of us forever, and not just the model itself. They also

gave us a 51-page research paper which, for now, might be the holy bible of creating such a system. Why is that? Well, they show us every step of how it was done, and the dataset it was trained on as well. That is extraordinary. Usually

something is always missing. Not here. They call it Nemotron 3 Super and we are going

to find out whether it is indeed super or not.

Okay, so in goes 25 trillion

tokens as training data, and out comes a 120 billion parameter

AI assistant. How smart is it, exactly? It roughly matches the best closed frontier

models from about a year and a half ago. Note that those models cost billions

of dollars to train, and every detail about them was kept secret.

And now, we just get this kind of stuff for free. That is mind blowing. This is

amazing for us consumers and Fellow Scholars. So as you see, it is really smart: up there with some of the best open models in most tests, but note that it's still a bit behind in some areas. Here's something that surprised me: in this result, they showcase

two versions of the new model, BF16 and NVFP4. They perform roughly the same in

terms of accuracy, so why the big fuss about this?
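A quick scholarly aside before the reveal: BF16 stores every number in 16 bits, while NVFP4 squeezes each one into roughly 4 bits with a shared scale per small block of weights. The real NVFP4 format has its own scaling machinery, so what follows is only a hedged toy sketch of the blockwise 4-bit idea; the function names and block sizes here are illustrative, not from the paper.

```python
import numpy as np

# Toy sketch of blockwise 4-bit quantization (illustrative, NOT the real
# NVFP4 format): each block of 16 weights shares one scale, and every
# weight is rounded onto a 16-level signed grid.

def quantize_4bit(block):
    """Round a block of floats to the signed 4-bit grid -7..7 via a shared scale."""
    scale = np.abs(block).max() / 7.0
    if scale == 0.0:
        return block.copy()
    codes = np.clip(np.round(block / scale), -7, 7)  # these are the stored 4-bit codes
    return codes * scale  # dequantized values

rng = np.random.default_rng(0)
weights = rng.normal(size=4096).astype(np.float32)

recon = np.concatenate([quantize_4bit(b) for b in np.split(weights, 256)])
rel_err = np.linalg.norm(weights - recon) / np.linalg.norm(weights)
print(f"relative error after 4-bit rounding: {rel_err:.3f}")
```

The storage per weight drops from 16 bits to about 4, yet the relative error stays at a few percent, which is why the accuracy numbers of the two versions look so similar.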

Well, look at this. Holy mother of papers.

Wow. Well, the NVFP4 version is about 3.5 times faster than the BF16 version, and it is up to

7 times faster than similarly smart open models. So the story is not just the similarly smart part, the story is that it is 7 times faster

while it is similarly smart. Goodness. Okay, so how on Earth did they

do that? So here are 4 secrets they gave us from the paper, in very simple words. Dear Fellow Scholars, this is Two Minute

Papers with Dr. Károly Zsolnai-Fehér. Okay, NVFP4. What is that? This is a way of making the AI run a great deal faster by essentially compressing the mathematics it uses. Imagine seeing a long number and rounding off a few digits. You get a smaller format.

Less work! What’s wrong with that? Well,

everything. Normally, if you do that,

you lose too much accuracy and the system will output nonsense. However,

here, scientists did it the smart way: they left the most sensitive calculations

alone, and did this rounding for the rest, where it does not cause trouble. The result

is that it runs up to 7 times faster than many other techniques. And we saw that it gives

us no meaningful loss in accuracy. Magic. But there is more magic. When other

AI techniques write their answer, they write it token by token. Let’s simplify

by saying word by word. Writing one word at a time. But not this one. This one

calculates several future words at once. A whole sentence! Almost. Specifically,

7 tokens. And then the system verifies the 7 tokens in one go. Another massive speed

up. They call it multi-token prediction.
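This draft-and-verify loop is in the same family as speculative decoding. Here is a hedged toy sketch of the idea; the drafter, the rule-based stand-in for the big model, and the acceptance loop are all illustrative, not NVIDIA's actual implementation.

```python
# Toy sketch of multi-token prediction as draft-and-verify (illustrative,
# not NVIDIA's implementation): a cheap drafter proposes 7 tokens, the
# main model checks them and keeps the longest matching prefix.

def draft_tokens(prefix, n=7):
    """Cheap drafter: guess the next n tokens with a trivial rule."""
    return [(prefix[-1] + i + 1) % 100 for i in range(n)]

def main_model_next(prefix):
    """Expensive 'main model': the true next token (same rule, so drafts match)."""
    return (prefix[-1] + 1) % 100

def verify(prefix, drafts):
    """Accept drafted tokens until the first mismatch, then emit one correction.
    A real model would batch these checks into a single forward pass."""
    accepted = []
    for tok in drafts:
        true_tok = main_model_next(prefix + accepted)
        accepted.append(true_tok)
        if tok != true_tok:
            break  # first wrong draft: keep the corrected token and stop
    return accepted

sequence = [0]
while len(sequence) < 20:
    sequence += verify(sequence, draft_tokens(sequence))
print(sequence)  # 22 tokens produced in only 3 verify sweeps
```

Because the toy drafter here happens to always agree with the main model, every batch of 7 is accepted, so 21 new tokens cost only 3 sweeps instead of 21 one-token steps; when a real drafter misses, the loop gracefully falls back to a single corrected token.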

But why stop there? Let’s add even more magic! They showcased these weird things they call

the mamba layers. What do these do? Well, traditional AI systems have a bit of a

memory problem. They work like a student who constantly re-reads the textbook over and

over again when they are given a question. Scientists at NVIDIA say that's not the

way to go. Memory is precious. So instead, read the book only once, and take highly

compressed notes. So this kind of memory remembers important details about the

conversation. However, it is smart enough to throw away the filler words. Thus, this system

can process massive amounts of data efficiently. All this sounds glorious, but this still does

not give us a working system. Why is that? Well, this is why. Do you see that there is a lot

of addition here? That is the problem. The AI

generates your answer step by step, and because

we rounded off the numbers, there is a little error. That’s not a problem. Here’s the problem.

There are many steps, and the error is magnified through each step. Imagine trying to walk to your

car, which is 100 steps away, but you feel a bit sluggish today, and every single one of your steps is a bit smaller than it was before. What's the result? Well, of course, after 100 steps,

you are still really far away from your car! So what is the solution? Well, scientists solved

this by adding back some random noise in the system. But wait, this noise is carefully

crafted in a way that it averages to zero. So your new steps are sometimes smaller,

and sometimes bigger than they used to be, but if you average them out, over 100

steps, you will be exactly at your car.
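The walking analogy maps directly onto numbers. Here is a hedged toy sketch in plain Python, nothing like the paper's real low-precision arithmetic: keep adding 0.3 to a running total that can only store whole numbers, first with ordinary rounding, then with rounding that is random but averages to zero.

```python
import math
import random

# Toy sketch of biased vs. zero-mean rounding (illustrative only):
# add 0.3 a thousand times to a total that can only hold whole numbers.

def round_nearest(x):
    """Ordinary rounding: the 0.3 gets dropped on every single step."""
    return math.floor(x + 0.5)

def stochastic_round(x):
    """Round up with probability equal to the fractional part,
    so the rounding error averages out to zero over many steps."""
    lo = math.floor(x)
    return lo + (1 if random.random() < x - lo else 0)

random.seed(0)
n, step = 1000, 0.3
exact = n * step  # 300.0 -- where your car actually is

biased = 0
for _ in range(n):
    biased = round_nearest(biased + step)  # every step falls short

unbiased = 0
for _ in range(n):
    unbiased = stochastic_round(unbiased + step)

print(exact, biased, unbiased)  # the biased walker never leaves 0
```

With ordinary rounding the total never moves at all, while the zero-mean version lands close to 300: you arrive at your car.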

So good! They call this stochastic

rounding and it is a genius idea. Now, not even this technique is perfect.

For instance, when I give it my favorite question about assembling robotic cows,

with lots of math (I like this guy a lot), it thinks for almost an hour to get me an answer for that one. That's a lot. So if I have workloads like that, I like

to run it on a much faster Lambda instance. But still I think the AI game has suddenly

changed. Closed systems used to dominate. Now, not anymore. It seems to me that Jensen

at NVIDIA is not playing games here. It’s in the news that they are going to invest tens

of billions of dollars into fully open systems like this. I am not a money person, and I don't know how that works exactly, but if it means we get to own more amazing free AI systems, well, sign

me up for this one! What a time to be alive!

And there is just so much more in the paper; I would definitely love to come back for at least another video on it. Let me know

in the comments if you would like that, and if you enjoyed this,

subscribe, and hit the bell.