❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
📝 The #DeepSeek paper is available here:
https://github.com/deepseek-ai/Engram
https://arxiv.org/abs/2601.07372
Larry Wheels' comment in this video:
https://www.youtube.com/watch?v=7SM816P5G9s&lc=Ugz7yiDrr_8YD7w8gaN4AaABAg
Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benj
Few people know, but modern AI systems are
really silly. How? Well, imagine a Michelin-star chef being asked for a simple peanut
butter sandwich. That’s weird, but okay. Now, the chef says, you’ll need to wait just a bit,
because I am going to start planting peanuts, wait six months, harvest, churn some peanut
butter, and then get to work on your bread. That sounds really silly, and that is exactly
what modern AI systems like ChatGPT and Gemini do. When they need to recall a simple
fact, like who Alexander the Great was, something crazy happens. They
go through complex reasoning layers and reconstruct everything from
scratch every single time. That is crazy. Now I have an amazing research paper for you
here from folks at DeepSeek AI, and this is a piece of technology that might underpin most if
not all of the amazing AI systems of the future.
Now every now and then we are going to look at a
figure, but for the rest, I am going to bring my physics simulations and all the goodness we
talk about around here. Apologies for that. Okay, so this is a massive waste of
compute. But why does this happen? Well, standard transformers are a kind of neural
network that is inside nearly all modern AI assistants. And here is the problem: they lack
a simple and cheap way to just look things up. Whatever the question is, the answer is a huge
bunch of dense mathematical calculations. From scratch. Yes, it is literally planting
that peanut when you ask for a sandwich. Now in this work, DeepSeek introduces
Engram. With this, they are giving our tired little chef a pantry. Cutting-edge
technology, brother! Instead of growing that peanut butter sandwich
from scratch, it now just grabs the ingredients from the pantry. I’ll explain
to you how exactly they did it in a bit.
Now, this makes the AI way more efficient, okay,
I expected that. But, what? Are you seeing what I am seeing? Now this I did not expect at all.
So here comes the surprise. Now hold on to your papers, Fellow Scholars, because when they take away
some of the AI's complex reasoning parts, known as mixture-of-experts, or MoE, and replace them with
the pantry, the AI actually gets smarter. Lower is better here on the loss curves. And not just a
little, this is significantly smarter. Those dots dipping way down show that this hybrid chef makes
far fewer mistakes than previous techniques. It achieves a perfect balance of active cooking
and just grabbing from the pantry. Genius. But that is not the only surprising thing in this
paper. They also added a way for the AI to check the ingredients before using them. You don’t
want rotting fish in your strawberry jam. To
ensure this, they created a context-aware gating
mechanism. The current context is the dish being cooked. Now here, this is compared against the
retrieved memory, the jar from the pantry. If the jar's contents don’t agree with the dish,
the gate drops to zero, throwing the ingredient away. Bye bye rotting fish! This mechanism lives
right here, inside this jolly little dot product. Now let’s see how it actually performs against
the current systems. I'll tell you exactly what is going to happen now: what happens in nearly
all research papers with something new. It does something, it is compared to previous
methods, and it's better at some things, worse at others. And then you sit down and you
do your analysis. Okay, let’s see…wait what? What just happened here? The new engram technique
makes the neural network better…everywhere.
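A quick aside before we marvel at the numbers: that context-aware gate from a moment ago is simple enough to sketch in a few lines of toy Python. Everything below, the function names, the vector sizes, the sigmoid squash, is my own illustration of the idea, not the paper's exact formulation.

```python
import math

def gated_retrieval(context, memory):
    """Keep or discard a retrieved memory based on the current context.

    The gate is a scalar in (0, 1) computed from the dot product of the
    current hidden state (the "dish") and the retrieved memory (the "jar
    from the pantry"). If the two disagree, the gate drops toward zero
    and the memory is effectively thrown away.
    """
    score = sum(c * m for c, m in zip(context, memory))  # the jolly little dot product
    gate = 1.0 / (1.0 + math.exp(-score))                # sigmoid: near 0 = discard, near 1 = keep
    return [gate * m for m in memory]

ctx = [1.0, 0.5, -0.2]        # what the model is currently "cooking"
fresh = [0.9, 0.6, -0.1]      # a memory that agrees with the context
rotten = [-0.9, -0.6, 0.1]    # "rotting fish": a memory that opposes it

print(gated_retrieval(ctx, fresh))   # mostly passes through
print(gated_retrieval(ctx, rotten))  # strongly suppressed
```

With agreeing vectors the gate sits near one and the ingredient goes into the dish; with opposing vectors it collapses toward zero. Bye bye, rotting fish.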
Absolutely everything is measurably
better. This is absolutely miraculous work. The engram model is actually better
on every single benchmark compared to the previous techniques. It is better everywhere! Now, this is an amazing life lesson too. How? Well, essentially, what DeepSeek does is
automate the easy part and focus on the more difficult tasks. No wonder it
works so well! What a time to be alive! We can learn so much from these research papers,
and not just about AI, but about life itself. Okay, now I’ll tell you how this works, and
it turns out, there are more surprises ahead. Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Okay, so how does it do this magic?
Well, it uses what they call n-gram embeddings combined with multi-head hashing.
Okay, what the heck does that mean? Well, in the kitchen, the chef looks at the order
ticket, sees a 3-word phrase, and instantly
knows exactly which shelf in the pantry has
the premade sauce, and grabs it quickly. And I think this also shows us that there are
simple and basic ideas in AI that we haven’t found yet. I mean, this thing is basically
a lookup table. It is as simple as it gets, and it makes everything more efficient and
better across the board. Just think about it: we removed 20 or 25% of the smart
experts in this little virtual brain, put a spreadsheet there, and it
got better! I mean what? Crazy. And I love how we have a little
better understanding of the AI system itself. Usually, no one
knows what is going on inside, but here. Look. When they switched off
the engram memory during testing, the AI’s ability to answer trivia went down 70%. But
its reading comprehension remained at 93%. Why?
Well, I think this shows that the AI split its brain,
and it's using the new part just to store facts. Just think about it. When they locked the
pantry door during testing, his ability to understand a recipe stayed at a massive
93%! What does that mean? It shows the chef split the work perfectly. He used
the pantry strictly as a storage shelf for memorized ingredients, but he
can still cook an amazing meal. I think this is going to lead to even
cheaper and even smarter AI systems, and this will be an important part of why we will
all get more systems that we can actually own: no subscriptions, and they run in our
pockets super fast, mostly for free. Okay, now not even this technique is perfect.
One limitation is that if you put the engram module too deep in the network, it gets less
accurate because the model has already wasted time processing what is being asked. Of course,
there is no need to look up what you already
computed. I think this is common sense at this
point. Our chef has to check the pantry at the start of the shift. If he only checks it after the
food is served, the pantry is completely useless. A really advanced research paper explained
in simple words. We are Fellow Scholars, and that’s what we do here. And we have a growing
club. I’ll continue in a moment, but you know who is also watching us? The one and only Larry
Wheels. Yes. He is one of our OG Fellow Scholars, doing some Scholarly work between two hard sets
of bicep curls in the gym. You think I am kidding? I am not. Link is in the description.
Reading his comment made me instantly more muscular. So much value. Huge respect
to Mr. Wheels! Honored to have you here. And here comes the best part. I think this
will be a part of every major AI system, and it is knowledge out there for free for all
of us, and now you know exactly how it works!
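In fact, the lookup trick itself, n-gram embeddings found via multi-head hashing, fits in a toy sketch too. The table size, head count, and function names below are my own illustrative choices, not DeepSeek's actual configuration:

```python
import hashlib

TABLE_SIZE = 1024  # rows per head's embedding table (a toy number)
NUM_HEADS = 4      # several independent hash "heads" soften collisions

def head_index(ngram, head):
    """Deterministically map a short phrase to one head's table row."""
    key = f"{head}:{' '.join(ngram)}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_SIZE

def lookup(ngram):
    """One row index per head: the exact pantry shelves for this phrase."""
    return [head_index(ngram, h) for h in range(NUM_HEADS)]

# The same 3-word phrase always hits the same shelves, in constant time,
# with no reasoning layers involved.
print(lookup(("alexander", "the", "great")))
```

Each returned index points into an embedding table, and the model grabs those rows directly instead of recomputing the fact from scratch.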
No nonsense where everything is hidden
in a proprietary system that costs 300 dollars per month to run. Nope. All
free for all of us. Glorious. An epic paper. Now, as our chef does, I took a bit longer
to cook this video. But I promise that I did not put together my computer from scratch
before starting. So I took some more time to make sure you get a better video. If you feel
this is the right way of doing that, subscribe, hit the bell and leave a really kind comment.
And you can also check out Lambda with our link in the description because it is an excellent
way of running DeepSeek privately. I do it too.