❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
📝 The #DeepSeek paper is available here:
https://github.com/deepseek-ai/Engram
https://arxiv.org/abs/2601.07372
Larry Wheels' comment in this video:
https://www.youtube.com/watch?v=7SM816P5G9s&lc=Ugz7yiDrr_8YD7w8gaN4AaABAg
Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benj
Few people know, but modern AI systems are
really silly. How? Well, imagine a Michelin-star chef being asked for a simple peanut
butter sandwich. That’s weird, but okay. Now, the chef says, you’ll need to wait just a bit,
because I am going to start planting peanuts, wait six months, harvest, churn some peanut
butter, and then get to work on your bread. That sounds really silly, and that is exactly
what modern AI systems like ChatGPT and Gemini do. When they need to recall a simple
fact, like who Alexander the Great was, something crazy happens. They
go through complex reasoning layers and reconstruct everything from
scratch every single time. That is crazy. Now I have an amazing research paper for you
here from folks at DeepSeek AI, and this is a piece of technology that might underpin most if
not all of the amazing AI systems of the future.
Now every now and then we are going to look at a
figure, but for the rest, I am going to bring my physics simulations and all the goodness we
talk about around here. Apologies for that. Okay, so this is a massive waste of
compute. But why does this happen? Well, standard transformers are a kind of neural
network that is inside nearly all modern AI assistants. And here is the problem: they lack
a simple and cheap way to just look things up. Whatever the question is, the answer is a huge
bunch of dense mathematical calculations. From scratch. Yes, it is literally planting
that peanut when you ask for a sandwich. Now in this work, DeepSeek introduces
Engram. With this, they are giving our tired little chef a pantry. Cutting-edge
technology, brother! Instead of growing that peanut butter sandwich
from scratch, it now just grabs the ingredients from the pantry. I’ll explain
to you how exactly they did it in a bit.
Now, this makes the AI way more efficient, okay,
I expected that. But, what? Are you seeing what I am seeing? Now this I did not expect at all.
So here comes the surprise. Now hold on to your papers, Fellow Scholars, because when they take away
some of the AI's complex reasoning parts, known as mixture-of-experts, or MoE, and replace them with
the pantry, the AI actually gets smarter. Lower is better here on the loss curves. And not just a
little, this is significantly smarter. Those dots dipping way down show that this hybrid chef makes
far fewer mistakes than previous techniques. It achieves a perfect balance of active cooking
and just grabbing from the pantry. Genius. But that is not the only surprising thing in this
paper. They also added a way for the AI to check the ingredients before using them. You don’t
want rotting fish in your strawberry jam. To
ensure this, they created a context-aware gating
mechanism. The current context is the dish being cooked. Now here, this is compared against the
retrieved memory, the jar from the pantry. If the jar's contents don’t agree with the dish,
the gate drops to zero, throwing the ingredient away. Bye bye rotting fish! This mechanism lives
right here, inside this jolly little dot product. Now let’s see how it actually performs against
the current systems. I'll tell you exactly what is going to happen now: what happens in nearly
all research papers with something new. It does something, it is compared to previous
methods, and it's better at some things, worse at others. And then you sit down and you
do your analysis. Okay, let’s see…wait what? What just happened here? The new engram technique
makes the neural network better…everywhere.
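A quick aside before we marvel at the numbers: that context-aware gate from a moment ago is simple enough to sketch in a few lines of toy Python. Everything below, the function names, the vector sizes, the sigmoid squash, is my own illustration of the idea, not the paper's exact formulation.

```python
import math

def gated_retrieval(context, memory):
    """Keep or discard a retrieved memory based on the current context.

    The gate is a scalar in (0, 1) computed from the dot product of the
    current hidden state (the "dish") and the retrieved memory (the "jar
    from the pantry"). If the two disagree, the gate drops toward zero
    and the memory is effectively thrown away.
    """
    score = sum(c * m for c, m in zip(context, memory))  # the jolly little dot product
    gate = 1.0 / (1.0 + math.exp(-score))                # sigmoid: near 0 = discard, near 1 = keep
    return [gate * m for m in memory]

ctx = [1.0, 0.5, -0.2]        # what the model is currently "cooking"
fresh = [0.9, 0.6, -0.1]      # a memory that agrees with the context
rotten = [-0.9, -0.6, 0.1]    # "rotting fish": a memory that opposes it

print(gated_retrieval(ctx, fresh))   # mostly passes through
print(gated_retrieval(ctx, rotten))  # strongly suppressed
```

With agreeing vectors the gate sits near one and the ingredient goes into the dish; with opposing vectors it collapses toward zero. Bye bye, rotting fish.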
Absolutely everything is measurably
better. This is absolutely miraculous work. The engram model is actually better
on every single benchmark compared to the previous techniques. It is better everywhere! Now, this is an amazing life lesson too. How? Well, essentially, what DeepSeek does is
automate the easy part and focus on the more difficult tasks. No wonder it
works so well! What a time to be alive! We can learn so much from these research papers,
and not just about AI, but about life itself. Okay, now I’ll tell you how this works, and
it turns out, there are more surprises ahead. Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Okay, so how does it do this magic?
Well, it uses what they call n-gram embeddings combined with multi-head hashing.
Okay, what the heck does that mean? Well, in the kitchen, the chef looks at the order
ticket, sees a 3-word phrase, and instantly
knows exactly which shelf in the pantry has
the premade sauce, and grabs it quickly. And I think this also shows us that there are
simple and basic ideas in AI that we haven’t found yet. I mean, this thing is basically
a lookup table. It is as simple as it gets, and it makes everything more efficient and
better across the board. Just think about it: we removed 20 or 25% of the smart
experts in this little virtual brain, put a spreadsheet there, and it
got better! I mean what? Crazy. And I love how we have a little
better understanding of the AI system itself. Usually, no one
knows what is going on inside, but here. Look. When they switched off
the engram memory during testing, the AI’s ability to answer trivia went down 70%. But
its reading comprehension remained at 93%. Why?
Well, I think this shows that the AI split its brain,
and it's using the new part just to store facts. Just think about it. When they locked the
pantry door during testing, his ability to understand a recipe stayed at a massive
93%! What does that mean? It shows the chef split the work perfectly. He used
the pantry strictly as a storage shelf for memorized ingredients, but he
can still cook an amazing meal. I think this is going to lead to even
cheaper and even smarter AI systems, and this will be an important part of why we will
all get more systems that we can actually own: no subscriptions, and they run in our
pockets super fast, mostly for free. Okay, now not even this technique is perfect.
One limitation is that if you put the engram module too deep in the network, it gets less
accurate because the model has already wasted time processing what is being asked. Of course,
there is no need to look up what you already
computed. I think this is common sense at this
point. Our chef has to check the pantry at the start of the shift. If he only checks it after the
food is served, the pantry is completely useless. A really advanced research paper explained
in simple words. We are Fellow Scholars, and that’s what we do here. And we have a growing
club. I’ll continue in a moment, but you know who is also watching us? The one and only Larry
Wheels. Yes. He is one of our OG Fellow Scholars, doing some Scholarly work between two hard sets
of bicep curls in the gym. You think I am kidding? I am not. Link is in the description.
Reading his comment made me instantly more muscular. So much value. Huge respect
to Mr. Wheels! Honored to have you here. And here comes the best part. I think this
will be a part of every major AI system, and it is knowledge out there for free for all
of us, and now you know exactly how it works!
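In fact, the lookup trick itself, n-gram embeddings found via multi-head hashing, fits in a toy sketch too. The table size, head count, and function names below are my own illustrative choices, not DeepSeek's actual configuration:

```python
import hashlib

TABLE_SIZE = 1024  # rows per head's embedding table (a toy number)
NUM_HEADS = 4      # several independent hash "heads" soften collisions

def head_index(ngram, head):
    """Deterministically map a short phrase to one head's table row."""
    key = f"{head}:{' '.join(ngram)}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_SIZE

def lookup(ngram):
    """One row index per head: the exact pantry shelves for this phrase."""
    return [head_index(ngram, h) for h in range(NUM_HEADS)]

# The same 3-word phrase always hits the same shelves, in constant time,
# with no reasoning layers involved.
print(lookup(("alexander", "the", "great")))
```

Each returned index points into an embedding table, and the model grabs those rows directly instead of recomputing the fact from scratch.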
No nonsense where everything is hidden
in a proprietary system that costs 300 dollars per month to run. Nope. All
free for all of us. Glorious. An epic paper. Now, as our chef does, I took a bit longer
to cook this video. But I promise that I did not put together my computer from scratch
before starting. So I took some more time to make sure you get a better video. If you feel
this is the right way of doing that, subscribe, hit the bell and leave a really kind comment.
And you can also check out Lambda with our link in the description because it is an excellent
way of running DeepSeek privately. I do it too.