Caleb Writes Code
DeepSeek V3.2
Channel: Caleb Writes Code
Date: 2025-12-03
Duration: 7min
Views: 71,935
URL: https://www.youtube.com/watch?v=u0n6wMnEYsk

DeepSeek breaks its silence and releases its new V3.2 model. Frontier labs like OpenAI, Google, xAI, and Anthropic have been battling each other, and this is the first major response from China.

How does this model fit into the product and market landscape, and how does this technology affect the rest of the industry for enterprise and business use cases? Will the US come back with stronger models?

#ai #llm #technology

Chapters

00:00 Intro

00:26 Consumer vs Enterprise

01:45

If you're watching this, you're probably one of the early adopters in AI, which means the broad mainstream of AI users probably won't feel the ripple effect of model releases like DeepSeek V3.2. But for early adopters, including enterprise users, releases like DeepSeek V3.2 carry a different weight for many reasons, which we'll cover in this video. Welcome to Caleb Writes Code, where every second counts. Let's start by separating the market into two segments: consumer and enterprise. Clearly, the consumer market is dominated by giants like ChatGPT, Copilot, Gemini, Grok, and Perplexity, and people often hop between these apps seasonally or just use several of them simultaneously. DeepSeek holds a very small fraction of this market, and this can be explained simply: US models are just better for most use cases. They have far more world knowledge and are more well-rounded overall. So it's unlikely that DeepSeek V3.2 will impact the chatbot market by much. Most people just won't care.

Okay, if that's the case, why is this release such a big deal? When you come to the enterprise market, things look slightly different. Here the use cases aren't simple chatbots answering random questions; in the enterprise market, the primary objective is to automate and replace jobs by creating AI agents, and you want to keep your costs as low as possible. And guess what DeepSeek V3.2's major strength is? Its cost. They lowered the cost of intelligence by a huge margin, to 28 cents per million input tokens, which is a huge difference compared to the other frontier models we have in the US.
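
To make the cost angle concrete, here's a quick back-of-the-envelope sketch. The $0.28-per-million figure is from the release; the comparison price and the workload size below are hypothetical placeholders I chose for illustration, not quoted numbers for any specific US model.

```python
# Back-of-the-envelope math on the quoted price of $0.28 per million
# input tokens. The "frontier" price below is a made-up placeholder for
# illustration, not a quote for any specific US model.
DEEPSEEK_INPUT_PRICE = 0.28        # USD per 1M input tokens (from the release)
PLACEHOLDER_FRONTIER_PRICE = 3.00  # USD per 1M input tokens (hypothetical)

TOKENS_PER_DAY = 50_000_000        # e.g. a fleet of agents reading docs all day

def monthly_input_cost(price_per_million: float, days: int = 30) -> float:
    """Monthly input-token spend in USD for the workload above."""
    return price_per_million * (TOKENS_PER_DAY / 1_000_000) * days

print(f"DeepSeek V3.2:     ${monthly_input_cost(DEEPSEEK_INPUT_PRICE):,.2f}/month")
print(f"Placeholder model: ${monthly_input_cost(PLACEHOLDER_FRONTIER_PRICE):,.2f}/month")
# -> $420.00/month vs $4,500.00/month for the same agent workload
```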

Let's briefly talk about the competition between China and the US in this regard. Recently, Airbnb picked Chinese firm Alibaba's Qwen model as the primary model for its customer service agent. So even though DeepSeek is still relatively small at the consumer level, enterprise adoption of Chinese models will continue to expand for as long as models like DeepSeek V3.2 keep lowering costs without sacrificing performance. So to me, DeepSeek's new V3.2 model is targeted directly at enterprise adoption, because that's the only way China can get a foothold in the US market. And releasing these models free and open source is the only way enterprises like Airbnb will host them internally to keep costs low while serving their own customers. As you can see, even though China technically doesn't earn anything by giving these models away, it creates momentum in the industry in two big ways. First, they take the enterprise market from the US, which keeps frontier labs like Anthropic, OpenAI, and Google from making money from enterprise users. And we know that API revenue makes up one of the largest portions of these companies' revenue. So even though China can't directly compete in the consumer market, they are certainly taking away US companies' ability to make money from enterprise users, who have a higher average revenue per user, or ARPU.

And second, the more US companies start to use and rely on Chinese models, the more trust China gains. Right now, most people in the US simply don't trust China when it comes to privacy. But every successful release China makes is a small incremental step toward winning trust in Chinese technology, and they're inching toward gaining full trust from the US. Let's dig a bit deeper into some of the technical improvements they made and what kind of impact those have. Let's start with a simple sentence: "Hi, this is Caleb Writes ___." Looking at this sentence, your first instinct is to say "Code," which is the right word. Traditional LLMs predict the next word the same way, by looking at every preceding token. As you can imagine, this is computationally heavy, especially if your input is extremely long, like dropping an entire PDF into ChatGPT to analyze. That's a lot of computation, because attention is expensive.
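
To see why, here's a minimal sketch of naive full causal attention in NumPy: every token's query is scored against every earlier token's key, so the score matrix, and the work, grows quadratically with context length. This is a toy single-head version for illustration, not how production LLMs are implemented.

```python
import numpy as np

def full_causal_attention(q, k, v):
    """Naive full attention for one head: every query attends to every
    earlier key, so an n-token context costs O(n^2) pairwise scores."""
    n, d = q.shape
    scores = (q @ k.T) / np.sqrt(d)                 # (n, n) pairwise scores
    mask = np.triu(np.ones((n, n), dtype=bool), 1)
    scores[mask] = -np.inf                          # causal: no looking ahead
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Doubling the context quadruples the score matrix: 4k tokens -> ~16M
# pairs per head per layer; a 128k-token PDF -> ~16B pairs.
```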

Okay, so if attention is expensive, what are the alternatives? Well, here's what DeepSeek V3.2 did to cut down on the cost of attention. They said: maybe we don't really need to look at every single preceding token. Maybe instead we can preemptively estimate relevance using a much lower-precision module that indexes every query-key pair, and only serve up the top 2048 tokens. This way attention becomes much cheaper. This kind of selective attention mechanism is what DeepSeek calls DSA, or DeepSeek Sparse Attention, and techniques like this are how they were able to reduce the cost of inference drastically without degrading performance.
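
Here's a toy sketch of that idea: a cheap low-precision scorer picks each query's top-k most relevant earlier tokens, and full-precision attention runs only over those. This is just the shape of the technique; DeepSeek's actual DSA (with its lightning indexer) is more involved, and the indexer inputs here are stand-ins I made up for the sketch.

```python
import numpy as np

def dsa_style_attention(q, k, v, idx_q, idx_k, top_k=2048):
    """Toy selective attention in the spirit of DeepSeek Sparse Attention:
    a low-precision indexer scores all pairs cheaply, then full-precision
    attention runs only over each query's top_k selected tokens."""
    n, d = q.shape
    # 1) Cheap relevance pass: tiny fp16 indexer heads stand in for the
    #    low-precision indexer module (an assumption of this sketch).
    rel = (idx_q.astype(np.float16) @ idx_k.astype(np.float16).T).astype(np.float32)
    rel[np.triu(np.ones((n, n), dtype=bool), 1)] = -np.inf   # causal mask

    # 2) Keep only each query's top_k highest-scoring earlier tokens.
    keep = min(top_k, n)
    top = np.argpartition(-rel, keep - 1, axis=-1)[:, :keep]

    # 3) Full-precision attention restricted to the selected tokens.
    out = np.zeros_like(v)
    for i in range(n):
        sel = top[i][top[i] <= i]                 # drop any masked picks
        s = (q[i] @ k[sel].T) / np.sqrt(d)
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ v[sel]
    return out
```

On real hardware the savings come from never paying full price for the whole score matrix; this toy version still materializes it, but it shows the select-then-attend logic that makes attention cheap at long context.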

Now, the actual underlying training methods, like the dense warm-up stage and the sparse training stage, will require a bit more time for me to study and understand. But the moral of the story is that DeepSeek was able to employ this technique to cut cost while maintaining performance. And cutting cost isn't the only thing DeepSeek focused on. The model also scored gold-medal status on the 2025 IMO math competition; in a separate video, I covered the complexity of that competition and how impressive it is to actually win gold there. Another technical advancement worth mentioning is what they call thinking in tool use, which is very similar to the interleaved thinking we saw in other models like MiniMax M2. This again aims at token efficiency: reasoning traces are kept between tool calls, so if the model makes multiple tool calls to gather information from your system during its thinking stage, it retains its reasoning traces and gets longer-term token efficiency rather than having to rethink or re-reason as it continues to work through your request.
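
Conceptually, it looks something like this at the message-history level. The message shapes below are hypothetical, purely to illustrate the idea; they are not DeepSeek's actual API schema.

```python
# Illustrative agent-loop history. The key point of "thinking in tool use":
# reasoning entries are KEPT in context across tool calls, so the model
# resumes its chain of thought instead of re-deriving it after every tool
# result. (Message shapes here are hypothetical, not DeepSeek's API.)
history = [
    {"role": "user", "content": "Summarize open incidents and propose a fix."},

    # Turn 1: the model thinks, then calls a tool; its reasoning stays put.
    {"role": "assistant",
     "reasoning": "Need the current incident list before proposing anything.",
     "tool_call": {"name": "list_incidents", "args": {"status": "open"}}},
    {"role": "tool", "name": "list_incidents",
     "content": '[{"id": 42, "service": "auth"}]'},

    # Turn 2: the earlier reasoning is still in context, so the model
    # continues from it instead of re-reasoning from scratch.
    {"role": "assistant",
     "reasoning": "Incident 42 is in auth; pull its recent logs next.",
     "tool_call": {"name": "get_logs", "args": {"incident_id": 42}}},
]

# Without this, the reasoning entries would be stripped before each new
# call, and the model would spend tokens reconstructing the same plan.
```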

So all of this really points to how agentic use cases are being baked directly into base models like V3.2. For a lot of people, the differences might seem trivial. But if you zoom out to even a year ago and compare the models released then with the models released today, we're seeing a huge shift, with tool usage becoming the de facto standard in model development and agentic use of LLMs becoming the norm. Okay, so what does this all amount to? Does this mean China overtook the US? I think most people who aren't in the early adopter or enterprise crowd won't even hear about this release. But for hobbyists, coders, and, more importantly, enterprises trying to automate jobs with cheaper models for agentic use cases, this model certainly builds momentum toward wider adoption of DeepSeek and Chinese models. Broadly speaking, though, the US still has a huge advantage over China.

We just have better access to higher-quality data and higher-quality graphics cards. And naturally, the US will keep creating better models, as it always has, while China continues to undercut the market and works its way toward the frontier.