Asia Tech Macro
DeepSeek V4 Drops Nvidia: China's CUDA Independence Moment
Channel: Asia Tech Macro
Date: 2026-04-08
Duration: 18min
Views: 141
URL: https://www.youtube.com/watch?v=YSVUHxHID0c

DeepSeek's next flagship AI model — 1 trillion parameters — runs entirely on Huawei chips. ByteDance alone is committing $5.6B for 750,000 units. The CUDA monopoly faces its first real challenge.

In this episode, Alex and Maya break down:

• How the Huawei Ascend 950PR beats Nvidia's H20 by 2.8x on FP4

• The full Chinese AI stack: Ascend chips, CANN software, MindSpore framework

• Why training vs inference matters for the contrarian read

• SMIC's manufacturing limits and the EUV bottleneck


$5.6 billion. That's how much ByteDance alone has reportedly committed to Huawei Ascend chip purchases in 2026. 750,000 units from a single Chinese company. And here's why that number matters. Until very recently, the consensus view was that frontier AI models simply could not run on Chinese-designed chips. Nvidia and CUDA were the only game in town. That assumption is about to be tested at unprecedented scale. >> And the testing happens this month. DeepSeek V4, the next flagship model from the Chinese AI company that wiped out $600 billion from Nvidia's market cap last year, launches in mid-to-late April 2026. The model has roughly 1 trillion total parameters with 37 billion active per inference in a mixture-of-experts architecture. It's twice the scale of DeepSeek V3. And

critically, it's been designed from the ground up to run entirely on Huawei Ascend 950PR processors, with no Nvidia hardware required. >> Welcome back to Asia Tech Macro. I'm Maya, and today we're diving into what could be the most consequential AI hardware story of 2026: China's attempt to break free from Nvidia and CUDA dependence. If DeepSeek V4 actually delivers frontier-level performance on Huawei silicon, the entire US export control strategy on AI chips needs to be reconsidered, and the broader AI hardware landscape, which has been dominated by Nvidia for two decades, faces its first serious structural challenge. >> Let me set the stage with the chip itself. The Huawei Ascend 950PR was unveiled in March 2026 alongside the Atlas 350 accelerator card built around it. The headline specs are remarkable: 1.56 petaflops of compute, 112 GB of HBM memory, 1.4 TB/s of memory bandwidth, and 600 W power consumption.
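As a back-of-the-envelope sketch, the model and chip figures quoted above can be combined into a rough single-chip decode-throughput estimate. The batch-of-1 setting, FP4 weight size, and the ~2-FLOPs-per-active-parameter rule are standard simplifying assumptions on our part, not figures from the episode:

```python
# Rough decode-throughput ceilings for a 37B-active MoE model on one chip
# with the quoted specs (1.56 PFLOPS, 1.4 TB/s memory bandwidth).
# Assumes FP4 weights (0.5 bytes/param), batch size 1, and ~2 FLOPs per
# active parameter per token -- illustrative approximations only.
active_params = 37e9
bytes_per_param = 0.5            # FP4
flops_per_token = 2 * active_params

chip_flops = 1.56e15             # 1.56 petaflops
chip_bandwidth = 1.4e12          # 1.4 TB/s

# Compute ceiling: how many tokens/s the math units could sustain.
tokens_compute_bound = chip_flops / flops_per_token
# Memory ceiling: every token must stream the active weights from HBM.
tokens_memory_bound = chip_bandwidth / (active_params * bytes_per_param)

print(f"compute-bound ceiling: ~{tokens_compute_bound:,.0f} tok/s")
print(f"memory-bound ceiling:  ~{tokens_memory_bound:,.0f} tok/s")
```

Under these assumptions the chip is memory-bound by a wide margin at small batch sizes, which is why HBM capacity and bandwidth, rather than raw FLOPS, keep coming up as the specs that matter for inference economics.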

>> How does that compare to what Nvidia ships in China? >> This is where it gets interesting. Nvidia's H20, the export-restricted version of the H100 designed specifically for the Chinese market, is the closest direct competitor. The Ascend 950PR delivers 2.8 times the FP4 performance of the H20, has 16% more HBM capacity, and its multimodal generation speed is up to 60% faster. So on the metrics that matter most for AI inference workloads, Huawei's chip is actually beating Nvidia's China export offering, not just matching it. >> That's a remarkable claim. Are we sure about those numbers? >> They come from Huawei's own benchmarks, so there's natural skepticism. But TrendForce and DigiTimes have both confirmed the spec sheets independently, and the buying behavior tells the story. ByteDance committing $5.6 billion for 750,000 units isn't a vanity purchase. They've done their own testing. >> Let me ask the question everyone's wondering. How is Huawei manufacturing these chips? They don't have access to TSMC's leading-edge nodes, right? >> Right. And this is the most fascinating

part of the story. The Ascend 950PR is being manufactured by SMIC, China's leading domestic foundry, on their N+3 process. That's a 7-nanometer-derived process that delivers performance roughly comparable to 5-nanometer-class manufacturing. SMIC has been improving their yields and capacity steadily despite being cut off from EUV lithography by US export controls. The fact that they can produce a chip with these specs is itself a major achievement. >> So we have a Chinese-designed AI chip, manufactured by a Chinese foundry, running a Chinese AI model, used by Chinese hyperscalers. The entire stack is now domestic. >> Almost. The HBM memory is still partially imported from Korean suppliers like SK Hynix and Samsung, though Huawei has been developing their own HBM technology, reportedly called HiBL 1.0. The Ascend 950PR uses 112 GB of that proprietary memory in some configurations. So they're working toward full vertical integration, but they're not quite there yet. >> And the timing is fascinating. Just three months ago, the conventional wisdom was that China would be at least

18 months behind on AI hardware. Now we're talking about Huawei chips that beat Nvidia's China export offering. What changed? >> Two things accelerated dramatically. First, the SMIC manufacturing breakthrough. They cracked yields on the N+3 process faster than anyone outside China expected. Second, Huawei's chip design team has been on an aggressive iteration cadence. The Ascend 910 launched in 2019, the 920 in 2022, the 950 series in 2026. That's roughly a new generation every 2 to 3 years, which matches Nvidia's pace despite manufacturing constraints. >> And the demand from Chinese tech companies has been extraordinary, right? >> Reports suggest Alibaba, ByteDance, and Tencent have collectively ordered hundreds of thousands of units of the Ascend 950PR. The surge in demand has pushed Huawei's chip prices up by approximately 20%. ByteDance alone is reportedly committing $5.6 billion and planning 750,000 units throughout 2026. >> Let's talk about why DeepSeek V4 specifically is such a big deal, because

the chip is one part of the story, but the model running on it is what makes this a turning point. >> Exactly. DeepSeek's V3 model already shocked the world last year by demonstrating GPT-4-class performance at roughly 1/27th the inference cost of OpenAI's offerings. V4 takes that to another level. The total parameter count is around 1 trillion, with 37 billion active per token in a mixture-of-experts architecture. The context window is reportedly 1 million tokens, and it's natively multimodal, handling text, image, and video generation in a single model. >> How does it benchmark against Western frontier models? >> Internal tests reportedly show V4 achieving over 80% on SWE-bench, the standard benchmark for software engineering tasks. That puts it in the same range as GPT-5 and Claude 4.5. And it does it at 10 to 40 times lower inference cost than Western alternatives. >> So if those numbers are real, you have a model that matches frontier performance

at a tiny fraction of the cost, running on chips that don't depend on Nvidia. >> That's the thesis. And here's the deeper point. DeepSeek didn't just port V3 to Chinese hardware. They went back to the model architecture and rewrote core pieces specifically to optimize for Ascend processors. The team worked closely with Huawei and Cambricon engineers to make these adjustments. This is the first frontier AI model designed natively for Chinese silicon. >> That's a fundamental shift from how Chinese AI development worked before. Until now, models were trained on Nvidia hardware and then sometimes ported to Chinese chips for inference. The training and the architecture itself were Western-dependent. >> Right. And this matters because Nvidia's dominance in AI isn't just about the chips. It's about the entire software stack. CUDA is Nvidia's proprietary programming environment that has been the de facto standard for GPU computing for almost two decades. PyTorch, TensorFlow, JAX: every major ML framework was built assuming CUDA

underneath. That stack is what makes it so hard to switch away from Nvidia, even when you have a competitive chip. >> And Huawei's answer to CUDA is CANN, right? >> CANN, which stands for Compute Architecture for Neural Networks, is Huawei's proprietary programming environment for Ascend chips. It sits at the same level as CUDA in the software stack. Above that they have MindSpore, which is their alternative to PyTorch. The combination of MindSpore plus CANN gives Chinese developers a complete AI software environment that doesn't touch Nvidia at any level. >> And Huawei made CANN open source last year, which was a big deal. >> Right, and it was a calculated move. By open-sourcing CANN, Huawei is trying to attract developers globally and reduce dependence on Nvidia's CUDA ecosystem. The challenge is that two decades of CUDA development have created a massive moat. Almost every machine learning developer in the world has experience with CUDA. Almost none have CANN experience. That's the hardest barrier for Huawei to overcome. >> But DeepSeek V4 might actually be the catalyst, right? If the most popular open-source AI model in the world is

optimized for CANN and Ascend, that creates a forcing function for developers to learn the alternative stack. >> Exactly. And there's a precedent here. When TensorFlow first launched in 2015, almost nobody used it. Then Google made it the default for their products and provided extensive documentation and tutorials. Within a few years, TensorFlow had millions of users. Software ecosystems can grow surprisingly fast when there's a strong anchor application driving adoption. DeepSeek V4 could be that anchor for Ascend and CANN. >> Let me ask about the infrastructure side. Huawei has also been building these massive interconnected systems, right? >> That's the Atlas 950 SuperPod, which Huawei plans to launch in 2026. It links 8,192 Ascend chips together to deliver eight exaflops of FP8 performance, backed by 1,152 TB of memory and 16.3 petabytes per second of interconnect bandwidth. To put that in perspective, that's roughly the scale of the largest training clusters Microsoft and Google are building, but with entirely Chinese-designed and Chinese-manufactured infrastructure.
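Those SuperPod aggregates can be sanity-checked with simple division. The per-chip figures below are derived from the episode's quoted totals, not separately published specs:

```python
# Divide the quoted Atlas 950 SuperPod totals (8,192 chips, eight exaflops
# of FP8, 1,152 TB of memory) back down to per-chip figures as a
# consistency check. Decimal units assumed throughout.
chips = 8192
total_fp8_pflops = 8.0 * 1000    # eight exaflops, expressed in petaflops
total_memory_gb = 1152 * 1000    # 1,152 TB, expressed in GB

per_chip_pflops = total_fp8_pflops / chips   # ~0.98 PFLOPS FP8 per chip
per_chip_memory = total_memory_gb / chips    # ~141 GB per chip

print(f"per chip: ~{per_chip_pflops:.2f} PFLOPS FP8, ~{per_chip_memory:.0f} GB")
```

If both quoted totals are accurate, the derived ~141 GB per chip sits above the 112 GB cited for the base Ascend 950PR, which would suggest the SuperPod uses one of the higher-memory configurations mentioned earlier in the episode.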

>> So Huawei is essentially building Chinese alternatives to every layer of the Nvidia AI stack: the chips, the interconnect, the software framework, the training infrastructure. >> And the roadmap is even more aggressive going forward. Huawei has publicly stated targets for the Ascend 960 in 2027, which they're aiming to bring to performance parity with Nvidia's Blackwell architecture. The Ascend 970, projected for 2028, targets parity with or exceeding Nvidia's Rubin architecture. So they're not just trying to catch up; they're trying to leapfrog to competitive parity within two years. >> That's an incredibly ambitious roadmap. Whether they can actually deliver on it depends on the manufacturing side, right? >> Exactly. Chip design improvements happen faster than manufacturing improvements. The constraint isn't whether Huawei can architect a chip that competes with Blackwell. It's whether SMIC can manufacture it at the necessary precision and yield. That's the hidden bottleneck that determines whether the 2027 and 2028 targets are realistic. >> All right, contrarian corner. The consensus view is forming pretty quickly that this is China's CUDA independence

moment. DeepSeek V4 plus Ascend 950PR equals the end of Nvidia's monopoly in China. Alex, what's the skeptical read? >> Here's my contrarian thesis. The DeepSeek V4 launch is a real milestone, but the narrative of imminent Nvidia displacement is dramatically overstated. Three reasons why. >> Go for it. >> First, the chip-level performance numbers come from Huawei's own benchmarks, and they're measuring narrow workloads. The 2.8x H20 performance figure is specifically for FP4 inference. That's a very specific use case. For training large models, which is what defines frontier AI development, the picture is much less favorable for Huawei. Nvidia's H100 and B100 still dominate training workloads by significant margins, and Huawei's chips simply can't match them on training efficiency. >> So Huawei is competitive on inference, but not training. >> That's the more accurate picture. And here's why it matters. Frontier AI development requires massive training runs that can take weeks or months on

tens of thousands of GPUs. The companies pushing the state of the art, OpenAI, Anthropic, Google DeepMind, are doing this on Nvidia clusters. Even DeepSeek V4 was likely trained on a mix of Nvidia and Huawei hardware, with the Huawei portion being primarily for the final inference deployment. >> That's a critical distinction. Training and inference are very different workloads with different chip requirements. >> Right. And here's the second contrarian point. The CUDA software moat is much harder to displace than the headline numbers suggest. Yes, Huawei has CANN. Yes, MindSpore exists. But the practical reality is that almost every machine learning engineer in the world has years of CUDA experience and zero CANN experience. Every tutorial, every Stack Overflow answer, every library is built around CUDA. >> The ecosystem effect is real. >> It's enormous. Even if CANN became technically superior to CUDA tomorrow, it would take 5 to 10 years for the developer ecosystem to migrate. That's how long these things take. Microsoft's Azure had to create CUDA-compatible alternatives just to attract developers,

even though Azure had massive resources. >> What's the third reason? >> Third, and this is the strategic concern most analysts are missing: China's AI independence push has a structural ceiling that Huawei can't break through alone. SMIC, the foundry making the Ascend 950PR, is still using N+3 process technology that's roughly equivalent to 5-nanometer class. TSMC is already mass-producing 2nm. Samsung is at 2nm. By 2028, the leading edge will be at 1.4nm. SMIC is two to three generations behind. And without EUV lithography, SMIC can't catch up easily. >> Exactly. ASML's EUV machines are restricted from export to China under the Wassenaar Arrangement and US-Dutch agreements. Without EUV, SMIC can produce competitive chips at 7nm and even 5nm with multi-patterning, but the economics get prohibitive at smaller nodes. So even if Huawei designs better architectures, the manufacturing constraint becomes the binding limit. >> But devil's advocate: China is investing massively in indigenous lithography. SMEE, Shanghai Micro Electronics Equipment, is reportedly working on its

own EUV alternative. Wouldn't that close the gap eventually? >> Possibly, but the timeline is much longer than the AI race demands. ASML took two decades and 30 billion to develop EUV. China is starting from a much lower base and faces export restrictions on key components. Even optimistic estimates put indigenous Chinese EUV at least 5 to 8 years away. By then, the global AI compute landscape will look completely different. >> So your contrarian view is that DeepSeek V4 plus Huawei Ascend is real progress, but not a transformational breakthrough. >> It's transformational in its symbolic significance. China has demonstrated that they can build the entire stack domestically. That matters for national security, for industrial policy, for negotiating leverage with the US. But in terms of actual AI capability, the gap with Nvidia and TSMC remains substantial. The risk is that Western policymakers and investors will overreact to the narrative and miss what's actually happening. >> The narrative versus the reality. >> Exactly. The narrative is that China achieved AI chip independence. The reality is that China achieved AI chip viability for a specific subset of workloads while remaining structurally dependent on US-controlled technology for

the leading edge. >> Action items time. What should investors and business professionals do with all of this? >> Three concrete actions. First, watch the actual benchmark results when DeepSeek V4 launches in mid-to-late April. The internal numbers from DeepSeek and Huawei are promising, but the real test happens when independent researchers and developers get their hands on the model and run their own benchmarks. The first 30 days post-launch will tell us whether the hype matches the reality. >> And what should we look for in those benchmarks? >> The key tests are SWE-bench for coding, MMLU and GPQA for knowledge and reasoning, and large multimodal benchmarks for image and video generation. If V4 hits 80% or higher on SWE-bench, it's genuinely competitive with frontier Western models. If it falls below 70%, the narrative weakens significantly. Also watch the reported inference costs. If they actually achieve the 10 to 40x cost advantage they're claiming, that's a structural disruption to the economics of AI deployment. >> Got it. What's the second action? >> Second, monitor Nvidia's response carefully. Jensen Huang has repeatedly

said that competing with DeepSeek and Chinese AI labs is part of Nvidia's strategy, not a threat to it. But Nvidia's H20, the export-restricted version for China, is being directly outperformed by Huawei's Ascend 950PR. That creates real pressure on Nvidia to either restore access to higher-performance chips or accept that the Chinese market is shrinking. Watch for any policy changes around H200 or B100 export approvals. >> And the policy angle is critical, because the Trump administration has been signaling more openness to selective chip exports. >> Right. The January 2026 Bureau of Industry and Security rule change shifted H200 license reviews from presumption of denial to case-by-case review. If we see actual H200 approvals to Chinese hyperscalers later this year, that's a sign that policymakers are recalibrating in response to Chinese semiconductor progress. The logic would be: if China can build competitive chips anyway, restricting exports just gives Huawei a captive market. >> That's a fascinating policy dynamic. Restrictions creating the very thing they were meant to prevent. >> Exactly. Third action. Watch SMIC's

quarterly capacity announcements very carefully. The bottleneck for Huawei's Ascend lineup isn't design, it's manufacturing capacity. SMIC's ability to deliver 750,000 Ascend 950PR chips to ByteDance alone, plus large orders from Alibaba and Tencent, plus DeepSeek's own infrastructure needs, is going to test their production capability. If SMIC can scale, the Huawei AI ecosystem grows. If they hit yield problems or capacity constraints, the entire Chinese AI compute story slows down dramatically. >> And SMIC reports quarterly, so we'll have visibility into this. >> Right. And there's one more dimension. Watch for international adoption of Huawei's chips and CANN software outside China. If CANN starts getting traction in Southeast Asia, the Middle East, or Africa, that's a sign that the alternative AI stack is becoming globally viable. If it stays a purely domestic Chinese ecosystem, the long-term competitive impact on Nvidia is much more limited. >> Let me add one more dimension: the geopolitical implications. If DeepSeek V4 succeeds, it changes China's leverage in any AI-related diplomatic negotiations. The Trump summit is now

scheduled for May 14th in Beijing. Going into that summit, China can credibly claim they don't need American AI chips anymore. That's a fundamentally different negotiating position than they had even six months ago. >> And the export control regime that the Biden and Trump administrations built becomes less relevant if the Chinese can route around it. The whole semiconductor export control architecture was designed to slow Chinese AI development. If it's now accelerating Chinese AI independence instead, that's a strategic failure that needs to be reckoned with. >> And if you want all the data points, model benchmarks, and policy analysis we discussed today, we publish a free daily newsletter at ascmro.substack.com. We'll cover the V4 launch in detail when it happens. >> Coming up next, we're looking at Japan's record-breaking inbound tourism numbers and the structural shift they're creating in the Japanese economy: 42.7 million visitors in 2025, with February 2026 alone hitting 3.4 million, the highest single month ever. We'll dig into the macro implications. >> That'll be a fun one. >> Stay data-driven. Bye.