Dylan Patel, founder of SemiAnalysis, provides a deep dive into the 3 big bottlenecks to scaling AI compute: logic, memory, and power.
And walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers.
Learned a ton about every single level of the stack. Enjoy!
𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkesh.com/p/dylan-patel
* Apple Podcasts: https://podcasts.apple.com/us/podcast/dylan-patel-deep-dive-on-the-3-big-bottlenecks-to/id1516093381?i=1000755126873
All right, this is the episode where
my roommate teaches me semiconductors. It's also the send off for this current set.
It is. After you use it, I'm like, "I can't use this again.
I gotta get out of here." No sloppy seconds for Dwarkesh.
Dylan is the CEO of SemiAnalysis. Dylan, here’s the burning question I have for you.
If you add up the big four—Amazon, Meta, Google, Microsoft—their combined forecasted CapEx this
year that you published recently is $600 billion. Given yearly prices of renting that compute,
that would be close to 50 gigawatts. Obviously, we're not bringing
50 gigawatts online this year, so presumably that's paying for compute that is
going to be coming online over the coming years. How should we think about the timeline around
when that CapEx comes online? Similar question for the labs. OpenAI just announced they
raised $110 billion, and Anthropic just announced they raised $30 billion.
If you look at the compute they
have coming online this year—you should
tell me how much it is, but is it on the order of another four gigawatts total?
The cost to rent the compute that OpenAI and Anthropic will have this year to sustain their
compute spend is $10 to $13 billion a gigawatt. Those individual raises alone are enough
to cover their compute spend for the year. And this is not even including the revenue
that they're going to earn this year. So help me understand: first,
what is the timescale at which the Big Tech CapEx actually comes online?
And second, what are the labs raising all this money for if the yearly price of a
one-gigawatt data center is $13 billion? So when you talk about the CapEx of these
hyperscalers being on the order of $600 billion, and you look across the rest of the supply chain,
it gets you to the order of a trillion dollars. A portion of this is immediately for compute
going online this year: the chips and the other parts of CapEx that get paid this year.
But there's a lot of setup CapEx as well.
When we're talking about 20 gigawatts of
incremental added capacity this year in America, a portion of this is not spent this year.
A portion of that CapEx was actually spent the prior year.
When you look at Google having $180 billion, a big chunk of that is
spent on turbine deposits for '28 and '29. A chunk of that is spent on data
center construction for '27. A chunk of that is spent on power purchasing
agreements, down payments, and all these other things they're doing further out into the future
so they can set up this super fast scaling. This applies to all the hyperscalers
and other people in the supply chain. So with roughly 20 gigawatts deployed this year,
a big chunk is hyperscalers, and a chunk is not. For all of these companies, their biggest
customers are Anthropic and OpenAI. Anthropic and OpenAI are at roughly
two to two-and-a-half gigawatts right now, and they're trying to scale much larger.
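The scaling math here comes down to one conversion chain, revenue to compute spend to gigawatts. A minimal sketch, where every input is a round number from this conversation (straight-line revenue growth, the implied gross margin, and the ~$10B-per-gigawatt-year rental cost), not a reported financial:

```python
# Rough revenue-to-gigawatts conversion using the conversation's round numbers.
# All figures are illustrative back-of-envelope inputs, not reported financials.

monthly_revenue_add = 6e9        # ~$6B of new revenue per month, straight-lined
months = 10
revenue_added = monthly_revenue_add * months          # $60B over ten months

gross_margin = 1 / 3             # implied by ~$40B of compute on $60B of revenue
compute_spend = revenue_added * (1 - gross_margin)    # ~$40B of inference compute

rental_cost_per_gw = 10e9        # ~$10B/year to rent one gigawatt of compute
added_gigawatts = compute_spend / rental_cost_per_gw  # capacity needed just to grow

print(f"revenue added: ${revenue_added / 1e9:.0f}B")
print(f"compute spend: ${compute_spend / 1e9:.0f}B")
print(f"inference capacity needed: {added_gigawatts:.1f} GW")
```

Each input can be swapped for your own estimate; the point is that at these margins and rental rates, every ~$15B of new annualized revenue drags roughly a gigawatt of inference capacity behind it.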
If you look at what Anthropic has done over the
last few months, with $4 billion
to $6 billion in revenue added, we can just draw a straight line and say they'll
add another $6 billion of revenue a month. People would argue that’s bearish,
and that they should go faster. What that implies is they're going to add $60
billion of revenue across the next ten months. At the current gross margins Anthropic had,
as last reported by media, that would imply they have roughly $40 billion of compute spend for
that inference, for that $60 billion of revenue. That $40 billion of compute, at roughly
$10 billion a gigawatt in rental costs, means they need to add four gigawatts of
inference capacity just to grow revenue. That’s assuming their research and
development training fleet stays flat. In a sense, Anthropic needs to get to well
above five gigawatts by the end of this year. It's going to be really tough for
them to get there, but it's possible. Can I ask a question about that?
If Anthropic was not on track to have five gigawatts by the end of this year, but it
needs that to serve both the revenue that's gone
crazier than expected—and maybe it's going to be
even more than that—plus the research and training to make sure its models are good enough for next
year: Where is that capacity going to come from? Dario, when he was on your
podcast, was very conservative. He said, "I'm not going to go crazy on
compute because if my revenue inflects at a different rate, at a different
point… I don't want to go bankrupt. I want to make sure that we're being
responsible with this scaling." But in reality, he's screwed the pooch
compared to OpenAI, whose approach was, "Let's just sign these crazy fucking deals."
OpenAI has got way more access to compute than Anthropic by the end of the year.
What does Anthropic have to do to get the compute? They have to go to lower-quality providers
that they would not have gone to before. Anthropic historically had the best
quality providers, like Google and Amazon, the biggest companies in the world.
Now Microsoft is expanding across the supply chain, and they're going to other newer players.
OpenAI has been a bit more aggressive on going to many players.
Yes, they have tons of capacity from Microsoft,
Google, and Amazon, but they also
have tons with CoreWeave and Oracle. They've gone to random companies, or companies
one would think are random, like SoftBank Energy, who has never built a data center in their life
but is building data centers now for OpenAI. They've gone to many others,
like NScale, to get capacity. There's this conundrum for Anthropic because
they were so conservative on compute, because they didn't want to go crazy.
In some sense, a lot of the financial freakouts in the second half of last year
were because, "OpenAI signed all these deals but they didn't have the money to pay
for them…" Okay, Oracle's stock is going to tank, CoreWeave's stock is going to tank.
All these companies' stocks tanked, and credit markets went crazy because people
thought the end buyer couldn't pay for this. Now it's like, "Oh wait,
they raised a ton of money. Okay, fine, they can pay for it."
Anthropic was a lot more conservative. They were like, "We'll sign
contracts, but we'll be principled. We'll purposely undershoot what we think we
can possibly do and be conservative because we don't want to potentially go bankrupt."
The thing I want to understand is, what does
it mean to have to acquire compute in a pinch?
Is it that you have to go with neoclouds? Do they have worse compute? In what way is it worse?
Did you have to pay gross margins to a cloud provider that you wouldn't have otherwise had to
pay because they're coming in at the last minute? Who built the spare capacity such
that it's available for Anthropic and OpenAI to get last minute?
What is the concrete advantage that OpenAI has gotten if they end up
at similar compute numbers by 2027? Are they just going to end this
year with different gigawatts? If so, how many gigawatts are Anthropic and
OpenAI going to have by the end of this year? To acquire excess compute, yes,
there is capacity at hyperscalers. Not all contracts for compute
are long-term, five-year deals. There's compute from 2023 or 2024, or H100s
from 2025, that were signed at shorter terms. The vast majority of OpenAI's compute is signed
on five-year deals, but there were many other customers that had one-year, two-year,
three-year, or six-month deals, or were on demand. As these contracts roll off,
who is the participant in the
market most willing to pay the highest price?
In this sense, we've seen H100 prices inflect a lot and go up.
People are willing to sign long-term deals for above $2 even.
I've seen deals where certain AI labs—I'm being a little bit vague here for a reason—have signed at
as high as $2.40 for two to three years for H100s. If you think about the margin, it costs
about $1.40 an hour to deploy Hopper, amortized across five years. Now, two years in, you're signing deals for two
to three years at $2.40? Those margins are way higher. Now you can crowd out all of these other
suppliers, whether Amazon had these, or CoreWeave, or Together AI, or Nebius, or whoever it is.
These neoclouds are the firms that had a higher percentage of Hopper in general
because they were more aggressive on it. They also tended to sign shorter-term
deals, not CoreWeave but the others.
So if I want Hopper, there
is some capacity out there. Also, while most of the capacity at an Oracle
or a CoreWeave is signed for a long-term deal in terms of Blackwell, anything that's
going online this quarter is already sold. In some cases, they're not even hitting all the
numbers they promised they would sell because there are some data center delays, not just those
two, but Nebius, Microsoft, Amazon, and Google. But there are a lot of neoclouds, as well as some
of the hyperscalers, who have capacity they're building that they haven't sold yet, or capacity
they were going to allocate to some internal use that is not necessarily super AGI-focused,
that they may now turn around and sell. Or in the case of Anthropic, they don't
have to have all the compute directly. Amazon can have the compute and serve Bedrock,
or Google can have the compute and serve Vertex, or Microsoft can have the compute
and serve Foundry, and then do a revenue share with Anthropic, or vice versa.
Basically, you're saying Anthropic is having to pay either this 50% markup in the sense of the
revenue share, or in the sense of last-minute spot compute that they wouldn't have otherwise
had to pay had they bought the compute early.
Right, there's a trade-off
there. But at the same time, for a solid four months, everyone was saying to
OpenAI, "We're not going to sign deals with you." That sounds crazy, but it was
because, "you don’t have the money." Now everyone's saying, "OpenAI,
we believed you the whole time. We can sign any deal because
you've raised all this money." Anthropic is constrained in that sense.
There are not that many incremental buyers of compute yet, because Anthropic hit the capability
tier first where their revenue is mooning. That's interesting. Otherwise you
might think having the best model is an extremely depreciating asset, because three
months later you don't have the best model. But the reason it's important is that you
can sign these deals, lock in the compute in advance, and get better prices.
Maybe this is an obvious point. But at least until recently, people had made this
huge point about the depreciation cycle of a GPU. The bears, the Michael Burrys or
whoever, have said, "Look, people
are saying four or five years for these GPUs.
"Because the technology is improving so fast, it in fact makes sense to have
two-year depreciation cycles for these GPUs," which increases the reported amortized CapEx
in a given year and makes it financially less lucrative to build all these clouds.
But in fact you’re pointing out that maybe the depreciation cycle is even longer than five years.
If we're using Hoppers—especially if AI really takes off and in 2030 we’re saying, "We have
to get the seven-nanometer fabs up, we have to go back and turn on the A100s again"—then the
depreciation cycle is actually incredibly long. I feel like that's an interesting financial
implication of what you're saying. There's a few strings to pull on there.
One is, what happens to depreciation of GPUs? I guess I didn't answer your prior question,
which is that I think Anthropic will be able to
get to five gigawatts-ish, maybe a little
bit more by the end of the year through themselves as well as their product being
served through Bedrock, Vertex, or Foundry. I think they'll be able to get to five or six
gigawatts, which is way above their initial plans. OpenAI will be roughly the same, actually
a little bit higher based on our numbers. But anyway, the depreciation cycle of a GPU.
Michael Burry was saying it's three years or less. That’s sort of his argument.
There are two lenses to look at this. Mechanically, there's a TCO model, total cost of
ownership of a GPU, where we project pricing out for GPUs and build up the total cost of a cluster.
There are a number of costs: your data center cost, your networking cost, your smart hands and
people in the data center swapping stuff out. There's your spare parts, your
actual chip cost, your server cost. All these various costs get lumped together.
There's some depreciation cycles on it,
certain credit costs on it.
You build up to, "Hey, an H100 costs $1.40/hour to deploy at volume across five
years if your depreciation is five years." If you sign a deal at $2/hour for those five
years, your gross margin is roughly 35%. It's a little bit above that.
If you sign it for $1.90, it's 35% roughly. Then you assume at that fifth year,
the GPU falls off a bus and is dead. In some cases, the argument people are making is
if you didn't sign a long-term deal, because every two years NVIDIA is tripling or quadrupling the
performance while only 2X-ing or 50% increasing the price… Then the price of an H100… Sure
maybe the value in the market was $2 at 35% gross margins in 2024, but in 2026, when Blackwell
is in super high volume and deploying millions a year, you’re actually now worth $1/hour.
And when Rubin in '27 is in super high volume—even though it starts shipping this year, it’s super
high volume next year—doing millions of chips a
year deployed into clouds, you've got another
3X in performance, another 50% or 2X in price, then the Hopper is only worth $0.70/hour.
So the price of a GPU would continue to fall. That's one lens. The other lens is,
what is the utility you get out of the chip? If you could build infinite Rubin
or infinite of the newest chip, then yes, that's exactly what would happen.
The price of a Hopper would fall at a spot or short-term contract rate as the new chips
come out and the price per performance goes up. But because you are so limited on semiconductors
and deployment timelines, what actually prices these chips is not the comparative thing I
can buy today, but rather what is the value I can derive out of this chip today.
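The two pricing lenses can be put side by side with round numbers. The $1.40/hr TCO and the roughly-3x-performance-at-roughly-2x-price generational cadence are from this conversation; the exact per-generation ratios below are illustrative assumptions, and the simple margin formula here lands a bit under the ~35% quoted above, which presumably folds in financing and utilization differently:

```python
# Lens 1: cost-plus pricing. At ~$1.40/hr all-in TCO over a five-year
# depreciation window, gross margin follows from the hourly rental price.
tco_per_hr = 1.40
margins = {price: (price - tco_per_hr) / price for price in (2.40, 2.00, 1.90)}
for price, margin in margins.items():
    print(f"rent at ${price:.2f}/hr -> simple gross margin {margin:.0%}")

# Lens 2: perf-per-dollar against the newest part. If each generation is
# roughly 3-3.5x the performance at roughly 1.75-2x the price, an old chip
# priced purely on perf/$ deflates by the price/perf ratio each generation.
# (Per-generation ratios below are illustrative assumptions.)
hopper_value = 2.00                       # 2024-era H100 market rate, $/hr
for gen, perf_x, price_x in [("Blackwell", 3.5, 1.75), ("Rubin", 3.0, 2.0)]:
    hopper_value *= price_x / perf_x      # implied worth of the old chip
    print(f"after {gen} volume ramp: H100 'worth' ~${hopper_value:.2f}/hr")
```

Lens 2 reproduces the $2 to ~$1 to ~$0.70 decay described above; the argument in the transcript is that scarcity breaks this lens, because the H100's price is set by what today's models can earn on it, not by the Rubin you cannot actually buy.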
In that sense, let's take GPT-5.4. GPT-5.4 is both way cheaper to run than
GPT-4 and has fewer active parameters. It's much smaller, in that sense
of active parameter, because it's a
sparser MoE versus GPT-4 being a coarser MoE.
There's also been so many other advancements in training, RL, model architecture, and data
qualities that have made GPT-5.4 way better than GPT-4. And it's cheaper to serve. When you look
at an H100, it can serve more tokens per GPU of 5.4 than if you had run GPT-4 on it.
So it's producing more tokens of a model that is of higher quality.
What is the maximum TAM for GPT-4 tokens? Maybe it was a few billion dollars, maybe
it was tens of billions of dollars. Adoption takes time. For GPT-5.4, that number
is probably north of a hundred billion. But there's an adoption lag, there's
competition, and there's the constant improvements that everyone else is having.
If improvements stopped here, the value of an H100 is now predicated on the value
that GPT-5.4 can get out of it instead of the value that GPT-4 can get out of it.
These labs are in a competitive environment, so their margins can't go to infinity.
You sort of have this dynamic that is
quite interesting in that an H100 is worth
more today than it was three years ago. That's crazy. It's also interesting from
the perspective of just taking that forward. If we had actual AGI models developed, if
we had a genuine human on a server… These are such hand wave-y numbers about
how many flops the brain can do. But on a flop basis, an H100 is estimated to
do 1e15, which is how much some people estimate the human brain does in flops.
Obviously, in terms of memory, the human brain has way more.
An H100 is 80 gigabytes, and the brain might have petabytes.
Oh, yeah, you've got petabytes? Name a petabyte of ones and zeros, bro. Name me a string.
Well, this is actually the point. No, we’ve just got the best
sparse attention techniques ever. Genuinely though. In the amount of information
that is compressed, it might be petabytes. The brain is an extremely sparse MoE.
But anyways, imagine a human knowledge
worker can produce six figures a year of value.
If an H100 can produce something close to that, if we had actual humans on a server, the
value of an H100 is such that it can repay itself in the course of a couple of months.
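A back-of-envelope version of that payback claim, reusing the ~$1.40/hr Hopper TCO from earlier in the conversation. The $100,000/year knowledge-worker output is the hypothetical from the discussion; on these inputs the chip repays its entire five-year cost in well under a year:

```python
# If one H100 could substitute for a six-figure knowledge worker, how fast
# would it repay its own all-in cost of ownership?

tco_per_hr = 1.40                           # all-in $/hr from the TCO discussion
hours_in_5y = 24 * 365 * 5
five_year_tco = tco_per_hr * hours_in_5y    # ~$61k per GPU over five years

value_per_year = 100_000                    # hypothetical human-equivalent output
value_per_hr = value_per_year / (24 * 365)  # ~$11.4/hr of value produced

payback_months = five_year_tco / value_per_hr / (24 * 30)
print(f"five-year TCO: ${five_year_tco:,.0f}")
print(f"months to repay the full five-year TCO: {payback_months:.1f}")
```

Even with generous rounding in either direction, the payback period is measured in months, which is the economic sense in which "a human on a server" would reprice the hardware.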
So when I interviewed Dario, the point I was
trying to make is not that I think the singularity
is two years away and therefore Dario desperately needs to buy more compute, although the revenue is
certainly there that he needs to buy more compute. The point I was trying to make is that given what
Dario seems to be saying—given his statements that we're two years away from a data center of
geniuses, and certainly not more than five years away, and a data center of geniuses should
be earning trillions upon trillions of dollars of revenue—it just does not make sense why
he keeps making these statements about being more conservative on compute or, to your point,
being less aggressive than OpenAI on compute. I guess that point got lost because then people
were roasting me, saying, "Oh, this podcaster is trying to convince this multi-hundred
billion dollar company CEO to YOLO it, bro." I was just trying to say that internally,
his statements are inconsistent. Anyway, it's good to iron it out.
I think going back to the earlier view that if the models are so powerful, the
value of a GPU goes up over time, right now
only OpenAI and Anthropic have that viewpoint.
But further out, everyone is going to be able to see that value per GPU skyrocket.
So in that sense, you should commit now to compute.
Interestingly, in Anthropic fashion, there's a bit of a meme that they have
commitment issues and are sort of polyamorous. Not Dario, but this is a bit of a meme.
Explains everything. By the way, there's this interesting economic effect called
Alchian-Allen, which is the idea that if you increase the fixed cost of different goods,
one of which is higher quality and one which is lower quality, that will make people choose
the higher quality good, on the margin. To give a specific example, suppose the
better-tasting apple costs two dollars and the shittier apple costs one dollar.
Now suppose you put an import tariff on them.
Now it's $3 versus $2 for a great
apple versus a medium apple. Is that because they both increased by a
dollar, or should it be a 50% increase? No, because they both increased by $1.
The whole effect is that if there's a fixed cost that is applied to both.
Then the price difference between them, the ratio, changes.
Previously, the more expensive one was 2X more expensive.
Now it's just 1.5X more expensive. So I wonder if applied to AI that would mean
that, if GPUs are going to get more expensive, there will be a fixed cost
increase in the price of compute. As a result, that will push people to be willing
to pay higher margins for slightly better models. Because the calculus is, I'm going to be
paying all this money for the compute anyway. I might as well just pay slightly more to
make sure it's the very best model rather than a model that's slightly worse.
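The apple arithmetic in a few lines, with the hypothetical prices from the example:

```python
# Alchian-Allen in one step: an identical additive cost on both goods
# lowers the *relative* price of the premium good.

premium, budget = 2.00, 1.00     # the two apples, $/unit
tariff = 1.00                    # flat per-unit import cost applied to both

ratio_before = premium / budget                       # premium is 2.0x dearer
ratio_after = (premium + tariff) / (budget + tariff)  # now only 1.5x dearer
print(f"relative price of the good apple: {ratio_before:.1f}x -> {ratio_after:.1f}x")
```

The effect hinges on the extra cost being additive per unit; a proportional markup would leave the 2x ratio unchanged. So the GPU version of the argument holds to the extent that scarcer compute acts like a flat surcharge on every token, rather than scaling with the model's existing price.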
So the Hopper went from $2 to $3. If a Hopper can make a million tokens of Opus
and it can make two million tokens of Sonnet,
the price differential between Opus and
Sonnet has decreased because the price of the GPU has increased by a dollar from $2 to $3.
Interesting. I think that makes a ton of sense. We just see all of the volumes
are on the best models today, all the revenue is on the best models today.
In a compute-limited world, two things happen. One, companies that don't have commitment issues
and have these five-year contracts for compute have locked in a humongous margin advantage.
They've locked in compute for five years at the price it transacted at
two, three, or five years ago. Whereas if you're three years into that
five-year contract and someone else's two-year or three-year contract rolled off, and
now they're trying to buy that at modern pricing, when it's priced to the value of models,
the price is going to be up a lot more. So the person who committed early
has better margins in general. The percentage of the market that is in long-term
contracts is much larger than the percentage of
the market in short-term contracts that can be
this flex capacity you add at the last second. At the same time, where does the margin go?
Because models get more valuable, how much can the cloud players flex their pricing? If you look at CoreWeave, their average
term duration is over three years right now. For ninety-eight percent plus of
their compute, it's over three years. They end up with this conundrum
where they can't actually flex price. But every year they're adding incrementally
way more capacity than they had previously. This year alone, Meta's adding as much capacity as
they had in their entire fleet of compute and data centers in 2022, across all purposes: serving WhatsApp,
Instagram, and Facebook, and doing AI. They're adding that much again this year alone.
In the same sense, you talk about Meta doing that, CoreWeave, Google, and Amazon, all these companies
are adding insane amounts of compute year on year. That new compute gets transacted at the new price.
In a sense, yes, you've locked in, as long as
we're in a takeoff. "Oh, OpenAI went from six
hundred megawatts to two gigawatts last year, and from two gigawatts to six plus this
year, and six to twelve next year." The incremental added compute is where all the
cost is, not the prior long-term contracts. Then who holds the cards is the
infra providers for charging margin. Now the cloud players, the neoclouds, or
the hyperscalers can charge the margin. They can to some extent, but then as you go
upstream to who has access to all the memory and logic capacity, it's Nvidia for the most part.
They've signed a lot of long-term contracts. They've got ninety billion dollars of long-term
contracts today, and they're negotiating three-year deals today with the memory vendors.
You've got Amazon and Google through Broadcom, Amazon directly, and AMD.
These companies hold all the cards because they've secured the capacity.
TSMC is not raising prices, but memory vendors are raising prices quite a lot.
They're going to double or triple prices again, but then they're also signing these long-term deals.
Who is able to accrue all the margin dollars is
potentially the cloud, potentially the
chip vendors, and the memory vendors, until TSMC or ASML break out and say,
"No, we're going to charge a lot more." But at the same time, do the model
vendors get to charge crazy margins? At least this year, we're going to see
margins for the model vendors go up a lot. Because they're so capacity constrained,
they have to destroy demand. There's no way Anthropic can continue at
the current pace without destroying demand. Let's get into logic and memory.
How specifically has Nvidia been able to lock up so much of both?
I think according to your numbers, by '27, Nvidia is going to have over 70% of
N3 wafer capacity, or around that area. I forget what the numbers were for memory
at SK Hynix and Samsung and so forth. Think about how the neocloud business
works and how Nvidia works with that, or how the RL environment business
works and how Anthropic works with that.
In both those cases, Nvidia is purposely trying to
fracture the complementary industry to make sure that they have as much leverage as possible.
They're giving allocation to random neoclouds to make sure that there's not one
person that has all the compute. Similarly, Anthropic or OpenAI, when they're
working with the data providers, they say, "No, we're going to just seed a huge industry
of these things so that we're not locked into any one supplier for data environments."
And I wonder why on the 3 nm process—that's going to be Trainium 3, that's going to be
TPU v7, other accelerators potentially—why is TSMC just giving it all up to Nvidia
rather than trying to fracture the market? There are a couple points here.
On 3 nm, if we go back to last year, the vast majority of 3 nm was Apple.
Apple is being moved to 2 nm. Memory prices are going up, so
Apple's volumes may go down. As memory prices go up, either
they cut margin or they move on.
There's some time lag because they have
long-term contracts, but Apple likely reduces demand or moves to 2 nm faster, where
2 nm is only capable of mobile chips today. In the future, AI chips will move there.
So Apple has that. Apple is also talking to third-party vendors because they're
getting squeezed out of TSMC a little bit. TSMC's margins on high-performance computing—HPC,
AI chips, et cetera—are higher than they are for mobile, because they have a bigger
advantage in HPC than they do in mobile. When you look at TSMC’s running calculus
here, they're actually providing really good allocations to companies that are doing CPUs.
When you think about Amazon having Trainium and Graviton, both of those are on 3 nm, Graviton
being their CPU, Trainium being their AI chip. TSMC is much more excited to give
allocation to Graviton than they are to Trainium because they view the CPU
business as more stable, long-term growth.
As a company that is conservative and doesn't
want to ride cycles of growth too hard, you actually want to allocate to the market that
is more stable with a lower growth rate first before you allocate all the incremental capacity
to the fast growth rate market. That is the case generally. Same for AMD. The allocations they
get on their CPUs, TSMC is much more excited about those than they are for GPUs. Likewise
for Amazon. Nvidia is a bit unique because yes, they have CPUs, they make switches, they make
networking, NVLink, InfiniBand, Ethernet, NICs. By and large, most of these things will
be on 3 nm by the end of this year with the Rubin launch and all the chips in that
family, the GPU being the most important one. Yet Nvidia is getting the majority of supply.
Part of this is because you look at the market and TSMC and others forecast market demand in
many ways, but it's also the market signal.
The market signaled, "Hey, we need this much
capacity next year. We need this much. We'll sign non-cancelable, non-returnable. We
may even pay deposits." Nvidia just did it way earlier than Google or Amazon.
In some cases, Google and Amazon had stumbling blocks.
One of the chips got delayed slightly by a couple quarters.
Trainium and all these sorts of things happened. In that case, there was a huge sort
of, "Well, these guys are delaying, but Nvidia is wanting more, more, more, more.
And we are checking with the rest of the supply chain, is there enough capacity?"
They're going to all the PCB vendors and saying, "Is there enough PCB?"
Victory Giant is one of the largest suppliers of PCBs to Nvidia, and they're a Chinese company.
All the PCBs come from China, or many of them. They're like, "Do you have enough PCB capacity?
Great. Hey memory vendors, who has all the memory capacity? Okay, Nvidia does. Great." When you
look at who is AGI-pilled enough to buy compute on long timelines at levels that seem ridiculous
to people who aren't AGI-pilled—but nonetheless,
they're willing to pay a pretty good margin and
sign it now because they view in the future that ratio is screwed up—the same thing happens
with the supply chain for semiconductors. I don't think Nvidia is quite AGI-pilled.
Jensen doesn't believe software is going to be fully automated and all these things.
Accelerated computing, not AI chips, right? It's AI chips.
But that's what he calls it, right? Yeah. I think it's a broader term, AI is within
that, but also physics modeling and simulations. But it's like he's not
embracing the main use case. I think he's embracing it, but I just don't
think he's AGI-pilled like Dario or Sam. But he's still way, way more AGI-pilled than
Google was in Q3 of last year, or Amazon was in Q3 of last year, and he saw way more demand.
The reason is pretty simple. You can see all the data center construction.
He's like, "Okay, I want to have this market share."
We have all the data centers tracked, and there's a lot of data centers
that could be one or the other.
To some extent, Google and Amazon, Google
especially, even though their TPU is just better for them to deploy, they have to
deploy a crap load of GPUs because they don't have enough TPUs to fill up their
data centers. They can't get them fabbed. I have a question about that.
Google sold a million, was it the v7s?
Yes. The Ironwoods, to Anthropic, and you're saying the
big bottleneck right now, this year or next year, I guess going forward forever now,
is going to be the logic and memory, the stuff it takes to build these chips.
Google has DeepMind, the third prominent AI lab. If this is the big bottleneck, why would they
sell it rather than just giving it to DeepMind? This is again a problem of… DeepMind people
were like, "This is insane. Why did we do this?" But Google Cloud people and Google
executives saw a different thought process. You and I know the compute team at Anthropic.
Both of the main people came from Google.
They saw this dislocation, they negotiated
a deal, and they were able to get access to this compute before Google realized.
The chain of events, at least from our data that we found, was that in early Q3, over
the course of six weeks, we saw capacity on TPUs go up by a significant amount.
It went up multiple times in those six weeks. There were multiple requests. Google
even had to go to TSMC and explain to them why they needed this increase in
capacity because it was so sudden. A lot of that capacity increase
was for selling to Anthropic. Because Anthropic saw it before Google.
And then Google had Nano Banana and Gemini 3 which caused their user metrics to skyrocket.
Then leadership at Google was like, "Oh." Then they started making the statement that
we have to double compute every six months, or whatever the exact number was.
They really woke up a lot more, and then they went to TSMC and said, "We want more. We want
more." TSMC replied, "Sorry guys, we're sold out.
We can maybe get 5-10% more for 2026,
but really we're going to work on 2027." There was this information asymmetry among
the labs, in my mind. I don't know exactly. It's the narrative I've spun myself from
seeing all the data in the supply chain on wafer orders and what's going on with the data
centers that Anthropic and Fluidstack signed. It's pretty clear to me that Google screwed up.
You can see this from Google's Gemini ARR. They had next to nothing in Q1 to Q3—in Q3
a little bit once they started inflecting. But in Q4 they reached $5 billion
in revenue on an ARR basis. It's clear Google didn't see
revenue skyrocket initially. In a sense, Anthropic had a little bit of
commitment issues before their ARR exploded, even though they had far more information
asymmetry and saw what was coming down the pipe. Google is going to be more conservative
than Anthropic and Google had even less ARR.
So they were just not willing to do it,
and then they realized they should do it. Since then, Google has gotten absurdly
AGI-pilled in terms of what they're doing. They bought an energy company. They're
putting deposits down for turbines. They're buying a ridiculous
percentage of powered land. They're going to utilities and
negotiating long-term agreements. They're doing this on the data center
and power side very aggressively. I think Google woke up towards the end
of last year, but it took them some time. How many gigawatts do you think Google
will have by the end of next year? Buy my data.
You charge for that kind of information. Yes, yes. I feel like every year the
bottleneck for what is preventing us from scaling AI compute keeps changing.
A couple years ago it was CoWoS. Last year it was power. You'll tell me
what the bottleneck is this year. But I want to understand five years
out, what will be the thing that is constraining us from deploying the singularity?
The biggest bottleneck is compute. For that,
the longest lead time supply chains
are not power or data centers. They're actually the semiconductor
supply chains themselves. It switches back from power and data
centers as a major bottleneck to chips. In the chip supply chain, there's
a number of different bottlenecks. There's memory, logic wafers from
TSMC, and the fabs themselves. Construction of the fabs takes two to three years,
versus a data center which takes less than a year. We've seen Amazon build data
centers in as fast as eight months. There's a big difference in lead
times because of the complexity of building the fab that actually makes the chips. The tools also have really long lead times.
The bottlenecks, as we've scaled, have shifted based on what the supply
chain is currently not able to do. It was CoWoS, power, and data centers, but
those were all shorter lead time items. CoWoS is a much simpler process
of packaging chips together. Power and data centers are ultimately way simpler
than the actual manufacturing of the chips.
There's been some sliding of capacity
across mobile or PC to data center chips, which has been somewhat fungible.
Whereas CoWoS, power, and data centers have had to start anew as supply chains.
But now there's no more capacity for the mobile and PC industries—which used to be the majority
of the semiconductor industry—to shift over to AI. Nvidia is now the largest customer at TSMC
and SK Hynix, the largest memory manufacturer. It's sort of impossible for the
sliding of resources away from the common person's PCs and smartphones
to shift any more towards the AI chips. So now the question is how do
we scale AI chip production? That's the biggest bottleneck as we go to 2030.
It would be very interesting if there's an absolute gigawatt ceiling that you can project
out to 2030 based just on "We can't produce more
than this many EUV machines."
To scale compute further, there are different bottlenecks this year and
next year, but ultimately by 2028 or 2029, the bottleneck falls to the lowest rung
on the supply chain, which is ASML. ASML makes the world's most
complicated machine: an EUV tool. The selling price for those is $300-400 million.
Currently, they can make about 70 a year. Next year, they'll get to 80.
Even under very aggressive supply chain expansion, they only get to a little
bit over 100 by the end of the decade. What does that mean? They can make a hundred of these
tools by the end of the decade, and 70 right now. How does that actually translate to AI compute?
We see all these numbers from Sam Altman and many others across the supply chain:
gigawatts, gigawatts, gigawatts. How many gigawatts are we adding?
We see Elon saying a hundred gigawatts in space. A year.
A year. The problem with any of these numbers, or the challenge to these numbers,
is actually not the power or the data center.
We can dive into that, but
it's manufacturing the chips. Take a gigawatt of Nvidia's Rubin chips.
Rubin is announced at GTC, I believe the week this podcast goes live.
To make a gigawatt worth of data center capacity of Nvidia's latest chip that they're
releasing towards the end of this year, you need a few different wafer technologies.
You need about 55,000 wafers of 3 nm. You need about 6,000 wafers of 5 nm, and then
you need about 170,000 wafers of DRAM memory. Across these three different buckets,
each requires different amounts of EUV. When you manufacture a wafer, there are thousands
and thousands of process steps where you're depositing material and removing them.
But the key critical step—which at least in advanced logic is 30% of the
cost of the chip—is something that doesn't actually put anything on the wafer.
You take the wafer, you deposit photoresist, which is a chemical that chemically
changes when you expose it to light.
Then you stick it into the EUV tool, which
shines light at it in a certain way. It patterns it. There's what's called a mask,
which is effectively a stencil for the design. When you look at a leading-edge 3 nm wafer, it has
70 or so masks, 70 or so layers of lithography, but 20 of them are the most advanced EUV.
If you need 55,000 wafers for a gigawatt, and you do 20 EUV passes per wafer, you can do the math.
That's 1.1 million passes of EUV for a single gigawatt. It's pretty simple. Once you add the
rest of the stuff, it ends up being 2 million, across 5 nm and all the memory.
You're at roughly 2 million EUV passes for a single gigawatt. These tools
are very complicated. When you think about what it's doing across a wafer, it's taking
the wafer and scanning and stepping across. It does this dozens of times
across the whole wafer.
When you're talking about how
many EUV passes, that’s the entire wafer being exposed at a certain rate.
An EUV tool can do roughly 75 wafers per hour, and the tool is up roughly 90% of the time.
In the end, you need about three and a half EUV tools to do the 2 million EUV
wafer passes for the gigawatt. So three and a half EUV
tools satisfies a gigawatt. It's funny to think about the numbers. What does
a gigawatt cost? It costs roughly $50 billion. Whereas what do three and a half EUV tools cost?
That's about $1.2 billion, a much lower number, which is interesting to think about.
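The arithmetic above can be checked end to end. These are the figures as quoted in the conversation; the per-tool price is my midpoint assumption within the $300-400 million range mentioned:

```python
WAFERS_3NM_PER_GW = 55_000     # 3 nm logic wafers per gigawatt of Rubin
EUV_LAYERS_3NM = 20            # most-advanced EUV layers per 3 nm wafer

passes_3nm = WAFERS_3NM_PER_GW * EUV_LAYERS_3NM      # 1.1 million passes
TOTAL_PASSES = 2_000_000       # ~2 million once 5 nm and DRAM are added

# One tool: ~75 wafer exposures per hour, up ~90% of the time.
passes_per_tool_year = 75 * 0.90 * 24 * 365          # ~591k passes per year
tools_per_gw = TOTAL_PASSES / passes_per_tool_year   # ~3.4 tools

# ~$350M per tool (assumed midpoint) vs ~$50B per gigawatt of data center.
euv_capex = tools_per_gw * 350e6                     # ~$1.2 billion
```

So roughly three and a half tool-years of EUV exposure, and on the order of a billion dollars of lithography equipment, sit underneath each $50 billion gigawatt.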
A gigawatt is $50 billion of economic CapEx in the data center, and what gets built on top of that in terms of tokens is even larger; it might be $100 billion worth of AI value. Now go down into the supply chain: over the last three years, TSMC has done $100 billion of CapEx, roughly $30, $30, and $40 billion a year. A small fraction of that is being used by Nvidia for the 3 nm, or
previously 4 nm, that it's using for its chips.
previously 4 nm, that it's using for its chips. What were its earnings last
quarter? It was $40 billion. So $40 billion times four is $160 billion.
Nvidia alone is turning some small fraction of $100 billion in CapEx, which is going to
be depreciated over many years and not just this one year, into $160 billion in a single year.
That gets even more intense when you go down the supply chain to ASML, which is taking a billion
dollars' worth of machines to produce a gigawatt. Of course, those machines last for more
than a year so it's doing more than that. Now I want to understand: how many such machines will there be by 2030, if you include not just the ones sold that year but the ones that have been accumulating over the previous years? What does that imply? Sam Altman
says he wants to do a gigawatt a week in 2030.
When you add up those numbers,
is it compatible with that? That's completely compatible,
if you think about it. TSMC and the entire ecosystem have
something like 250 to 300 EUV tools already. Then you stack on 70 this year, 80
next year, growing to 100 by 2030. You're at 700 EUV tools by the end of the decade.
700 EUV tools, at three and a half tools per gigawatt—assuming it's all allocated to AI, which
it's not—gets you to 200 gigawatts worth of AI chips for the data centers to deploy.
Sam wants 52 gigawatts a year. He's only taking 25% share then.
Obviously, there's some share given to mobile and PC, assuming we're even allowed to have consumer
goods still and we don't get priced out of them. But roughly, he's saying 25% market
share of the total chips fabbed. That's very reasonable given that this
year alone, I think he's going to have access to 25% of the Blackwell GPUs
that are deployed. It's not that crazy.
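Flowing those numbers through (the installed base and the shipment ramp are the rough figures from the conversation; the interpolation between 80 and 100 tools a year is my assumption):

```python
installed_base = 275                     # ~250-300 EUV tools already in the field
shipments = [70, 80, 85, 90, 100]        # assumed annual shipments through 2030
fleet_2030 = installed_base + sum(shipments)       # ~700 tools

TOOLS_PER_GW = 3.5                       # from the per-gigawatt EUV math
max_ai_gw_per_year = fleet_2030 / TOOLS_PER_GW     # 200 GW/yr if ALL went to AI

sam_share = 52 / max_ai_gw_per_year      # 52 GW/yr is ~26% of total EUV output
```

The 200 gigawatts is a ceiling, not a forecast: some of that fleet keeps serving mobile and PC, so 52 gigawatts a year for one buyer is roughly a quarter of everything EUV can pattern.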
When did ASML start shipping
EUV tools, when 7 nm started? I don't know when that was exactly.
You're saying in 2030, they're going to be using machines that initially were shipped in 2020.
So for ten years, you're using the same most important machine in this
most technologically advanced industry in the world? I find that surprising.
ASML's been shipping EUV tools now for roughly a decade, but it only entered mass volume production
around 2020. The tool's not the same. Back then, the tools were even lower throughput.
There are various specifications around them called overlay.
I was mentioning you're stacking layers on top of each other.
You'll do some EUV, you'll do a bunch of different process steps—depositing stuff,
etching stuff, cleaning the wafer—dozens of those steps before you do another EUV layer.
There's a spec called overlay, which is: you did all this work, you drew these lines
on the wafer, now I want to draw these dots. Let's say I want to draw these dots to
connect these lines of metal to holes, and then the next layer up is another set of lines
going perpendicular, so now you're connecting
wires going perpendicular to each other.
You have to be able to land them on top of each other. It's called overlay. Overlay is
a spec that's been improved rapidly by ASML. Wafer throughput has been
improved rapidly by ASML. The price of the tool has gone up, but not
as much as the capabilities of the tool. Initially, the EUV tools were $150 million.
Over time, the price has climbed toward $400 million as I look out to 2028.
But the capabilities of the tools have more than doubled as well, especially
on throughput and overlay accuracy, which is the ability to accurately align the subsequent
passes on top of each other even though you do tons of steps between. ASML is improving super
rapidly. It's also noteworthy to say that ASML is maybe one of the most generous companies
in the world. They're the linchpin; no one has anything competitive. Maybe China will
have some EUV by the end of the decade, but no one else has anything even close to EUV, and yet they
haven't taken price and margins up like crazy.
You go ask some other folks that we
talk to all the time, like Leopold, and they're like, "Let's have the price go
up." Because they can. The margin is there. You can take the margin. Nvidia takes the
margin. Memory players are taking the margin. But ASML has never raised the price more than
they've increased the capability of the tool. In a sense, they've always provided
net benefit to their customers. It's not that the tool is stagnant,
it's just that these tools are old. Yes, you can upgrade them some,
and the new tools are coming. For simplicity's sake, we're
ignoring the advances in overlay or throughput per tool for this podcast.
You say we're producing 60 of these machines this year and then 70, 80 over subsequent years. What would happen if ASML just decided
to double its CapEx or triple its CapEx? What is preventing them from
producing more than 100 in 2030? Why are you so confident that
even five years out, you can be relatively sure what their production will be?
I think there are a couple factors here. ASML has not decided to just go YOLO,
let's expand capacity as fast as possible.
In general, the semiconductor
supply chain has not. It's lived through the booms and busts,
and we can talk a bit more about it. Basically some players have recently woken
up, but in general no one really sees demand for 200 gigawatts a year of AI chips, or
trillions of dollars of spend a year in the semiconductor supply chain. They're
not AI-pilled. They're not AGI-pilled. We're going to get to a
trillion dollars this year. Yeah, I feel you, but I'm saying no one
really understands this in the supply chain. Constantly, we're told our numbers are
way too high, and then when they're right, they're like, "Oh, yeah, but your next
year's numbers are still too high." ASML's tool has four major components.
It has the source, which is made by Cymer in San Diego.
It has the reticle stage, which is made in Wilton, Connecticut. It has the wafer
stage. It has the optics, the lenses and such.
Those last two are made in Europe.
When you look at each of these four, they're tremendously complex supply chains that,
(A) they have not tried to expand massively, and (B) when they try to expand
them, the time lag is quite long. Again, this is the most complicated machine
that humans make, period, at any sort of volume. Let's talk about the source specifically. What
does the source do? It drops these tin droplets. It hits it three subsequent
times with a laser perfectly. The first one hits this tin
droplet, it expands out. It hits it again, so it expands
out to this perfect shape, and then it blasts it at super high power.
The tin droplets get excited enough that they release EUV light at 13.5 nanometers, and then a collector mirror gathers all the light and directs it into the lens stack.
Then you have the lens stack, which is Carl Zeiss, as you mentioned, and some other folks, but
Zeiss being the most important part of it. They also have not tried to expand
production capacity because they don't see...
They're like, "We're growing a lot because of AI.
We're growing from 60 to 100." It's like, "No, no, no. We need to go to a couple hundred, but it's
fine. Whatever." Each of these tools has, I think, 18 of these lenses, effectively.
They are multilayer mirrors: perfect alternating layers of molybdenum and silicon stacked on top of each other in many layers,
and then the light bounces off of it perfectly. When we think about a lens, it's in
a shape, and it focuses the light. This is like a mirror that's also
a lens, so it's pretty complicated. Any defect in these super thinly
deposited stacks will mess it up. Any curvature issues will mess it up.
There are a lot of challenges with scaling the production.
It's quite artisanal in this sense because you're not making tens of thousands of these a year, you're making hundreds or thousands. At 60 tools a year and 18 of these per tool, you're at roughly a thousand of these lenses and projection optics a year. Then you step forward to the reticle stage,
which is also something really crazy. This thing accelerates at, I want to say, nine Gs as the tool steps across the wafer. The wafer stage is complementary; it holds the wafer. You line these two things up.
You're taking all the light through the lenses that's focused, and here's
the reticle, here's the wafer. The reticle's moving one direction, the
wafer's moving the other direction as it scans a 26x33 millimeter section
of the wafer, and then it stops. It shifts over to another part
of the wafer and does it again. It does that in just seconds.
Each of them is moving at nine Gs in opposite directions.
Each of these things is a wonder and marvel of chemistry, fabrication, mechanical engineering,
and optical engineering, because you have to align
all these things and make sure they're perfect.
All of these things have crazy amounts of metrology because you have
to perfectly test everything. If anything is messed up, the yield goes to
zero, because this is such a finely tuned system. By the way, it's so large that you're building
it in the factory in Eindhoven, Netherlands, and they're deconstructing it and shipping it on
many planes to the customer site, and then you're reassembling it there and testing it again.
That process takes many, many months. There are so many steps in the supply
chain, whether it's Zeiss making their lenses and projection optics or Cymer, which is
an ASML-owned company, making the EUV source. Each of these has its own complex supply chain.
ASML has commented that their supply chain has over ten thousand companies in it.
Like individual suppliers? Yes. It might not be directly. It might
be through Zeiss having so many suppliers and XYZ company having so many suppliers.
If you just think about it, you're talking about two physically moving objects that are
the size of a wafer, and it has to be accurate
to the level of single-digit nanometers or even
smaller because the entire system, the overlay, the layer-to-layer overlay variation,
has to be on the order of 3 nanometers. If the overlay budget is 3 nm, that means the accuracy of each individual part's physical movement has to be even tighter than that. It has to be sub-one nanometer in most cases, because the errors of these things stack up.
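The stack-up argument can be sketched with a standard root-sum-square error budget. The contributor count here is hypothetical for illustration, not a figure from the conversation:

```python
import math

BUDGET_NM = 3.0   # layer-to-layer overlay spec quoted above
N_SOURCES = 10    # hypothetical number of independent error contributors

# If independent errors add in quadrature (a standard budgeting assumption),
# an equal split gives each contributor budget / sqrt(N).
per_source_nm = BUDGET_NM / math.sqrt(N_SOURCES)   # ~0.95 nm each

# Sanity check: N such contributors RSS back up to the 3 nm budget.
total_nm = math.sqrt(N_SOURCES * per_source_nm**2)
```

With ten independent error sources sharing a 3 nm budget, each one gets under a nanometer, which is why every stage, lens, and metrology step has to hold sub-nanometer accuracy.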
There's no way to just snap your fingers and increase production. Things as simple as power.
The US going from zero percent power growth to two percent power growth, even though China's
already at thirty, was so hard for America to do. And that's a really simple supply chain with
very few people in it who make difficult things. There are probably 100,000
electricians and people who work in the electricity supply chain, or more, in the US?
When you look at ASML, they employ so few people.
Carl Zeiss probably employs less than a
thousand people working on this, and all of those people are super, super specialized.
You can't just train random people up for this in the snap of a finger.
You can't just get your entire supply chain to get galvanized.
Nvidia's had to do a lot to get the entire supply chain to even deliver
the capacity they're going to make this year. When you go talk to Anthropic, they're
like, "We're short of TPUs, we're short of training, and we're short of GPUs."
When you go talk to OpenAI, they're like, "We're short of these things."
OpenAI and Anthropic know they need X. Nvidia is not quite as AGI-pilled.
They're building X - 1. You go down the supply chain, everyone's doing X - 1.
In some cases, they're doing X ÷ 2, because they're not AGI-pilled.
You end up with this time lag for the whip to react.
The AI-pilledness and the desire to increase production takes so long.
Once they finally understand that they need to increase production rapidly…
They think they understand.
They think AI means we have to go from 60
to 100, in addition to the tools getting better and faster, the source getting
higher power from 500 watts to 1,000, and all these other aspects of the supply chain
advancing technically and increasing production. They think they're actually
increasing production a lot. But if you flow through the
numbers… What does Elon want? He wants 100 gigawatts a year
in space by 2028 or 2029. Sam Altman wants 52 gigawatts a
year by the end of the decade. Anthropic probably needs the
same, and Google needs that. You go across the supply chain, and it's
like, wait, no, the supply chain can't possibly build enough capacity for everyone
to get what they want on the side of compute.
I feel like in the data center
supply chain for the last few years, people have been making arguments like, "We are
bottlenecked by this specific thing, therefore AI compute can't scale more than X."
But as you've written about, if the grid is a bottleneck, then we just do behind the
meter on the site, we do gas turbines, et cetera. If that doesn't work, there are all these
other alternatives that people fall back on. I want to ask whether we can imagine a similar
thing happening in the semiconductor supply chain. If EUV becomes a bottleneck, what if we
just went back to 7 nm and did what China is doing currently, producing 7 nm chips
with multi-patterning with DUV machines? If you look at a 7 nm chip like the
A100, there's been a lot of progress obviously from the A100 to the B100 or B200.
How much of that progress is just numerics?
If you just hold FP16 constant from A100 to B100.
The B100 is a little over one petaflop, and the A100 is like 300 teraflops.
Yeah, 312. Holding numerics constant, you have
a 3x improvement from A100 to B100. Some of that is the process improvement, some of
that is just the accelerator design improving, which we could replicate again in the future.
It seems there's actually a very small effect from the process improving from 7nm to 4 nm.
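As a quick check on that like-for-like comparison, using the throughput numbers as quoted here rather than official datasheet figures:

```python
# Dense FP16 throughput, holding numerics constant (figures from the conversation).
A100_TFLOPS_FP16 = 312    # A100, 7 nm
B100_TFLOPS_FP16 = 1_000  # B100, "a little over one petaflop" (approximate)

speedup = B100_TFLOPS_FP16 / A100_TFLOPS_FP16   # ~3.2x across two node jumps
```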
I don't know the numbers offhand but let's say there's 150k wafers per month of 3 nm
and eventually similar amounts for 2 nm. But then there's a similar amount for 7 nm.
If you have all those old wafers and there's maybe a 50% haircut because the bits per
wafer area are 50% less or something, it doesn't seem that bad to just bring on 7
nm wafers if that gives you another fifty or
hundred gigawatts. Tell me why that's naive.
We potentially do go crazy enough that this happens because we just need incremental
compute, and the compute is worth the higher cost and power of these chips.
But it's also unlikely to a large extent because some of these are not fair comparisons.
For example, from A100, which is 312 teraflops, to Blackwell, which is 1,000 or 2,000 FP16,
and then Rubin is 5,000 or so FP16… It's not a fair comparison because these chips
have vastly different design targets. With A100, Nvidia optimized
for FP16 and BF16 numerics. When you look at Hopper, they didn't care
as much about that; they cared about FP8. When you look at Rubin, they don’t
care about FP16 and BF16 so much,
they care mostly about FP4 and FP6.
Numerics are what they've designed their chip for. Let's say we make a new chip design on 7 nm,
optimized for the numerics of the modern day. The performance difference is
still going to be much larger than the FLOPS difference you mentioned.
Often it's easy to boil things down to FLOPS per watt or FLOPS per dollar,
but that's not a fair comparison. Let's look at Kimi K2.5 and DeepSeek.
When you look at those two models and their performance on Hopper versus
Blackwell on very optimized software, you get vastly different performance.
Most of this is not attributed to FLOPS or numerics, because those models are actually eight-bit. Blackwell and Hopper can both run eight-bit, so Blackwell is not really taking advantage of its four-bit there.
The performance gulf is actually much larger. Sure, it's one thing to shrink process
technology and make the transistor smaller so each chip has X number of FLOPS,
but you forget the big gating factor. These models don't run on a single chip.
They run on hundreds of chips at a time. If you look at DeepSeek's production
deployment, which is well over a year old now, they were running on 160 GPUs.
That's what they serve production traffic on. They split the model across 160 GPUs.
Every time you cross the barrier from one chip to another, there is an efficiency loss.
You have to transmit over high-speed electrical SerDes, which brings
a latency cost and a power cost. There are all these dynamics that hurt.
As you shrink and shrink the process node, you've increased the amount of compute in a single chip.
Now in-chip movement of data is at least tens
of terabytes a second, if not
hundreds of terabytes a second. Whereas between chips, you're on
the order of a terabyte a second. Then you have this movement of data between chips
that are super close to each other physically. You can only put so many chips
close to each other physically, so you have to put chips in different racks.
The movement of data between racks is on the order of hundreds of gigabits a second, 400 gig or 800
gig a second, so roughly 100 gigabytes a second. So you have this huge ladder: on-chip
communication is super fast, within the rack is an order of magnitude slower, and outside
the rack is an order of magnitude lower than that. As you break the bounds of chips,
you end up with a performance loss. The reason I explain this is because
when you look at Hopper versus Blackwell, even if both are using a rack's worth of
chips, Hopper is significantly slower. The amount of performance you have leveraged to
the task within each domain—tens of terabytes a second of communication between these processing
elements versus terabytes a second between these
processing elements—is much, much higher and
therefore the performance is much higher. When you look at inference at 100 tokens
a second for DeepSeek and Kimi K2.5, the performance difference between Hopper
and Blackwell is on the order of 20x. It's not 2x or 3x like the FLOPS
performance difference indicates, even though those are on the same process node.
There are just differences in networking technologies and what they've worked on.
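The communication ladder Dylan describes can be sketched in rough numbers. These are illustrative orders of magnitude from the conversation, not product specifications:

```python
# Approximate bandwidth at each tier of the ladder, in bytes per second.
ladder = {
    "on-chip":      50e12,   # tens of TB/s of data movement within one die
    "chip-to-chip": 1e12,    # ~1 TB/s between chips in the same rack
    "rack-to-rack": 100e9,   # 400-800 Gbit/s NICs, ~50-100 GB/s
}

# 800 gigabits per second works out to 100 gigabytes per second.
nic_bytes_per_s = 800e9 / 8

# Each step down the ladder is at least an order of magnitude slower, which
# is why splitting a model across more chips and racks costs real performance.
onchip_vs_chip = ladder["on-chip"] / ladder["chip-to-chip"]     # ~50x
chip_vs_rack = ladder["chip-to-chip"] / ladder["rack-to-rack"]  # ~10x
```

That gradient is why the observed Hopper-versus-Blackwell inference gap (~20x at 100 tokens a second) can far exceed the 2-3x FLOPS gap: the rack-scale interconnect domain matters more than per-chip arithmetic.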
You can translate some of these back, but when you look at what they're doing
on 3 nm with Rubin, some of those things are simply not possible to do all the way back
on A100, even if you make a new chip for 7 nm. There are certain architectural improvements
you can port and certain ones you cannot. The performance difference is not just
going to be the difference in FLOPS. It's in some senses cumulative between
the difference in FLOPS per chip, networking speed between chips, how many FLOPS
are on a chip versus a system, and memory bandwidth on a single chip versus an entire
system. All of these things compound. Can I ask you a very naive question?
The B200 now has two dies on a single chip,
so you can get that bandwidth without
having to go through NVLink or InfiniBand. Next year, Rubin Ultra will
have four dies on one chip. What is preventing us from just doing
that with an older… How many dies could you have on a single chip and still
get these tens of terabytes a second? Even within Blackwell, there are differences in performance when you're communicating
on the chip versus across the chips. Those bounds are obviously much smaller than
when you're going out of the entire chip. When you scale the number of chips
up, there is some performance loss. It's not perfect, but it is way
better than different entire packages. How large can advanced packaging scale?
The way Nvidia is doing it is CoWoS. Google, Broadcom, MediaTek, and
Amazon's Trainium are all doing CoWoS. But actually you can go look back at what Tesla
did with Dojo, which they cancelled and restarted.
Dojo was a package the size of an entire wafer, with 25 chips on it. There were some tradeoffs; they couldn't put HBM on it. But the positive side was that they got 25 chips into one package.
for running convolutional neural networks. It's just not great at transformers because the
shape of the chip, the memory, the arithmetic, and all these various specifications are just
not well-suited for transformers. They're well-suited for CNNs. Dojo chips were optimized
around that, and they made a bigger package. But as you make packages bigger and bigger,
you have other constraints: networking speed, memory bandwidth, and cooling capabilities.
All of these things start to rear their heads. It's not simple. But yes, you will see a trend
line of more chips on the package, and yes, you're going to be able to do that on 7 nm.
In fact, that's what Huawei did with their Ascend 910C or D.
They initially put one, and then they did two.
They're focusing on scaling the
packaging up because that is an area where they can advance faster than
process technology where they can't shrink. But at the end of the day, that’s something
you can do on the leading-edge chips too. Anything you do on 7 nm, you can also
probably do on 3 nm in terms of packaging. If we end up in this world in 2030 where the
West has the most advanced process technology but has not ramped it up as much, whereas
China… I don't know if you think by 2030 they would have EUV and 2 nm or whatever.
But they are semiconductor-pilled and they are producing in mass quantity.
Basically, I'm wondering what the year is where there's a crossover, where our
advantage in process technology has faded enough, and their advantage in scale has increased enough.
And also, if their advantage in having one country with the entire supply chain indigenized—rather
than having random suppliers in Germany and the Netherlands—would mean that China would
be ahead in its ability to produce mass flops.
To date, China still does not have an entirely
indigenized semiconductor supply chain. But would they in 2030?
By 2030, it's possible that they do. But to date, all of China's 7 nm and
14 nm capacity uses ASML DUV tools. The amount that they can
import from ASML is large. But the vast majority of ASML's revenue,
especially on EUV all of it, is outside of China. The scale advantage is still in
the favor of the West plus Taiwan, Japan, and Korea, et cetera.
But they're trying to make their own DUV and EUV tools, right?
They're trying to do all these things. The question is how fast can they advance
and scale up production as well as quality. To date, we haven't seen that.
Now I'm quite bullish that they're going to be able to do these things
over the next five to ten years. They will really scale up production
and kick it into high gear. They have more engineers working on it and
more desire to throw capital at the problem.
So by 2030, will they have fully indigenized DUV?
I think for sure. DUV, yes. And fully indigenized EUV by 2030?
I think they'll have working tools. I don't think that they'll be
able to manufacture a bunch yet. There's having it work, and
then there's production hell. ASML had EUV working in the
early 2010s at some capacity. The tools were not accurate enough.
They were not scaled for high-volume manufacturing or reliable enough.
They had to ramp production, and that all took time. Production hell takes
time. That's why it took another five to seven years to get EUV into mass production at
a fab rather than just working in the lab. How many DUV tools do you think
they'll be able to manufacture in 2030? ASML?
No, China. That's a great question. It's a bit of a
challenge to look into this supply chain especially. We try really hard. In some instances,
they're buying stuff from Japanese vendors.
If they want a fully indigenized supply chain,
they need to not buy these lenses, projection optics, or stages from Japanese vendors.
They need to build it internally. It's really tough to say where
they'll be able to get to. I honestly think it's a shot in the dark.
But it's probably not unlikely that they'll be able to do on the order of 100 DUV tools
a year, whereas ASML is currently doing hundreds of DUV tools a year.
No company has a process node where they make a million wafers a month.
Elon says he wants to do it and China is obviously going to do it.
TSMC is trying to do that. The memory makers may get to a million wafers
a month as well, but not in a single fab. It's mind-boggling to think of
that scale, and challenging to see the supply chain galvanized for that.
I don't want to doubt China's capability to scale. I guess this is an interesting question.
I think at some point SemiAnalysis
will do the deep dive on this.
By when would indigenized Chinese production be bigger than the rest of the West combined? And what do your model's inputs say about when they'll have DUV machines and EUV machines at scale?
Because there's this question around if you have long timelines on AI—by long meaning
2035, which is not that long in the grand scheme of things—should you expect a world
where China is dominating in semiconductors? It doesn't get asked enough
because if you're in San Francisco, we're thinking on timescales of weeks.
If you're outside of San Francisco, you're not thinking about AGI at all. What if we
have AGI? What if you have this transformational thing that is commanding tens or hundreds
of trillions of dollars of economic growth and token output, but it happens in 2035?
What does that imply for the West versus China? SemiAnalysis has got to write
the definitive model on this.
It's really challenging when you
move timescales out that far. What we tend to focus on is tracking every
data center, every fab, and all the tools. We track where they're going, but the time
lags for these things are relatively short. We can only make reasonably accurate estimates
for data center capacity based on land purchasing, permits, and turbine purchasing.
We know where all these things are going, that's the data we sell.
As you go out to 2035, things are just so radically different.
Your error bars get so large it's hard to make an estimate.
But at the end of the day, if takeoff or timelines are slow enough, I don't see why
China wouldn't be able to catch up drastically. In some sense, we've got this valley where, three
to six months ago, or maybe even now, Chinese models are as competitive as they've ever been.
I think Opus 4.6 and GPT 5.4 have really pulled
away and made the gap a little bit bigger, but
I'm sure some new Chinese models will come out. As we move from selling tokens where they
provide the entire reasoning chain, to selling automated white-collar work—an automated
software engineer, you send them the request, they give you the result back, and there's a bunch
of thinking on the back end that they don't show you—the ability to distill out of American
models into Chinese models will be harder. Second, look at the scale of
the compute the labs have. OpenAI exited last
year with roughly two gigawatts. Anthropic will get to
two-plus gigawatts this year. By the end of next year, they'll
both be at ten gigawatts of capacity. China is not scaling their AI
lab compute nearly as fast. At some point, when you can't distill the
learnings from these labs into the Chinese models, plus with this compute race that OpenAI,
Anthropic, Google, and Meta are all racing on, they end up getting to a point where the model
performance should start to diverge more.
Then look at all this CapEx
being spent on data centers. Amazon is spending $200
billion, Google $180 billion. All these companies are spending
hundreds of billions of dollars on CapEx. There's nearly a trillion dollars
of CapEx being invested in data centers in America this year, roughly.
What's the return on invested capital here? You and I would think the return on invested
capital for data center CapEx is very high. If we look at Anthropic's revenues,
in January they added $4 billion. In February, which was a shorter
month, they added $6 billion. We'll see what they can do in March and April, given that compute constraints are
what's bottlenecking their growth. The reliability of Claude is quite low
because they're so compute constrained. But if this continues, then the ROIC
on these data centers is super high. At some point, the US economy starts growing
faster and faster over this year and next year because of all this CapEx, all the revenue these
models are generating, and the downstream supply
chain. China doesn't have that yet. They
have not built the scale of infrastructure to invest in models, get to the capabilities,
and then deploy these models at such scale. When you look at Anthropic,
they're at $20 billion ARR. The margins are sub-50 percent, at least
as last reported by The Information. So that's $13 or $14 billion of compute that it's
running on rental cost-wise, which is actually $50 billion worth of CapEx that someone laid out
for Anthropic to generate their current revenue. China has just not done this.
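Dylan's arithmetic here can be sketched in a few lines. The margin figure and the rental-to-CapEx ratio are rough assumptions implied by his numbers, not reported financials:

```python
# Sketch of the Anthropic compute-economics arithmetic above.
# All inputs are Dylan's rough figures, not reported financials:
# $20B ARR, "sub-50 percent" margins, ~$50B of CapEx behind the rental.
arr = 20e9                  # annual recurring revenue
gross_margin = 0.33         # assumed; implies ~$13-14B of compute rental
compute_rental = arr * (1 - gross_margin)   # annual compute rental cost
rental_to_capex = 50e9 / compute_rental     # CapEx dollars per rental dollar

print(f"compute rental: ${compute_rental / 1e9:.1f}B/yr")   # ~$13.4B/yr
print(f"CapEx per $1 of annual rental: ~${rental_to_capex:.1f}")
```

The takeaway is that roughly $4 of someone's data center CapEx sits behind every $1 of annual compute rental, which is why $20B of ARR implies on the order of $50B of invested capital.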
If and when Anthropic 10Xs revenue again—and I think our answer would be when, not if—China
doesn't have the compute to deploy at that scale. So there is some sense that
we're in a fast takeoff. It's not like we're talking
about a Dyson sphere by X date, it's more like the revenue is compounding at
such a rate that it does affect economic growth. The resources these labs are
gathering are growing so fast.
China hasn't done that yet, so in that case,
the US and the West are actually diverging. The flip side is that these infrastructure
investments have middling returns. Maybe they're not as good as hoped.
Maybe Google is wrong for wanting to take free cash flow to zero and
spend $300 billion on CapEx next year. Maybe they’re just wrong and people on
Wall Street who are bearish and people who don't understand AI are correct.
In that case, the US is building all this capacity but doesn't get great returns.
Meanwhile, China is able to build a fully vertical, indigenized supply chain, instead of
the US/Japan/Korea/Taiwan/SE Asia/Europe countries together building this less vertical supply chain.
In a sense, at some point China is able to scale past us if AI takes longer to get to certain
capability levels than the vast majority of your guests on this podcast believe.
It's fast timelines, the US wins; long timelines, China wins.
Yeah, but I don't know what "fast timelines" means.
I don't think you have to believe in AGI
to have the timelines where the US wins. Let's go back to memory. I think people on
Wall Street and people in the industry are understanding how big this is, but maybe generally
people don't understand what a big deal it is. So we've got this memory crunch,
as you were talking about. And earlier I was asking about,
oh, could we solve for the EUV tool shortage by going back to seven nanometers?
So let me ask a similar question about memory. HBM is made of DRAM, but it yields
three to four times fewer bits per unit of wafer area than commodity DRAM does.
Is it possible that accelerators in the future could just use commodity
DRAM and not HBM, so we can get much more capacity out of the DRAM we have?
The reason I think this might be possible is, if we're going to have agents that are
just going off and doing work, and it's not a synchronous chatbot application, then you
don't necessarily need extremely fast latency.
Maybe you can have lower bandwidth,
because the reason you stack DRAM into HBM is for higher bandwidth.
Is it possible to go to non-HBM accelerators and basically have the opposite
of Claude Code Fast, like a Claude Slow? At the end of the day, the incremental
purchaser who's willing to pay the highest price for tokens also ends up being
the one that's less price-sensitive. Compute should be allocated, in a capitalistic
society, towards the goods that have the highest value, and the private market
determines this by willingness to pay. To some extent, Anthropic could
actually release a slow mode. They could release Claude Slow Mode and increase
tokens per dollar by a significant amount. They could probably reduce the price of Opus 4.6
by 4-5x and reduce the speed by maybe just 2x. The curve on inference throughput versus
speed is already there just on HBM.
And yet they don't, because no one
actually wants to use a slow model. Furthermore, on these agentic tasks, it's great
that the model can run at a time horizon of hours. But if the model was running slower,
those hours would become a day. Vice versa, if the model is running
faster, those hours become an hour. No one really wants to move to a day-long wait
period, because the highest-value tasks also have some time sensitivity to them.
I struggle to see… Yes, you could use regular DRAM.
There are a couple of challenges with this. One of the core constraints of chips is
that a chip is a certain size, and all of the I/O escapes on the edges.
Often, the left and right of the chip are HBM—so the I/O from the chip
to the HBM is on the sides—and then the
top and bottom are I/O to other chips.
If you were to change from HBM to DDR, all of a sudden this I/O on the edge
would have significantly less bandwidth, but significantly more capacity per chip.
But the metric you actually care about is bandwidth per wafer, not bits per wafer.
Because the thing that is constraining the FLOPS is just getting in and out the next matrix,
and for that you just need more bandwidth. Yeah, getting out the weights and
getting in and out the KV cache. In many cases, these GPUs are not
running at full memory capacity. It's obviously a system design thing:
model, hardware, and software co-design. You have to figure out how much KV cache
you need, how much you keep on the chip, how much you offload to other chips and
call when you need it for tool calling, and how many chips you parallelize this on.
Obviously, the search space for this is very
broad, which is why we have InferenceX,
an open-source tool that searches all the optimal points on inference for a
variety of different chips and models. The point is, you're not always
necessarily constrained by memory capacity. You can be constrained by FLOPS, network
bandwidth, memory bandwidth, or memory capacity. If you really simplify it down,
there are four constraints, and each of these can break out into more.
If you switch to DDR, yes, you produce four times the bits per DRAM wafer, but all of
a sudden the constraints shift a lot and your system design shifts. You go slower.
Is the market smaller? Maybe. But also, all these FLOPS are wasted because they're
just sitting there waiting for memory. You don't need all that capacity because you can't
really increase batch size because then the KV cache would take even longer to read.
Makes sense. What is the bandwidth difference between HBM and normal DRAM?
An HBM4 stack—let's talk about the stuff
that's in Rubin, because that's what we've been
indexing on—is 2048 bits across, connected in an area that's 13 millimeters wide.
It transfers memory at around 10 giga-transfers a second.
So a stack of HBM4 is 2048 bits on an area that's roughly 11 to 13 millimeters wide.
That's the shoreline you're taking on the chip. In that shoreline, you have 2048 bits
transferring at 10 giga-transfers per second. You multiply those together and divide by eight, bits to a byte, and you're at roughly
2.5 terabytes a second per HBM stack. When you look at DDR, in that same
area, it's maybe 64 or 128 bits wide. That DDR5 is transferring at anywhere from
6.4 to maybe 8,000 giga-transfers a second. So your bandwidth is significantly lower.
It's 64 times 8,000 divided by eight, which puts you at 64 gigabytes a second.
Even if you take a generous interpretation of
128 times 8 giga-transfers, you're at 128
gigabytes a second for the same shoreline, versus 2.5 terabytes a second.
There's an order of magnitude difference in bandwidth per edge area.
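As a rough sanity check of the shoreline math above (figures are the approximate ones quoted in the conversation, not vendor specs):

```python
# Rough bandwidth-per-shoreline comparison from the discussion above.
# Figures are the approximate ones quoted (HBM4 vs DDR5), not vendor specs.
def bandwidth_gbps(bus_width_bits, gigatransfers_per_s):
    """Peak bandwidth in GB/s: bus width x transfer rate, 8 bits per byte."""
    return bus_width_bits * gigatransfers_per_s / 8

hbm4_stack = bandwidth_gbps(2048, 10)    # ~11-13 mm of chip shoreline
ddr5_channel = bandwidth_gbps(128, 8)    # generous: 128 bits at 8 GT/s

print(hbm4_stack)                  # 2560.0 GB/s, i.e. ~2.5 TB/s per stack
print(ddr5_channel)                # 128.0 GB/s in roughly the same edge area
print(hbm4_stack / ddr5_channel)   # ~20x difference per unit of shoreline
```

The ~20x gap in bandwidth per millimeter of chip edge is the whole argument for stacking DRAM into HBM despite the bits-per-wafer penalty.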
Whether your chip is a square or the full 26-by-33-millimeter reticle, which is the maximum size for an
individual die, you only have so much edge area.
you put all your compute. There are things you can do to try and
change that, like more SRAM or more caching. But at the end of the day, you're
very constrained by bandwidth. Then there's the question of where you can
destroy demand to free up enough for AI. I guess the picture is especially bad because,
as you're saying, if it takes four times more wafer area to get the same byte, for HBM you have
to destroy four times as much consumer demand for laptops and phones to free up one byte for AI.
What does this imply for the next year or two? Sorry for the run-on question, in your newsletter
you said 30% of Big Tech's CapEx in 2026 is going towards memory?
Yes.
That's insane, right? Of the $600 billion
or whatever, 30% is going just to memory. Yes. Obviously, there's some level
of margin stacking that Nvidia does, so you have to separate that out and apply
their margin to the memory and the logic. But at the end of the day, a third
of their CapEx is going to memory. That's crazy. What should we expect over the
next year or two as this memory crunch hits? The memory crunch will continue to get
harder, and prices will continue to go up. This affects different parts
of the market differently. Are people going to hate AI more and more?
Yes, because smartphones and PCs are not going to get incrementally better year on year.
In fact, they're going to get incrementally worse. If you look at the bill of materials for an
iPhone, what fraction of it is the memory? How much more expensive does an iPhone get
if the memory is two times more expensive? I believe an iPhone has 12 gigabytes of memory.
Each gig used to cost roughly $3-4, so that's $50.
But now the price of memory has tripled.
Let's say it's $12 per gig for DDR. Now you're talking about $150 versus $50.
That's a $100 increase in cost for Apple, and that's just on the DRAM.
NAND also has the same market dynamics, so in reality
it's probably a $150 increase on the iPhone.
Apple either has to pass that on to the
consumer or eat it. I don't see Apple reducing their margin
too much, maybe they eat a little bit. But at the end of the day, that means the end
consumer is paying $250 more for an iPhone. Now that’s just on last
year’s pricing versus today’s. There is some lag before Apple feels the heat
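The iPhone arithmetic above, with Dylan's rough per-gigabyte prices (the NAND bump is an assumed round number to make the ~$150 total work):

```python
# Sketch of the iPhone memory-cost arithmetic; prices are the rough
# $/GB figures quoted, and the NAND bump is an assumed round number.
dram_gb = 12
old_cost = dram_gb * 4                # ~$50 at the old $3-4/GB
new_cost = dram_gb * 12               # ~$150 after prices tripled
dram_increase = new_cost - old_cost   # ~$100 on DRAM alone

nand_increase = 50                    # assumed: NAND sees the same dynamics
bom_increase = dram_increase + nand_increase   # ~$150 total BOM hit

print(dram_increase, bom_increase)    # 96 146, i.e. ~$100 and ~$150
```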
because they tend to have long-term contracts for memory that last three months to a year.
But at the end of the day, Apple gets hit pretty hard by this.
They won't really adjust until the next iPhone release.
But that's the high end of the market, which is only a few hundred million phones a year.
Apple sells two or three hundred million
phones annually.
The bulk of the market is mid-range and low-end. It used to be that 1.4 billion
smartphones were sold a year. Now we're at about 1.1 billion.
Our projections are that we might drop to 800 million this year, and
down to 500 or 600 million next year. We look at data points out of China
from some of our analysts in Asia, Singapore, Hong Kong, and Taiwan.
They've been tracking this, and they see Xiaomi and Oppo cutting low-end
and mid-range smartphone volumes by half. Yes, it’s only a $150 BOM increase on a $1,000
iPhone where Apple has some larger margin. But for smaller phones, the percentage of the BOM
that goes to memory and storage is much larger. And the margins are lower, so there's
less capacity to even eat the margins. And they have also generally tended not
to do long-term agreements on memory. Why this is a big deal is that if smartphone
volumes halve, that drop will happen in
the low and mid-range, not the high end.
So it’s not like the bits released are halving. Currently, consumer devices account
for more than half of memory demand. Even if you halve smartphone volumes,
because of the shape of the halving, the low end gets cut by more than half, while
the high end gets cut by less than half, because you and I will still buy the high-end
phones that cost north of a thousand dollars. We'll buy them even if they get
a little bit more expensive. And Apple's volumes will not go down as
much as a low-end smartphone provider. The same applies to PCs. What this
does to the market is quite drastic. DRAM gets released and goes to the AI chipmakers,
who are willing to do longer-term contracts and pay higher margins, because at the end of the day the margin
they extract from the end user is much larger. This probably leads to people hating AI even more.
Today, you already see all the memes on PC subreddits and gaming PC Twitter.
It's cat dancing videos saying,
"This is why memory prices have doubled and
you can't get a new gaming GPU or desktop." It's going to be even worse when memory
prices double again, especially DRAM. Another interesting dynamic is that
it's not just DRAM, it's also NAND. NAND is also going up in price.
Both of these markets have expanded capacity very slowly over the last few years, NAND almost zero.
The percentage of NAND that goes to phones and PCs is larger than the percentage
of DRAM that goes to phones and PCs. As you destroy demand, mostly for
DRAM purposes, you unlock more NAND that gets allocated and can go to other markets.
The price increases of DRAM will be larger than those of NAND because you've released
more from the consumer, and in fact, you've produced more memory for AI.
Sorry, maybe you just explained it and I missed it.
Is it because SSDs are being used in large quantities for data centers?
They are, but not in as large quantities as DRAM. Okay, so they will also increase because
they'll be using some quantity, but there's
not as much of a need as there is for HBM. Makes
sense. One thing I didn't appreciate until I was reading some of your newsletters is that the
same constraints preventing logic scaling over the next few years are quite similar to what's
preventing us from producing more memory wafers. In fact, literally the same exact machine,
this EUV tool, is needed for memory. So I guess the question someone could ask right
now is, why can't we just make more memory? The constraints, as I was mentioning earlier,
are not necessarily EUV tools today or next year. They become that as we get to
the latter part of the decade. Currently, the constraints are more that
they physically just haven't built fabs. Over the last three to four years,
these vendors have not built new fabs because memory prices were really low.
Their margins were low, and in fact, they were losing money in 2023 on memory.
So they decided they weren't building new fabs.
The market slowly recovered over time but
never really got amazing until last year. In 2024, we were banging on the drums
that reasoning means long context, which means a large KV cache, which
means you need a lot of memory demand. We've been talking about that
for a year and a half, two years. People who understand AI went
really long on memory then. So you’ve seen that dynamic, but now
it has finally played out in pricing. It took so long for what was
obvious: long context means the KV cache gets bigger, you need more memory.
Half the cost of accelerators is memory. Of course they're going to
start going crazy on it. It took a year for that to
actually reflect in memory prices. Once memory prices reflected that, it
took another three to six months for the memory vendors to start building fabs.
Those fabs take two years to build. So we won't have really meaningful fabs to even
put these tools in until late 2027 or 2028. Instead, you've seen some really
crazy stuff to get capacity.
Micron bought a fab from a company in
Taiwan that makes lagging-edge chips. Hynix and Samsung are doing some pretty
crazy things to try and expand capacity at their existing fabs, which also have
large knock-on effects in the economy. So why can't we build more capacity?
There's nowhere to put the tools. It's not just EUV; there are other
tools involved in DRAM and logic. In logic, for N3, about 28% of the cost of the final wafer is EUV.
When you look at DRAM, it's in the teens. It's going up, but it's a much
smaller percentage of the cost. These other tools are also bottlenecks, although
their supply chains are not as complex as ASML's. You see Applied Materials, Lam
Research, and all these other companies expanding capacity a lot as well.
But you don't have anywhere to put the tool, because the most complex buildings people make
are fabs, and fabs take two years to build.
I interviewed Elon recently, and his whole plan
is that they're going to build this TeraFab and they're going to build the clean rooms.
I won't even ask you about the dirty rooms thing, but let's say they build the clean rooms.
I have a couple of questions. One, do you think this is the kind of
thing that Elon Co. could build much faster than people conventionally build it?
This is not about building the end tools. This is just about building the facility itself.
How complicated is it to just build the clean room extremely fast?
Is this something that Elon, with his "move fast" approach, could do much faster if that's
what we're bottlenecked on this year or next year? Two, does that even matter if, in two years,
your view is that we're not bottlenecked on clean room space, but on the tooling?
As with any complex supply chain, it takes time, and constraints shift over time.
Even if something is no longer a constraint, that doesn't mean that market no longer has margin.
For example, energy will not be a big bottleneck a couple of years from now, but that
doesn't mean energy isn't growing super
fast and there's no margin there.
It's just not the key bottleneck. In the space of fabs, clean rooms are the
biggest bottleneck this year and next year. As we get to 2028, 2029, 2030, there
will still be constraints there. The thing about Elon is he has a tremendous
capability to garner physical resources and really smart people to build things.
The way he recruits amazing people is by trying to build the craziest stuff.
In the case of AI, that hasn't really worked because everyone's trying to build AGI. Everyone
is very ambitious. But in the case of going to Mars, making rockets that land themselves, fully
autonomous electric cars, or humanoid robots, these are methods of recruiting the people who
think that's the most important problem in the world to work on that problem, because
he's the only one trying really hard. In the case of semiconductors, he stated he wants
to make a fab that's a million wafers per month. No one has a fab that big.
It's possible that he's able to recruit a lot of really awesome people and get them on this
crazy task of building a million wafers a month.
Step one is to build the clean room,
and that I think he probably can do. His mindset around deleting things, that it
can be dirty, it's fine, is probably not right. Actually I think it’s 100% not right.
You need the fab to be very clean. All of the air in the fab gets replaced
every three seconds, it’s that fast. There have to be so few particles.
But I think he can build the clean room. It'll take a year or two.
Initially, it won't be super fast, but over time, he'll get faster at it.
The really complex part is actually developing a process technology and building wafers.
I don't think he can develop that quickly. That has a lot of built-up knowledge.
The most complicated integration of very expensive tools and supply chains
is done by TSMC, Intel, or Samsung. The latter two aren't even that
great at it, and these processes are tremendously complex. How surprised would you be if in 2030
there just happened to be some total
disruption where we're not using EUV?
What if we're using something that has much better effects, is much simpler to produce,
and can be produced in much bigger quantities? I'm sure as an industry insider that
sounds like a totally naive question, but do you see what I'm asking?
What probability should we put on something coming totally out of left
field to make all of this irrelevant? Something that's very simple and easy to
scale, I assign a very, very low probability. There are a number of companies
working on effectively particle accelerators or synchrotrons that generate
light that's either 13.5 nanometer, like EUV, or an even narrower wavelength, like X-ray at
7 nanometers, to then use in lithography tools. But those things are massive particle
accelerators generating this light. It's a very complicated thing to build.
There are a couple of companies and I think that could be a big disruption
to the industry beyond EUV. But I don't think we're going to
magically build something new that is direct write and super simple, and can
be manufactured at huge volumes, although
there are some attempts to do things like this.
I ask because if you think about Elon's companies in the past, rocketry was this thing that was
thought to be—and is—incredibly complicated. Look, I'm just a naive yapper compared to Elon.
What have I built? So maybe it's possible. In order to build more memory in the
future, could we build 3D DRAM the way we do 3D NAND and then go back to DUV?
That is the hope currently. Everyone's roadmap for 3D DRAM is that you'll still use EUV
because you want to have that tighter overlay. When you're doing these subsequent processing
steps, everything is vertically stacked and you have more layers on top of each other.
You want the pitches to be tighter. So generally, people are still
trying to do it with EUV. But what 3D would do is change the calculation
of how many bits a single EUV pass can make. That number would go up drastically if you
go to 3D DRAM. That is the hope. Right now, everyone's roadmap goes from the current 6F² cell,
to a 4F² cell, and then finally 3D DRAM by the end
of the decade or early next decade.
There's still a lot of R&D, manufacturing, and integration to be done.
I wouldn't say that's off the table. I think it's very likely going to happen.
It's also going to require a huge retooling of fabs.
The breakdown of tools in a fab will be very different.
The lithography tool is actually the only thing that isn't that different.
But the number of them relative to different types of chemical vapor deposition, atomic layer
deposition, dry etch, or different kinds of etch chambers with different chemistries… You have all
these different tools for different process nodes. You can't just convert a logic fab to a
DRAM fab, or vice versa, or a NAND fab to a DRAM fab, in a short amount of time.
In the same way, existing DRAM fabs require a lot of retooling just to go from 1-alpha to 1-beta
to 1-gamma process nodes, because they have to add EUV and change the deposition and
etch chemistry stacks for the layers where EUV is used. And the EUV tool has to be there.
Furthermore, when you change to 3D DRAM, there's going to be an even larger shift, so a
lot of retooling of these fabs needs to happen.
That would be a big disruption.
That would make EUV demand generally lower. But as we've seen across time, lithography demand
as a percentage of wafer cost has trended up. Around the 2014 era, it was 17% of the wafer cost,
and it's gone to 30% in the decade since. For DRAM, it was in the low to mid-teens,
and now it's trended toward the high teens. Before we get to 3D DRAM, it'll
likely cross into the 20% range. But then, if we get to 3D DRAM, EUV as a percentage
of the total end wafer cost tanks again. I guess you care less about the percent of cost
and more about how much it bottlenecks production. Right, but the percentage of cost—
It’s a proxy, yeah. If you're Jensen or Sam Altman, or whoever stands to
gain a lot from scaling up AI compute, there are these stories that they'd go to
TSMC and say, "Why can't we access Y and Z?" But I think the point you're
making is that it doesn't really
matter what TSMC does in some sense.
In fact, even if you have Intel and Samsung building more foundries, in the
long run, you're going to be bottlenecked by ASML and other tool and material makers.
First, is that a correct interpretation? Second, should Silicon Valley people be
going to the Netherlands right now to try to pitch ASML to make more tools so that
in 2030 they can have more AI compute? It's a funny dynamic we saw
in 2023, 2024, and 2025. People who saw the energy bottleneck
before others asymmetrically went to Siemens, Mitsubishi, and of course GE
Vernova, and bought up turbine capacity. Now they're able to charge
excess amounts for deploying these turbines in places because of energy.
In the same sense, this could be done for EUV, except ASML is not just going to trust any
random bozo who wants to buy EUV tools. These turbines are much cheaper than EUV
tools, and there's many more of them produced. Especially once you get to industrial gas
turbines, not just combined-cycle but the cheaper,
smaller, less efficient ones, people put
down deposits for these. Someone could do this. Someone should go to the Netherlands
and be like, "I'll pay you a billion dollars. You give me the right to purchase ten EUV tools
two years from now, and I'm first in line." Then over those two years, you go around
and wait for everyone to realize, "Oh crap, I don't have enough EUV tools," and you
try to sell your option at some premium. All you're effectively doing
is saying, "ASML, you're dumb. You weren't making enough margin on these.
I'm going to make a margin." The question is, will ASML even
agree to this? I don't think so. There's a world where they at least get the
demand signal from that to increase production. Potentially. I agree.
But it sounds like you're saying they couldn't even increase production
if they wanted to, given the supply chain. Right. But that's exactly the market in
which… If they can't increase production, just like TSMC cannot increase production
that fast, and yet demand is mooning, then the obvious solution is to arbitrage this.
You and I know demand is way higher than they're
projecting and their capability to build.
You arbitrage this by locking up the capacity, doing a forward contract, and then
trying to sell it at a later date once other people realize everything is
fucked and we don't have enough capacity. Then you'll have this insane margin that
ASML and TSMC should have been charging. But the thing is, I don't know if
ASML and TSMC will ever agree to this. Let me ask you about power now.
It sounds like you think power can be arbitrarily scaled.
Not arbitrarily, but yes. But beyond these numbers. If I'm
remembering correctly, your blog post on how AI labs are increasing power implied that
GE Vernova, Mitsubishi, and Siemens could produce 60 gigawatts a year in gas turbines.
Then there are these other sources, but they're less significant than the turbines.
Only a fraction of that goes to AI, I assume. If in 2030 we have enough logic and memory to
do 200 gigawatts a year, do you just think that
these things are on a path to ramp up to more
than 200 gigawatts a year, or what do you see? Right now we're at 20 or 30.
This is critical IT capacity, by the way, which is an important thing to mention.
When I'm talking about these gigawatts, I'm talking about critical IT capacity.
Server plugged in, that's how much power it pulls. But there are losses along the chain.
There is loss on transmission, conversion, cooling, et cetera.
So you should gross this up from 20 gigawatts for this year, or 200
gigawatts by the end of the decade, to some number 20-30% higher. Then you have capacity
factors. Turbines don't run at 100 percent. If you look at PJM, which I think is the largest
grid in America—covering the Midwest and some of the Northeast area—in their models they want
to have roughly 20 percent excess capacity. Within that 20 percent excess capacity,
they're running all the turbines at 90%
because they are derated some for
reliability, maintenance, and so on. In reality, the nameplate capacity for energy is
always way higher than the actual end critical IT capacity because of all these factors. But it's
not just turbines. If you were just making power from turbines, that's simple, boring, and easy.
Humans and capitalism are far more effective. The whole point of that blog was that, yes, there
are only three people making combined-cycle gas turbines, but there's so much more we can
do. We can do aeroderivatives. We can take airplane engines and turn them into turbines.
There are even new entrants in the market, like Boom Supersonic trying to
do that and working with Crusoe, plus all the other
aeroderivative players that already exist in the market. There are also medium-speed
reciprocating engines: piston engines, like a diesel engine.
There are ten companies that make engines that way. I'm from Georgia, and people used
to be like, "Oh man, you got a Cummins engine in there," regarding RAM trucks.
Automobile manufacturing is going down, so these
companies all have capacity and could scale
and convert that for data center power. You stick all these reciprocating engines in.
It's not as clean as combined-cycle, but maybe you can convert them from diesel to gas if you want.
What about ship engines? All of these engines for massive cargo ships are great.
Nebius is doing that for a Microsoft data center in New Jersey.
They're running ship engines to generate power. Bloom Energy is doing fuel cells.
We've been very positive on them for a year and a half now because they have such
a capability to increase their production. Their payback period for a production
increase is very fast, even if the cost is a little bit higher than combined-cycle,
which is the best for cost and efficiency. Then there's solar plus battery, which can come
online as those cost curves continue to come down. There's wind, where you might only expect 15
percent of the maximum power because things oscillate, but you add batteries. There are
all these things. The other thing is that the
grid is scaled so we don't cut off power at
peak usage on the hottest day of the summer. But in reality, that's a load spike
that is 10-20% higher than the average. If you just put enough utility-scale
batteries, or peaker plants that only run a small portion of the year—and those could
be gas, industrial gas turbines, combined-cycle, batteries, or any of the other sources
I mentioned—then all of a sudden you've unlocked 20% of the US grid for data centers.
Most of the time that capacity is sitting idle. It's really only there for that peak, which is
just a few hours over a few days of the year. If you have enough capacity
to absorb that peak load, then all of a sudden you've freed up all that capacity.
Today, data centers are only 3-4% of the power of the US grid, and by 2028 they'll be 10%.
But if you can unlock 20% of the US grid like this, it's not that crazy.
The US grid is terawatt-level,
not hundreds-of-gigawatts-level.
So we can add a lot more energy. I'm not saying it's easy. These things are going to be hard.
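The peak-load arithmetic above can be sketched with rough numbers; these are the illustrative figures from the conversation, not precise grid statistics:

```python
# Rough sketch of the peak-shaving argument. Numbers are the illustrative
# ones from the conversation, not precise grid statistics.
US_GRID_GW = 1000       # "terawatt-level" grid, expressed in gigawatts
PEAK_OVER_AVG = 0.20    # peak load runs 10-20% above average; take the high end

# Headroom held in reserve for the peak; if batteries or peaker plants
# absorb that spike instead, this capacity is freed for data centers.
unlocked_gw = US_GRID_GW * PEAK_OVER_AVG
print(unlocked_gw)  # 200.0 GW

# For scale: data centers today vs. the 2028 projection.
dc_today_gw = US_GRID_GW * 0.04  # "3-4% of the power of the US grid"
dc_2028_gw = US_GRID_GW * 0.10   # "by 2028 they'll be 10%"
print(dc_today_gw, dc_2028_gw)   # 40.0 100.0
```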
There's a lot of hard engineering, risks people have to take, and new
technologies people have to use. But Elon was the first to do this behind-the-meter
gas, and since then we've seen an explosion of different things people are doing to get power.
They're not easy, but people are gonna be able to do them.
The supply chains are just way simpler than chips. Interesting. He made the point during the
interview that for the specific blade for the specific turbine he was looking at, the lead
times go out beyond 2030. Your point is that— That's great. There are so many other ways to
make energy. Just be inefficient. It's fine. Right now, combined-cycle gas turbines
have CapEx of $1,500 per kilowatt. Are you saying it would make sense to
have either technologies that are much more expensive than that, or other things are
getting cheap enough to make it competitive? Exactly. It can be as high as $3,500 per kilowatt.
It could be twice as much as the cost of
combined-cycle, and the total cost of the GPU on
a TCO basis has only gone up a few cents per hour. Because we've been talking about Hopper pricing,
$1.40, let's say the power price doubles. The Hopper that was $1.40 is now $1.50 in cost.
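That ten-cent arithmetic, sketched; the assumption (mine, implied by the numbers) is that the "ten-cent increase" from a doubled power price means energy is roughly $0.10 of the $1.40 hourly cost:

```python
# Hedged sketch of the TCO sensitivity. Assumption: the "ten-cent increase"
# from a doubled power price implies energy is ~$0.10/hr of the $1.40 total.
hopper_cost_per_hr = 1.40   # all-in Hopper hourly cost from the conversation
power_per_hr = 0.10         # implied baseline energy component (assumption)

doubled_cost = round(hopper_cost_per_hr + power_per_hr, 2)  # power price 2x
print(doubled_cost)  # 1.5

pct_increase = round((doubled_cost / hopper_cost_per_hr - 1) * 100, 1)
print(pct_increase)  # 7.1 -- doubling the power price moves all-in cost ~7%
```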
I don't care, because the models are improving so fast that the marginal utility of them is worth
way more than that ten-cent increase in energy. So you're saying 20 percent of the grid—the grid
is about one terawatt—can just come online from utility-scale batteries, increasing what
you'd be comfortable putting on the grid. The regulatory mechanism
there is not easy, by the way. But that's 200 gigawatts, if
that hypothetically happens. Just from the different sources of gas generation
you mentioned—the different kinds of engines and turbines—combined, how many gigawatts
could they unlock by the end of the decade? We're tracking this in our data.
There are over 16 different manufacturers
of power-generating things just from gas alone.
Yes, there are only three turbine manufacturers for combined-cycle, but we're
tracking 16 different vendors, and we have all of their orders.
It turns out there are hundreds of gigawatts of orders to various data centers.
As we get to the end of the decade, we think something like half of the capacity
that's being added will be behind the meter. Behind the meter is almost always more expensive
than grid-connected, but there are just a lot of problems with getting grid-connected: permits and
interconnection queues and all this sort of stuff. So even though it's more expensive,
people are doing behind the meter. What they're doing behind the meter ranges widely.
It could be reciprocating engines, ship engines, or aeroderivatives.
It could be combined-cycle, although combined-cycle is not
that great for behind the meter. It could be Bloom Energy fuel
cells, or solar plus battery. It could be any of these things.
And you're saying any of these individually could do tens of gigawatts?
Any of these individually will do tens of
gigawatts, and as a whole, they
will do hundreds of gigawatts. Okay. So that alone should more than—
Electrician wages will probably double or triple again.
There are going to be a lot of new people entering that field, and a ton of people who make money,
but I don't see that as the main bottleneck. Right now in Abilene, at the 1.2-gigawatt data
center that Crusoe is building for OpenAI, I think they have 5,000 people
working there, or at peak they did. If you turn that into 100 gigawatts—and
I'm sure things will get more efficient over time—that would be 400,000 people
it would take to build 100 gigawatts. If you think about the US labor force, and
how many electricians there are and how many construction workers there are… I
guess there are 800,000 electricians. I don't know if they're all
substitutable in this way. There are millions of construction workers.
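The headcount arithmetic above can be sketched by scaling the Abilene example linearly, which ignores the efficiency gains Dylan expects, so it's an upper bound:

```python
# Linear scaling of construction headcount from the Abilene example.
# Ignores efficiency gains over time, so treat it as an upper-bound sketch.
workers_abilene = 5000
gw_abilene = 1.2
workers_per_gw = workers_abilene / gw_abilene

need_100gw = workers_per_gw * 100
print(round(need_100gw))  # 416667, i.e. roughly the 400,000 cited

# Against the electrician workforce mentioned (not all substitutable):
us_electricians = 800_000
print(round(need_100gw / us_electricians, 2))  # 0.52
```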
But if we're in a world where we're adding 200 gigawatts a year, are we going to be
crunched on labor eventually, or do you
think that is actually not a real constraint?
Labor is a big constraint. It's a humongous constraint in this. People have to
be trained. Likewise, we'll probably start importing the highest-skilled labor.
It makes sense that a really high-skilled electrician in Europe who was working
on decommissioning power plants now comes to America and is building high-voltage
electricity moving across a data center. Humanoid robots or robotics at least might start
to help, but the main factor for reducing the number of people is going to be modularizing
things and making them in factories in Asia. Unfortunately for America, places like Korea,
Southeast Asia, and in many ways China as well are going to ship more and more built-out sections
of the data center and those will be shipped in. Today you currently ship servers or a rack in,
and then you plug that into different pieces that
you're shipping from different places.
But now you'll ship it to a factory and integrate the entire thing.
Maybe this is a two-megawatt block, and this block goes from high-voltage AC
power to the DC voltage that you deliver to the rack, or something like this.
Or with cooling, you ship a fully integrated unit that has a lot of the
cooling subsystems already put together, because plumbers are also a big constraint here.
Furthermore, instead of just a single rack where you have people wiring up all these racks with
electricity, you take a skid and put an entire row of servers on it that is
shipped directly from the factories. Today, a single rack may be 120 or 140 kilowatts,
but as we get to next-generation Nvidia Kyber and things like that, it's almost a megawatt.
In addition, if you do an entire row, it'll have the rack, the networking, the
cooling, and the power all integrated together.
Now when you come in, you have much less to cable.
There's less networking fiber, fewer power connections, and fewer plumbing things.
This can drastically reduce the number of people working in data centers, so our
capability to build them will be much larger. Along the way, some people will move faster
to new things, and some will move slower. Crusoe and Google have been talking
a lot about this modularization, as have companies like Meta and many others.
The people who move faster to new things may face delays, while the people who
are slower will face labor problems. There will always be dislocations in the market
because this is a very complex supply chain. At the end of the day, it's still
simple enough that we will be able to solve it through capitalism and human
ingenuity on the timescales required. Speaking of big problems to solve, Elon
Musk is very bullish on space GPUs.
If you're right that power is not a constraint
on Earth… I guess the other reason they would make sense is that even if there will be
enough gas turbines or whatever on Earth, Elon's next argument is that you can't get the
permitting to build hundreds of gigawatts on Earth. Do you buy that argument?
Land-wise, America is big. Data centers don't actually take up that
much space, so you can solve that. Permitting-wise, air pollution permits are
a challenge, but the Trump administration made it much easier.
You go to Texas, and you can skip a lot of this red tape.
Elon had to deal with a lot of this complex stuff in Memphis, and then building a power
plant across the border for Colossus 1 and 2. But at the end of the day, there's a lot more
you can get away with in the middle of Texas. Given that Elon lives in Texas,
why didn't he just go to Texas? I think it was partially that they over-indexed
on grid power for a temporary period of time. That's just what they thought they needed more of.
Because they had an aluminum refinery connected to the grid there.
It was actually an idled appliance factory.
But I think they may have indexed more to
grid power, water access, and gas access. I think they bought that knowing the gas
line was right there and they were going to tap it. Same with water. It was a
whole host of different constraints. It was probably an area where
electricians were easier to find. At the end of the day, I'm not
exactly sure why they chose that site. I bet Elon would've chosen somewhere in
Texas if he could've gone back because of the regulatory challenges he faced.
Ultimately, permitting is a challenge, but America is a big place with 50
states, and things will get done. There are a lot of small jurisdictions where
you can just transport in all the workers you need for a temporary period of three to
twelve months, depending on the contractor. You can put them in temporary housing and pay out
the butt, because labor is very cheap relative to the GPUs and the networking, and the end
value of the tokens it's going to produce.
So there is plenty of room to
pay for all of these things. Also, people are also diversifying now.
Australia, Malaysia, Indonesia, and India are all places where data centers
are going up at a much faster pace. But currently, over 70% of AI
data centers are still in America, and that continues to be the trend.
People are figuring out how to build these things. Ultimately, dealing with permitting and
red tape in middle-of-nowhere Texas, Wyoming, or New Mexico is probably a hell of
a lot easier than sending stuff into space. Other than the economic argument making less sense
once you consider that energy is a small fraction of the total cost of ownership of a data center,
what are the other reasons you're skeptical? Obviously, power is basically free in space.
That's the reason to do it. Yeah, that's the reason to do it.
But there are all the other counterarguments. Even if power costs double on Earth, it's
still a fraction of the total cost of the GPU.
The main challenge is… We have
ClusterMAX, which rates all the neoclouds. We test over 40 cloud companies,
including the hyperscalers and neoclouds. Outside of software, what differentiates these
clouds the most is their ability to deploy and manage failure. GPUs are horrendously unreliable.
Even today, around 15% of Blackwells that get deployed have to be RMA'd.
You have to take them out. Sometimes you just have to plug them
back in, but sometimes you have to take them out and ship them back to Nvidia or
their partners who do the RMAs and such. What do you make of Elon's argument that after an
initial phase, they actually don't fail that much? Sure, but now you've done this, tested them all,
deconstructed them, put them on a spaceship, launched them into space, and then put
them online again. That takes months. If your argument is that a GPU has a useful life of
five years, and this takes six additional months,
that is 10% of your cluster's useful life.
Because we're so capacity-constrained, that compute is theoretically most valuable
in the first six months you have it. We're more constrained now
than we will be in the future. That compute can contribute to a better
model in the future, or generate revenue today that you can use to raise more money.
All these things make now the most important moment, but you've potentially delayed
your compute deployment by six months. What separates these cloud providers is… We
see some clouds taking six months to deploy GPUs right here on Earth.
We see clouds that take a lot less than six months.
So the question is, where does space get in there? I don't see how you could test them all on Earth,
deconstruct them, and ship them to space without it taking significantly longer than just leaving
them in the facility where you tested them. The question I wanted to ask is about
the topology of space communication. Right now, Starlink satellites talk to
each other at 100 gigabits per second. You could imagine that being much
higher with optical intersatellite
laser links optimized for this.
That actually ends up being quite close to InfiniBand bandwidth,
which is 400 gigabits a second. But that's per GPU, not per rack. So multiply
that by 72. Also, that was Hopper. When you go to Blackwell and Rubin, that 2x's and 2x's again.
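That comparison in one place, assuming the per-GPU figure is 400 gigabits per second (NDR-InfiniBand-class) and 72 GPUs per rack:

```python
# Back-of-envelope: optical intersatellite links vs. one rack's network needs.
# Assumes 400 Gbit/s per GPU (NDR-InfiniBand-class) and 72 GPUs per rack.
starlink_link_gbps = 100        # current intersatellite laser link
per_gpu_gbps = 400              # Hopper-era per-GPU bandwidth
rack_gbps = per_gpu_gbps * 72
print(rack_gbps)                # 28800 Gbit/s for one rack

links_needed = rack_gbps / starlink_link_gbps
print(links_needed)             # 288.0 Starlink-class links per rack

# "When you go to Blackwell and Rubin, that 2x's and 2x's again."
for gen, mult in [("Blackwell", 2), ("Rubin", 4)]:
    print(gen, per_gpu_gbps * mult * 72 // starlink_link_gbps)
```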
But how much compute is happening per… During inference, are the different scale-ups
still working together, or is inference just happening as a batch within a single scale-up?
A lot of models fit within one scale-up domain, but many times you split them
across multiple scale-up domains. As models become more and more sparse,
which is the general trend, you want to ping just a couple of experts per GPU.
If leading models today have hundreds, if not a thousand, of experts, then you'd want to
run this across hundreds or thousands of chips, even as we advance into the future.
So then you end up with the problem of
needing to connect all these satellites
together for communications as well. That would be tough. If there's a world where
you could do inference for a batch on a single scale-up, then maybe it's more plausible.
But if not, it's a different story. Networking these chips together
is a problem, and you can't just make the satellite infinitely large.
There are a lot of physics challenges to making a satellite really big.
That's why you need these interconnects between the satellites. Those
interconnects are more expensive. In a cluster, 15-20% of the cost is networking.
All of a sudden, you're using space lasers instead of simple lasers that are manufactured in
volumes of millions with pluggable transceivers. And those things are very unreliable as well,
more unreliable than the GPUs by the way. Across the life of a cluster, you have
to unplug and clean them all the time. You have to unplug and replug
them just for random reasons. These things are just not as reliable.
So you've got that problem as well. You've got a more expensive, complicated
space laser to communicate instead of this
pluggable optical transceiver that's
been produced in super high volume. So all in all, what does that
imply for space data centers? Space data centers are not
saved by their energy advantage; they are limited by the same contended resource: chips.
We can only make two hundred gigawatts of chips a year by the end of the decade.
What are we going to do to get that capacity? It doesn't matter if it's on land or in space.
It doesn’t really matter, because you can build that power.
Human capabilities and capacity could get to the period where we're adding a terawatt
a year globally of various types of power. At some point, we do cross the chasm where space
data centers make sense, but it's not this decade. It is much further out, once energy
constraints actually become a big bottleneck and land permitting becomes a much bigger
bottleneck as it subsumes more of the economy. And crucially, once chips
are no longer the bottleneck. Right now, chips are the biggest bottleneck.
You want them deployed and working on
AI the moment they're manufactured.
There are a lot of things people are doing to increase that speed faster and faster.
They’re modularizing data centers, or even modularizing racks where you put the chip in at
the data center, but only the chip and everything else is already wired up and ready to go.
There are things like this people are doing to decrease that time that you cannot do in space.
At the end of the day, all that matters in a chip-constrained world is getting
these chips producing tokens ASAP. Maybe by 2035, the semiconductor industry,
ASML, Zeiss, and suppliers like Lam Research and Applied Materials and other fab
manufacturers will catch up once the pendulum swings and we are able to make enough chips.
Then we will be optimizing every dial and it makes sense to optimize the 10-15% of energy costs.
As we move to ASICs potentially, and if Nvidia's margins aren't 70%-plus, maybe
that energy cost becomes 30% of the cluster. These are the things to optimize.
But Elon doesn't win by doing 20% gains. He
never wins that way. Elon wins when he swings for
the fences and does 10X gains. That's what SpaceX is about. That's what Tesla is about. All of his
success has been about that, not chasing the 20%. I think space data centers will eventually
be a 10X gain as Earth's resources get more and more contentious, but that's not this decade.
Just to drive some intuition about how much land there is on Earth… Obviously, for the chips
themselves, especially if you move to a world where you have racks that have megawatts—
That's the other thing. If manufacturing is the constraint, right now it's roughly one
watt per square millimeter for AI chips. One easy way to improve that is to pump
it to two watts per square millimeter. You may not get 2x the performance,
you may only get 20% more performance, and that requires much more exotic cooling.
It requires more complicated cold plates and complex liquid cooling, or maybe
even things like immersion cooling. In space, higher watts per
millimeter is very difficult,
whereas on Earth, these are solved problems.
One of these things enables you to get a lot more tokens, maybe 20% more tokens per wafer
that's manufactured, and that's a humongous win. Square millimeter, you mean of die area?
Yeah, of die area. It would be better for space because more watts
per millimeter means the chip runs hotter. I guess this is a question of computer
chip engineering, but it radiates heat as the fourth power of its temperature, by the Stefan-Boltzmann law.
If you can run a very hot chip, it allows a lot of—
No, you can't run it hotter. You can only run it denser.
The problem is that getting the heat out of that dense area means you have to
move away from standard air and liquid cooling to more exotic forms of liquid cooling, or even
immersion, to get to higher power densities. That's more difficult in
space than it is on Earth. Maybe it's worth explaining at this point what
exactly a scale-up is and what it looks like for Nvidia versus Trainium versus TPUs.
Earlier I was mentioning how
communication within a chip is super fast.
Communication between chips that are in the same rack is fast, but not as fast.
It's on the order of terabytes. Communication very far away is on
the order of hundreds of gigabytes. As you get further distance, maybe
across the country, the order of magnitude is on the order of gigabytes.
A scale-up domain is this tight domain where the chips are communicating
on the order of terabytes a second. For Nvidia, previously this meant
an H100 server had eight GPUs, and those eight GPUs could talk to
each other at terabytes a second. With Blackwell NVL72, they
implemented rack-scale scale-up. That meant all seventy-two GPUs in the rack could
connect to each other at terabytes a second. The speed doubled generation on generation, but
the most important innovation was going from eight to seventy-two in the domain.
When we look at Google, their scale-up domain is completely different.
It has always been on the order of thousands. With TPU v4, they had pods the
size of four thousand chips.
With v8 or v7, they have pods in
the eight or nine thousand range. What's relevant here is that it's not the
same as Nvidia. It's not like for like. Google has a topology that's a torus.
Every chip connects to six neighbors. Nvidia's 72 GPUs connect all-to-all.
They can send terabytes a second to any arbitrary other chip in that pod of scale-up.
Whereas Google, you have to bounce through chips. If TPU 1 needs to talk to TPU 76, it has to bounce
through various chips, and there is always some blocking of resources when you do that because
that one TPU is only connected to six other TPUs. So there is a difference
in topology and bandwidth, and there are trade-offs and advantages to both.
Google gets to have a massive scale-up domain, but they have the trade-off of bouncing
across chips to get from one to another. You can only talk to six direct neighbors.
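A minimal sketch (my illustration, not Google's actual routing code) of why this topology difference matters: hop counts on a wraparound 3D torus, where each chip has six neighbors, versus the single switch hop of an all-to-all NVL72 domain. The 16×16×16 pod size is an assumption for illustration:

```python
def torus_hops(a, b, dims):
    """Minimum hop count between chips a and b on a wraparound 3D torus."""
    hops = 0
    for x, y, d in zip(a, b, dims):
        delta = abs(x - y)
        hops += min(delta, d - delta)  # wraparound: take the short way round
    return hops

dims = (16, 16, 16)  # 4,096 chips, roughly TPU v4 pod scale (illustrative)
print(torus_hops((0, 0, 0), (1, 0, 0), dims))   # 1: a direct neighbor
print(torus_hops((0, 0, 0), (15, 0, 0), dims))  # 1: wraps around the ring
print(torus_hops((0, 0, 0), (8, 8, 8), dims))   # 24: the far corner
# All-to-all (NVL72): any of the 72 GPUs reaches any other in one switch hop.
```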
Amazon has mutated their scale-up domain. They're somewhere in between Nvidia and Google.
They're trying to make larger scale-up domains.
They try to do all-to-all to some extent with
switches, which is what Nvidia does, but they also use torus topologies like Google to some extent.
As we advance forward to next generations, all three of them are moving more
towards a dragonfly topology. That means there are some fully connected elements
and some elements that are not fully connected. You can get the scale-up to be hundreds or
thousands of chips, but also have it not contend for resources when bouncing through chips.
Related question: I heard somebody make the claim that the reason parameter scaling has been
slow—and only now are we getting bigger models from OpenAI and Anthropic—is that… The original
GPT-4 is over a trillion parameters, and only now are models starting to approach that again.
I heard a theory that the reason is that Nvidia's scale-ups have just not
had that much memory capacity.
Let's say you have a 5T model running at
FP8, so that's five terabytes. And then you have the KV cache, let's say it's—
Just call it the same size. Okay, let's say it's the same size for one batch.
So you need ten terabytes to be able to run… A single forward pass, yeah.
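The memory arithmetic here, sketched with the conversation's numbers (FP8 taken as one byte per parameter):

```python
# Memory footprint sketch: 5T parameters at FP8 plus an equal-size KV cache.
params = 5e12
bytes_per_param = 1             # FP8 = one byte per parameter
weights_tb = params * bytes_per_param / 1e12
kv_cache_tb = weights_tb        # "call it the same size"
total_tb = weights_tb + kv_cache_tb
print(total_tb)                 # 10.0 TB for a single forward pass

nvl72_tb = 20                   # the GB200 NVL72-era scale-up capacity cited
print(total_tb <= nvl72_tb)     # True: fits only in the newest Nvidia scale-up
```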
And then only with the GB200 and NVL72 do you have an Nvidia scale-up that has twenty
terabytes, and before that they were much smaller. Whereas Google, on the other hand, has had
these huge TPU pods that are not all-to-all, but still have hundreds of terabytes
of capacity in a single scale-up. Does that explain why parameter
scaling has been slow? I think it's partially the capacity and
bandwidth, but also as you build a larger model, the ability to deploy it is slower.
In terms of what the inference speed is for the end user, that's kind of irrelevant. What's
really relevant is RL. What we've seen with these models and allocation of compute at a lab… There
are a few main ways you can allocate compute.
You can allocate it to inference, i.e. revenue.
You can allocate it to development, i.e. making the next model.
You can allocate it to research. In development specifically, you
split it between pre-training and RL. When you think about what is happening, the
compute efficiency gains you get from research are so large that you actually want most of your
compute to go to research, not to development. All these researchers are generating new
ideas, trying them out, testing them, and continuing to push the Pareto optimal
curve of scaling laws further and further. Empirically, what we’ve seen is that
model costs get ten times cheaper every year, or even more than that.
At the same scale it gets ten times cheaper, and to reach new frontiers it
costs the same amount or more. So you don't want to allocate too
many resources to pre-training and RL. You actually want to allocate most
of your resources to research. In the middle is this development period.
If you pre-train a five-trillion-parameter model,
how many rollouts do you have to do in RL?
Rollouts for a five-trillion-parameter model are five times larger than for
a one-trillion-parameter model. If you wanted to do as many rollouts—maybe
the larger model is two times more sample efficient—now you need 2.5x as much
time of RL to get the model smarter. Or you could RL the smaller model for 2x the time.
You'd still have a 25% difference in the big model, which is 2x as sample efficient
and doing X number of rollouts. But the smaller model, which is a
trillion parameters, although it's less sample efficient, is doing twice as
many rollouts and is still done faster. You get the model sooner, you've done more RL,
and then you can take that model to help you build the next models, help your engineers
train, and do all these research ideas. This feedback loop is actually weighed
towards smaller models in every case, no matter what your hardware is.
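The rollout trade-off above, as arithmetic; the 5x cost and 2x sample efficiency are the conversation's stated assumptions:

```python
# RL rollout trade-off: the big model costs 5x per rollout and is assumed
# 2x as sample efficient; the small model instead gets RL'd for 2x the time.
big_cost_per_rollout = 5.0   # relative to the 1T model's cost of 1.0
sample_efficiency = 2.0      # big model needs half as many rollouts

big_time = big_cost_per_rollout / sample_efficiency
small_time = 2.0             # "RL the smaller model for 2x the time"
print(big_time, small_time)  # 2.5 2.0

gap_pct = (big_time / small_time - 1) * 100
print(gap_pct)               # 25.0: the big model still finishes 25% later
```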
As you look to Google, they do deploy the largest production model of
any of the major labs with Gemini Pro.
It's a larger model than GPT-5.4.
It's a larger model than Opus. Google does this because they have a unipolar set
of compute. It's almost all TPU. Whereas Anthropic is dealing with H100s, H200s, Blackwell,
Trainiums, and TPUs of various generations. OpenAI is dealing with mostly Nvidia right now,
but going towards having AMD and Trainium as well. The fleets of compute like Google's can
just optimize around a larger model. They can leverage a thousand chips in a scale-up
domain to get the RL time speed much faster so that this feedback loop can be fast.
But at the end of the day, in isolation, you almost always want to go with a smaller
model that gets RL'd faster and gets deployed into research and development earlier.
You can build the next thing and get more efficiency wins.
You have this compounding effect of making a smaller model that can be
deployed into research and development earlier.
I spend less compute on the training because I
was able to allocate more compute to the research. This compounding effect of being able
to do research faster and faster is potentially a faster takeoff.
That's all these companies want: the fastest takeoff possible.
Okay, a spicy question. You've explained that SemiAnalysis sells these spreadsheets.
You're always pointing out how six months or a year ago, you warned
people about the memory crunch. Now you're telling people about the cleanroom
crunch, and in the future, the tool crunch. Why is Leopold the only person using your
spreadsheets to make outrageous money? What is everybody else doing?
I think there are a lot of people making money in many ways.
Leopold jokes that he's the only client of mine who tells me our numbers are too low.
Everyone else tells me our numbers are too high, almost ad nauseam.
Whether it's a hyperscaler saying, "Hey, that other hyperscaler, their numbers are
too high," and we're like, "Nah, that's it." They're like, "No, no, no, it's
impossible," blah, blah, blah. You finally have to convince them through all
these facts and data when we're working with
hyperscalers or AI labs that in fact, no,
that number isn't too high, that's correct. Eventually, sometimes it takes them
six months to realize, or a year later. Other clients, on the trading
side, also use our data. Roughly 60% of my business is industry.
So AI labs, data center companies, hyperscalers, semiconductor companies, the
whole supply chain across AI infrastructure. But 40% of our revenue is hedge funds.
I'm not going to comment on who our customers are, but a lot of people use the data.
It's just how do you interpret it, and then what do you view as beyond it?
I will say Leopold is pretty much the only person who tells me my numbers are too low, always.
Sometimes he's too high, sometimes I'm too low. But in general, I think
other people are doing that. You can look across the space at hedge funds and
look at their 13Fs and see they own, maybe not
exactly what Leopold does, because it's always a
question of what is the most constrained thing. What's the thing that's going to
be most outside of expectations? That's what you're really trying to
exploit: inefficiencies in the market. In a sense, our data is making the market
more efficient by making the base data of what's happening more accurate.
Many funds do trade on information that is out there… I don't
think Leopold's the only person. I think he has the most conviction
about the AGI takeoff, though. Right, but the bets are not
about what happens in 2035. The bets that you're making—that are at least
exemplified by public returns we can see for different funds including Leopold's—are
about what has happened in the last year. The last year stuff could be
predicted using your spreadsheets. It's about buying the next year's spreadsheets.
They're not just spreadsheets. There are reports. There's API access to
the data. There's a lot of data. But do you see what I mean?
It's not about some crazy singularity thing.
It's about, do you buy the memory crunch?
You only buy the memory crunch if you believe AI is going to take off in a huge way.
The memory crunch, a lot of it was predicated on… At least for people in the Bay Area who
think about infrastructure, it's obvious. KV cache explodes as context lengths get longer,
so you need more memory. Then you do the math. You also have to have a lot of supply chain
understanding of what fabs are being built, what data centers are being built,
how many chips, and all these things. We track all these different datasets
very tightly, but at the end of the day, it takes someone to fully believe
that this is going to happen. A year ago, if you told someone memory
prices would quadruple and smartphone volumes are going to go down 40% over the
year or two after that, people were like, "You're crazy. That'd never happen." Except a few
people do believe that, and those people did trade memory. And people did. I don't think Leopold
was the only person buying memory companies. He, of course, sized and positioned and did
things in better ways than some, maybe most.
I don't want to comment on whose returns
are what, but he certainly did well. Other people also did really well.
Wow, you've made me diplomatic for the first time ever. No, no, you're fine.
I think this is hilarious. I'm being a diplomat, whereas usually I'm spicy.
Okay, some rapid-fire to close out. If you're saying with the memory,
logic, et cetera, the N3 is mostly going to be AI accelerators, but then there's
N2, which is mostly Apple now… In the future, I guess AI would also want to go on N2.
Can TSMC kick out Apple if Nvidia and Amazon and Google say, "Hey, we're willing
to pay a lot of money for N2 capacity?" I think the challenge with this is chip design
timelines take a long while, so that's more than a year out, and the designs that are
on two nanometer are more than a year out.
What would really happen is Nvidia and
all these others will be like, "Hey, we're going to prepay for the capacity
and you're going to expand it for us." Maybe TSMC takes a little
bit of margin, but not a ton. They're not going to kick Apple out entirely.
What they're going to do is when Apple orders X, they might say, "Hey, we project you only need
X minus one, and so that's what we're going to give you, X minus one."
Then that flex capacity, Apple's kind of screwed on.
Traditionally, Apple has always over-ordered by 10% and cut back
by 10% over the course of the year. Some years they hit the entire 10%.
Volumes vary based on the season and macro. I don't think TSMC would kick out Apple.
I think Apple will become a smaller and smaller percentage of TSMC's revenue, and therefore be
less relevant for TSMC to cater to their demands. TSMC could eventually start saying, "Hey, you've
got to pre-book your capacity for next year, for two years out, and you have to prepay for
the CapEx," because that's what Nvidia and Amazon and Google are doing.
I wonder if it's worth going into specific numbers.
I don't have any of them on hand.
What percentage of N2 does Apple have its
hands on over the coming years versus AI? This year Apple has the majority of
N2 that's going to get fabricated. There's a little bit from AMD.
They are trying to make some AI chips and CPU chips early.
There's a little bit, but for the most part, it's Apple.
As we go forward to the year after that, Apple still gets closer to half of it as other people
start ramping, but then it falls drastically, just like for N3, where they were half.
When I say N2, that includes A16, which is a variant of N2.
Over time, those nodes will be the majority. What's also interesting is traditionally,
Apple has been the first to a process node. 2 nm is actually the first time they're not. Well, that's besides Huawei. Huawei, back in 2020 and before, was first alongside Apple, but they were both making smartphones. Now, with 2 nm, you've got AMD trying
to make a CPU and a GPU chiplet that they use advanced packaging to package
together, in the same timeframe as Apple.
This is a big risk for AMD that could cause delays because it's a brand-new process technology. It's hard. But at the end of the day, this is a bet they want to make to scale faster than Nvidia and try to beat them.
As we move forward, when we move to the A16 node, the first customer there is not even
Apple. It's AI. As we move forward, that will become more and more prevalent.
Not only will Apple not be the first to a node, they will also not be the majority
of the volume to the new node. They'll then just be like any old customer.
Because the scale of TSMC's CapEx keeps ballooning, but Apple's business
is not growing at the same pace, they become a less and less relevant customer.
They also will just cut their orders because things in the supply chain are
kicking them out, whether it be packaging or materials or DRAM or NAND.
These things are increasing in cost. They can't pass on all the cost to customers
likely because the consumer is not that strong. You end up with this conundrum
where they are just not TSMC's best bud like they have been historically.
Do you think if Huawei had access to 3 nm,
they would have a better accelerator than Rubin?
Potentially, yeah. Huawei was the first with a 7 nm AI chip as well.
They weren't the first with a 5 nm mobile chip, but they were the first with a 7 nm AI chip.
The Huawei Ascend was two months before the TPU and four months before Nvidia's A100, I think.
That's just moving to a process node. That doesn't imply software or hardware
design or all these other things. But Huawei is arguably the only company in the
world that has all the legs. Huawei has cracked software engineers. Huawei has cracked
networking technologies. That's, in fact, their biggest business historically. They have
cracked AI talent. Furthermore, beyond Nvidia, they actually have better AI researchers.
Beyond Nvidia, they have their own fabs. And beyond Nvidia, they have their own end
market of selling tokens and things like that. Huawei is able to get the top, top talent.
Nvidia is as well, but not with as much
concentration, and Huawei
has a bigger pool in China. It's very arguable that Huawei, if they
had TSMC, would be better than Nvidia. There are areas where China has advantages that Nvidia can't access as easily, not just scale, but certain optical technologies that China's actually really good at. I think it's very reasonable that if in
2019 Huawei was not banned from using TSMC, Huawei would have already eclipsed
Apple as the biggest TSMC customer. Huawei has huge share in networking,
compute, CPUs, and all these things. They would have kept gaining share, and
they'd likely be TSMC's biggest customer. Wow. That's crazy. I've got a
random final question for you. The other part of the Elon interview was robots.
If humanoids take off faster than people expect, if by 2030 there's millions of humanoids
running around which each need local compute,
any thoughts on what that implies?
What would be required for that? There are a lot of difficulties with the VLMs and VLAs, the vision-language and vision-language-action models, that people are deploying on robots. But to some extent, you don't need to
have all the intelligence in the robot. It would be much more efficient to not do that.
Because in the cloud, you can batch process and all these things.
What you may want to do is have a lot of the planning and longer-horizon tasks
determined by a much more capable model in the cloud that runs at very high batch sizes.
Then it pushes those directions to the robots, who interpolate between each subsequent action.
Or it is given a command like, "Hey, pick up that cup," and then the model
on the robot can pick up the cup. As it's picking up, things like weight and
force may have to be determined by the model on the robot, but not everything needs to be.
It can say, "Hey, that's a headphone," and the super model in the cloud can say, "I know these headphones are Sony XM6s," which is not a Dwarkesh ad spot, but...
I'm like, why is this guy plugging this thing so hard? It's on the table. It's on his neck when we're interviewing Satya together. Is he getting paid by Sony?
Unfortunately not. But anyways, it might say, "Hey, the headband is soft, and
this is the weight of it," and all these things. Then the model on the robot
can be less intelligent, take these inputs, and do the actions.
It may get told by the model in the cloud every second, or maybe ten times a second,
depending on the hertz of the action. But a lot of that can be offloaded to the cloud.
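The split Dylan describes, a large batched cloud model issuing coarse plans a few times a second while a small on-robot model fills in actions at a higher rate, can be sketched roughly as a nested control loop. Everything here is a hypothetical illustration: the function names, the rates, and the one-dimensional "pose" are all made up for the sketch, not any real robotics API.

```python
# Hypothetical sketch of the cloud-planner / on-robot-controller split.
# The cloud model issues a coarse target at a low rate; the small
# on-robot model interpolates fine-grained actions between targets.

CLOUD_HZ = 2   # assumed rate at which the big cloud model sends a new plan
ROBOT_HZ = 50  # assumed rate at which the small on-robot model emits actions

def cloud_planner(step):
    """Stand-in for the large batched cloud model: returns a target pose."""
    return float(step)  # e.g. "move the gripper to position `step`"

def robot_controller(current, target, steps_left):
    """Stand-in for the small on-robot model: step toward the target."""
    return current + (target - current) / steps_left

def run(seconds):
    actions_per_plan = ROBOT_HZ // CLOUD_HZ  # robot actions per cloud plan
    pose, log = 0.0, []
    for cloud_step in range(seconds * CLOUD_HZ):
        target = cloud_planner(cloud_step + 1)       # slow, smart, batched
        for i in range(actions_per_plan):            # fast, local, cheap
            pose = robot_controller(pose, target, actions_per_plan - i)
            log.append(pose)
    return pose, len(log)

final_pose, n_actions = run(seconds=1)
```

The design point is that only `cloud_planner` needs frontier-scale intelligence, and it runs where batching is possible; the on-robot model just has to close the gap between plans.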
Otherwise, if you do all of the processing on the device: one, I believe it would be more expensive because you can't batch. Two, you couldn't have as much intelligence as you do in the cloud because the models will just be bigger in the cloud.
Three, we're in a semiconductor shortage world, and any robot you deploy needs leading-edge
chips because the power is really bad for robots. You need it to be low power and efficient,
and all of a sudden you're taking power
and chips that would've been for AI data
centers, and you're putting them in robots. So now that 200 gigawatts gets lower if
you're deploying millions of humanoids. I think this is very interesting because
something people might not appreciate about the future is how centralized, in
a physical sense, intelligence will be. Right now, there are eight billion humans, and
their compute is in their heads, on their person. In the future, even with robots that are
out physically in the world—obviously, knowledge work will be done in a centralized
way from data centers with hundreds of thousands or maybe millions of instances—the future
you're suggesting is one where there's more centralized thinking and centralized computation
driving millions of robots out in the world. That's an interesting fact about the future
that I think people might not appreciate. I think Elon recognizes this, which is why
he's going to different places for his chips. He signed this massive deal with Samsung to make
his robot chips in Texas because I personally
think he thinks Taiwan risk is huge.
Because of that and the centralization of resources in Taiwan, having his robot
chips in Texas means having a separate supply chain that is not as constrained.
No one's really making AI chips on Samsung besides Nvidia's new LPU that they launched.
They’re launching it next week, but we're recording this the week before.
This episode's coming out Friday. Oh, this episode's coming out before.
Sick. They're launching this new AI chip next week which is built on Samsung, but
that's a recent development from Nvidia. That's the only other AI demand there,
whereas on TSMC, everything is competing. He gets both geopolitical diversification
and supply chain diversity for his robots, and he's not competing as much with the infinite
willingness to pay for the data center geniuses. Final question, on Taiwan. If we believe
that tools are the ultimate bottleneck, how much of Taiwan's place in the AI semiconductor
supply chain could we de-risk simply by having a
plan to airlift every single process engineer
at TSMC out if they get blockaded or something? Or do you still need to ship out the EUV
tools, which would be multiple plane loads per single tool and would not be practical?
If you ship out all the process engineers, and assuming the conflict is hot enough that the fabs are destroyed, no one else has anything close to the fab capacity that's in Taiwan now, which is a big risk.
These tools actually use a lot of semiconductors which are manufactured in Taiwan.
It's a snake eating its own tail meme because you can't make the tools without the chips from
Taiwan, which you can't use without the tools in Taiwan. There's obviously some diversification
there. They don't use super advanced chips in lithography tools, but at the end of the
day, there is some dragon eating its tail. Just shipping out all the engineers and
blowing up the fabs means China has a stronger semiconductor supply chain than the
rest of the world in terms of verticalization, now that you've removed Taiwan.
You've got all the know-how, but you've got to replicate it in,
let's say, Arizona or wherever for TSMC.
It's going to take a long time to build all the
capacity that TSMC has built over the years. And so you've drastically slowed US and global GDP growth. Not just slowed growth, you've shrunk GDP massively, and you've got a lot bigger problems. Your incremental ability to add
compute goes to almost zero. Instead of hundreds of gigawatts
a year by the end of the decade, let's say something happens to Taiwan, now you're
at maybe 10 gigawatts across Intel and Samsung, or 20 gigawatts. It's nothing. Now all of a sudden
you've really caused some crazy dynamics in AI. Of course, you have all the existing capacity,
but that existing capacity pales in comparison to the capacity that's being expanded.
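Dylan's point about the incremental build rate can be put in rough numbers. The baseline below is an illustrative assumption pinned to his "hundreds of gigawatts a year by the end of the decade," and the no-Taiwan figure splits his "maybe 10 ... or 20 gigawatts" range; neither is a precise forecast.

```python
# Back-of-envelope: how much of the planned annual AI compute build-out
# survives if Taiwan goes offline? Both figures are illustrative
# assumptions drawn from the ranges mentioned in the conversation.

BASELINE_GW_PER_YEAR = 200        # assumed end-of-decade add rate with TSMC
WITHOUT_TAIWAN_GW_PER_YEAR = 15   # midpoint of "maybe 10 ... or 20" GW
                                  # across Intel and Samsung

remaining_fraction = WITHOUT_TAIWAN_GW_PER_YEAR / BASELINE_GW_PER_YEAR
# Roughly 7.5% of the planned annual capacity additions survive; the
# other ~92.5% of incremental compute simply never gets built.
```

Under these assumptions the installed base is untouched, but the growth rate collapses by more than an order of magnitude, which is why the existing capacity "pales in comparison" to what was being expanded.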
Okay. Dylan, that was excellent. Thank you so much for coming on the podcast.
Thank you for having me. And see you tonight.