Nvidia H20 Restricted in China, The Huawei CloudMatrix 384, Whither Chip Controls – Stratechery by Ben Thompson

Good morning,

On Monday’s episode of Sharp Tech, we discussed Apple’s temporary reprieve from tariffs.

On to the Update:

Nvidia H20 Restricted in China

From Reuters:

Nvidia on Tuesday said it would take $5.5 billion in charges after the U.S. government limited exports of its H20 artificial intelligence chip to China, a key market for one of its most popular chips. Nvidia’s AI chips have been a key focus of U.S. export controls as U.S. officials have moved to keep the most advanced chips from being sold to China as the U.S. tries to keep ahead in the AI race. After those controls were implemented, Nvidia began designing chips that would come as close as possible to U.S. limits. Nvidia shares were down about 6% in after-hours trading.

This is Nvidia’s SEC filing:

On April 9, 2025, the U.S. government, or USG, informed NVIDIA Corporation, or the Company, that the USG requires a license for export to China (including Hong Kong and Macau) and D:5 countries, or to companies headquartered or with an ultimate parent therein, of the Company’s H20 integrated circuits and any other circuits achieving the H20’s memory bandwidth, interconnect bandwidth, or combination thereof. The USG indicated that the license requirement addresses the risk that the covered products may be used in, or diverted to, a supercomputer in China. On April 14, 2025, the USG informed the Company that the license requirement will be in effect for the indefinite future.

The Company’s first quarter of fiscal year 2026 ends on April 27, 2025. First quarter results are expected to include up to approximately $5.5 billion of charges associated with H20 products for inventory, purchase commitments, and related reserves.

Just for clarity’s sake, that $5.5 billion in charges is not money that Nvidia now needs to pay, but rather money that it has already paid or committed to pay to its various suppliers; normally Nvidia would recognize those costs when it sold the H20 cards, but now that those sales are not going to happen, it will need to write off those costs, putting them directly onto its first quarter results.
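
To make the timing concrete, here is a minimal sketch; the ~$5.5 billion total is from Nvidia’s filing, while everything else is purely illustrative:

```python
# Minimal sketch of cost-recognition timing; the ~$5.5B total comes from
# Nvidia's filing, all other structure here is hypothetical illustration.
committed_h20_costs = 5.5e9  # already paid/committed: inventory, purchase
                             # commitments, and related reserves

# Normal path: these costs sit on the balance sheet and hit the income
# statement as cost of goods sold, matched against revenue when H20s sell.
# Ban path: the expected sales disappear, so the carrying value is written
# down immediately, landing as a charge on Q1 FY2026 results.
expected_h20_revenue = 0          # the license requirement forecloses the sales
q1_charge = committed_h20_costs   # write-off recognized now, not at sale
print(f"Q1 FY2026 charge: ${q1_charge / 1e9:.1f}B")
```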

What is interesting to consider is why those sales are not going to happen, and the reason is straightforward: the H20 is a pretty crappy AI accelerator. Compared to an H100, it has fewer cores and dramatically lower compute throughput, no NVLink support (for linking multiple cards together), and is limited to the PCIe form factor (which means that all communication has to run through the CPU). Putting an H20 into a data center is a waste of a slot compared to an H100 or even an A100 (raw performance is similar to an A100’s, but without the scalability), much less a GB200, and slots and their associated power draw are the scarce resources when it comes to data centers. No one outside of China is going to want these accelerators, which is why they’re being written off.

The one positive feature of the H20 is that it has slightly more high-bandwidth memory than an H100, and significantly more than an A100; this speaks to the H20’s only real use case, which is inference on highly optimized models. It’s not useful for training at all given the lack of connectivity; in other words, I don’t buy the U.S. government’s concern that H20s “may be used in, or diverted to, a supercomputer in China”. Previously allowed H800s are a much better bet in that regard, as are smuggled chips (or the alternative I discuss below).
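
A back-of-the-envelope sketch of why that is: single-stream inference is typically memory-bandwidth-bound, because every weight has to be read from memory for each generated token, so bandwidth, not compute, sets the ceiling. The figures below are illustrative; the ~4.0 TB/s H20 and ~3.35 TB/s H100 bandwidth numbers are from public spec reporting:

```python
# Rough ceiling on single-stream decode speed for a dense model, assuming
# every weight is read from HBM once per generated token (memory-bound case).
def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          hbm_bandwidth_tb_s: float) -> float:
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return hbm_bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical workload: a 70B-parameter dense model quantized to 8 bits.
print(decode_tokens_per_sec(70, 1.0, 4.00))  # ~57 tokens/s ceiling on an H20
print(decode_tokens_per_sec(70, 1.0, 3.35))  # ~48 tokens/s ceiling on an H100
```

On this one axis the H20 can actually edge out an H100, which is exactly why inference is its one real niche.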

In fact, it’s precisely because the H20 is so underwhelming that this news is so bad for Nvidia. The takeaway from this ban is not that there was some great national security concern that was addressed by banning the H20, but rather that Nvidia is probably never going to be allowed to sell AI accelerators to China again (the company can still sell gaming chips, at least for now). This has long been a likely outcome, to be sure, but now investors know for certain.

The Huawei CloudMatrix 384

From the South China Morning Post:

China’s Huawei Technologies has launched a new artificial intelligence (AI) infrastructure architecture that is reportedly set to rival US chip giant Nvidia’s offerings in addressing computing power bottlenecks. Introduced last week, the CloudMatrix 384 Supernode was described as a “nuclear-level product” that matched Nvidia’s NVL72 system in alleviating computing bottlenecks for AI data centres, STAR Market Daily reported on Monday, citing unnamed sources from Huawei.

Nvidia’s NVL72, launched in March last year, features a 72-graphics processing unit (GPU) NVLink domain that functions as a single, powerful GPU, enabling real-time inference for trillion-parameter large language models (LLMs) at speeds 30 times faster than previous generations. NVLink is a high-speed interconnect technology developed by Nvidia that allows multiple GPUs to communicate with each other and share data more efficiently.

Huawei’s new supernode, currently deployed in the company’s data centres in Wuhu, a city in central Anhui province, achieved 300 petaflops of computing power, compared with the 180 petaflops offered by Nvidia’s NVL72, the report said, citing data from Huawei.

SemiAnalysis has a full report:

The CloudMatrix 384 consists of 384 Ascend 910C chips connected through an all-to-all topology. The tradeoff is simple: having five times as many Ascends more than offsets each GPU being only one-third the performance of an Nvidia Blackwell. A full CloudMatrix system can now deliver 300 PFLOPs of dense BF16 compute, almost double that of the GB200 NVL72. With more than 3.6x aggregate memory capacity and 2.1x more memory bandwidth, Huawei and China now have AI system capabilities that can beat Nvidia’s.

What’s more, is the CM384 is uniquely suited to China’s strengths, which is domestic networking production, infrastructure software to prevent network failures, and with further yield improvements, an ability to scale up to even larger domains. The drawback here is that it takes 3.9x the power of a GB200 NVL72, with 2.3x worse power per FLOP, 1.8x worse power per TB/s memory bandwidth, and 1.1x worse power per TB HBM memory capacity.

The deficiencies in power are relevant but not a limiting factor in China.

It’s that last sentence that is key. The CloudMatrix 384, which is a supercomputer consisting of 384 chips, takes up dramatically more space, uses dramatically more power, is almost certainly dramatically more expensive to make, and is dramatically more brittle than Nvidia’s NVL72; there is no one in the world who would choose the CloudMatrix 384 over the NVL72 if they had the choice.
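
The quoted figures are internally consistent, for what it’s worth; a quick check using only the numbers in the excerpt above:

```python
# Consistency check using only figures quoted from SemiAnalysis:
# CM384 delivers 300 PFLOPs vs. the NVL72's 180, at 3.9x the power draw.
cm384_pflops, nvl72_pflops = 300, 180
power_ratio = 3.9  # CloudMatrix 384 power draw / NVL72 power draw

compute_ratio = cm384_pflops / nvl72_pflops          # ~1.67x more compute
flops_per_watt_penalty = power_ratio / compute_ratio # power cost of that compute
print(f"{compute_ratio:.2f}x compute, "
      f"{flops_per_watt_penalty:.1f}x worse power per FLOP")
# -> 1.67x compute, 2.3x worse power per FLOP, matching the quoted figure
```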

That’s the thing, though: China doesn’t have the choice, because of U.S. export controls. The country’s saving grace is that it is operating under different constraints than the U.S. Go back to Nvidia CEO Jensen Huang’s GTC keynote last month:

Blackwell is way, way better than Hopper. And remember, this is not iso chips, this is iso power. This is ultimate Moore’s Law. This is what Moore’s Law was always about in the past and now here we are, 25x in one generation as iso power. This is not iso chips, it’s not iso transistors — iso power, the ultimate limiter. There’s only so much energy we can get into a data center.

When Huang says “iso power” he means that power is the constraint; the question is how many tokens you can generate given a fixed power envelope, and Blackwell can generate a lot more. The numbers Huang gave for a 1 MW data center showed Blackwell generating 25 times the tokens that Hopper could at the same power.
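
In miniature, the iso-power framing looks like this: hold the power envelope fixed and throughput scales directly with energy efficiency. The 25x multiple is Huang’s keynote claim; the baseline tokens-per-joule figure is a made-up placeholder:

```python
# "Iso power" in miniature: the data-center power envelope is fixed, so
# throughput scales with energy efficiency. The 25x figure is Huang's
# keynote claim; the baseline tokens/joule number is hypothetical.
power_budget_mw = 1.0

hopper_tokens_per_joule = 1.0                                 # placeholder baseline
blackwell_tokens_per_joule = 25.0 * hopper_tokens_per_joule   # keynote claim

def tokens_per_sec(power_mw: float, tokens_per_joule: float) -> float:
    return power_mw * 1e6 * tokens_per_joule  # watts * tokens/joule

print(tokens_per_sec(power_budget_mw, hopper_tokens_per_joule))     # 1.0e6
print(tokens_per_sec(power_budget_mw, blackwell_tokens_per_joule))  # 2.5e7
```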

What this means is that the first order of business for an ASIC maker is to match that level of efficiency. That, though, means keeping up with Nvidia’s product development cadence; The Information has a post about Amazon offering instances using its latest Trainium ASIC at a significant discount to the H100. Huang, though, spent this keynote insulting the H100! You can almost hear him speaking to Amazon directly: you spent all this money, and are offering these huge discounts, just to entice developers to lock themselves into an inferior chip, even as reasoning models are going to swamp your available power generation.

China — which actually builds power generation — is not iso power; it is iso compute: Huawei’s goal with the CloudMatrix 384 is to get as much compute as possible, power efficiency be damned.

Again, this comes with more tradeoffs than just power usage, given the challenges of scaling up inferior chips; I mentioned the brittleness and cost above, but there is also going to be a big software challenge. That, though, is the other piece of bad news for Nvidia: banning the company’s chips from China will weaken the CUDA moat. As I noted when the Biden Administration first placed limits on Nvidia in China, there are two problems this presents:

  • There is reason to create generalizable software that does what CUDA does, but that isn’t locked to Nvidia chips.
  • There is a motivation for the ecosystem to tear out any CUDA-specific code so that various frameworks can more easily run on Chinese infrastructure, which will also benefit non-Nvidia alternatives outside of China.
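
What that tearing-out looks like in practice is mundane: replacing hardcoded CUDA calls with backend-neutral device selection. A minimal sketch in PyTorch, assuming Huawei’s torch_npu plugin (the Ascend backend for PyTorch) is installed for the “npu” branch to be live:

```python
import torch

# Backend-neutral device selection instead of hardcoded .cuda() calls.
# The "npu" branch assumes Huawei's torch_npu plugin is installed (it
# adds the torch.npu namespace); otherwise we fall through to CUDA or CPU.
def pick_device() -> torch.device:
    if hasattr(torch, "npu") and torch.npu.is_available():  # Ascend via torch_npu
        return torch.device("npu")
    if torch.cuda.is_available():                           # Nvidia
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)  # device-neutral, no .cuda()
x = torch.randn(8, 1024, device=device)
y = model(x)
```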

And, of course, in the long run there is the potential of Huawei selling outside of China.

Whither Chip Controls

To go back to SemiAnalysis, much of the CloudMatrix 384 is not made in China:

One common misconception is that Huawei’s 910C is made in China. It is entirely designed there, but China still relies heavily on foreign production. Whether it be HBM from Samsung, wafers from TSMC, or equipment from America, Netherlands, and Japan, there is a big reliance on foreign industry.

While SMIC, the largest foundry in China, does have 7nm, the vast majority of Ascend 910B and 910C are made with TSMC’s 7nm. In fact, the US Government, TechInsights, and others have acquired Ascend 910B and 910C and every single one used TSMC dies. Huawei was able to circumvent the sanctions on them against TSMC by purchasing ~$500 million of 7nm wafers through another company, Sophgo. TSMC is being fined $1 billion for this blatant sanctions violation, only 2x what they profited. It is rumored Huawei continues to receive wafers from TSMC via another 3rd party firm, but we cannot verify this rumor…

While TSMC has provided 2.9 million dies which is enough for 800 thousand Ascend 910B’s and 1.05 million Ascend 910C’s across 2024 and 2025, the SMIC production has the potential to massively grow the capacity if HBM, wafer fabrication tools, tool servicing, and chemicals such as photoresist are not effectively controlled.
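
Those die counts are consistent with one another, assuming, per public reporting (this is not stated in the excerpt), that the 910C is a dual-die package while the 910B uses a single die:

```python
# Consistency check on the quoted wafer math. Assumes, per public reporting,
# that the 910C packages two compute dies while the 910B uses one.
dies_910b, dies_910c = 1, 2
units_910b, units_910c = 800_000, 1_050_000

total_dies = units_910b * dies_910b + units_910c * dies_910c
print(f"{total_dies:,}")  # 2,900,000 -- matching the 2.9 million dies quoted
```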

My heterodox position on this matter is clear: for the same reason that I am against the chip controls (but in favor of banning the sale of semiconductor manufacturing equipment into China), I think that it is a good thing that the CloudMatrix 384 is dependent on TSMC and Samsung, because that is a reason for China to maintain the status quo with Taiwan in particular; SemiAnalysis is much more hawkish on chip controls than I am.

At the same time, the reality is that right now we are in the messy middle: we’re not actually stopping Huawei from building a system that is capable of doing large language model training (albeit inefficiently), but we are hurting the fortunes of a U.S. AI champion and limiting its long-term competitiveness. And, of course, we are incentivizing everyone in China, from the government to private enterprise, to ultimately remove the point of leverage that we can’t even wield properly.


This Update will be available as a podcast later today. To receive it in your podcast player, visit Stratechery.

The Stratechery Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly.

Thanks for being a subscriber, and have a great day!
