Remember when we reported some market data indicating Meta had snapped up 150,000 of Nvidia's beastly H100 AI chips? Now it turns out Meta wants to have more than double that at 350,000 of the big silicon things by the end of the year.
Most estimates put the price of an Nvidia H100 GPU at between $20,000 and $40,000. Meta and Microsoft are Nvidia's two biggest customers for the H100, so it's safe to assume Meta will be paying less than most.
However, some basic arithmetic around Nvidia's reported AI revenues and its unit shipments of H100 reveals that Meta can't be paying an awful lot less, if any, than the lower end of that price range. At $20,000 apiece, 350,000 H100s works out to roughly $7 billion worth of Nvidia AI chips over two years. For sure, then, at the very least we're talking about many billions of dollars being spent by Meta on Nvidia silicon.
Meta has also now revealed some details (via ComputerBase) about how the H100s are implemented. Apparently, they're built into 24,576-strong clusters used for training language models. At the lower-end per-unit H100 price estimate, that works out at roughly $490 million worth of H100s per cluster. Meta has provided insight into two slightly different versions of these clusters.
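For anyone who wants to check the back-of-envelope sums, here's a minimal sketch in Python. It uses only the figures quoted above: the $20,000 to $40,000 price estimates, the 350,000 target and the 24,576-GPU cluster size. The flat per-unit price is an assumption for illustration. Nobody outside Meta and Nvidia knows the real number, and the helper name `spend_usd` is ours.

```python
# Back-of-envelope H100 spend, using only figures quoted in the article.
# The price range is a third-party estimate, not a confirmed Meta price.
H100_PRICE_LOW_USD = 20_000
H100_PRICE_HIGH_USD = 40_000
META_TARGET_GPUS = 350_000
GPUS_PER_CLUSTER = 24_576


def spend_usd(gpu_count: int, unit_price_usd: int) -> int:
    """Total cost of gpu_count GPUs at a flat per-unit price (hypothetical helper)."""
    return gpu_count * unit_price_usd


fleet_low = spend_usd(META_TARGET_GPUS, H100_PRICE_LOW_USD)
fleet_high = spend_usd(META_TARGET_GPUS, H100_PRICE_HIGH_USD)
cluster_low = spend_usd(GPUS_PER_CLUSTER, H100_PRICE_LOW_USD)

print(f"350,000 H100s: ${fleet_low / 1e9:.1f}bn to ${fleet_high / 1e9:.1f}bn")
print(f"One 24,576-GPU cluster at the low estimate: ${cluster_low / 1e6:.0f}m")
```

At the low estimate the full 350,000-chip fleet comes to $7 billion, and a single cluster lands just shy of half a billion dollars. That covers the GPUs alone, with no allowance for networking, servers, power or any volume discount.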
To quote ComputerBase, one "relies on Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) based on the Arista 7800, Wedge400 and Minipack2 OCP components, the other on Nvidia's Quantum-2 InfiniBand solution. Both clusters communicate via 400 Gbps fast interfaces."
In all candour, we couldn't tell our Arista 7800 arse from our Quantum-2 InfiniBand elbow. So, all that doesn't mean much to us beyond the realisation that there's a lot more in hardware terms to knocking together some AI training hardware than just buying a bunch of GPUs. Hooking them up would appear to be a major technical challenge, too.
But it's the 350,000 figure that is the real eye-opener. Meta was estimated to have bought 150,000 H100s in 2023 and hitting that 350,000 target means upping that
Read more on pcgamer.com