Intel released its November 2023 MLPerf Training 3.1 results and delivered a 103% performance increase, beating the 90% gain it projected back in June. Only three vendors currently submit GPT-3 results to MLPerf: Intel, NVIDIA and Google - making Intel's Gaudi 2 the only viable alternative to NVIDIA's GPUs (is that even the correct term anymore?) for MLPerf AI workloads.
Intel was also quick to point out that Xeon is the only CPU submitting training results to MLPerf. Without further ado, here are the slides Intel presented:
As you can see, Intel's Gaudi team initially projected a 90% performance gain from FP8 but delivered a 103% gain on the GPT-3 industry benchmark, cutting time-to-train (across 384 accelerators) from 311.94 minutes (about 5.2 hours) down to 153.58 minutes (roughly 2.5 hours). Intel also presented several slides to aid TCO (total cost of ownership) decision making, showing that the Gaudi 2 chip offers similar performance to the NVIDIA H100 at a lower server cost - making it competitive in price/performance.
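The 103% figure follows directly from the two time-to-train numbers in the slide; a quick sketch of the arithmetic (the function name is mine, not Intel's):

```python
def percent_gain(old_minutes: float, new_minutes: float) -> float:
    """Performance gain in percent implied by a reduction in time-to-train."""
    return (old_minutes / new_minutes - 1.0) * 100.0

# June result vs. November result, 384 Gaudi 2 accelerators:
gain = percent_gain(311.94, 153.58)
print(f"{gain:.0f}% faster")  # → 103% faster
```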
On GPTJ-99, Gaudi 2 shines even more, coming in just slightly behind NVIDIA's new Hopper chips. Back in June, the discussion was about Gaudi 2 merely being a viable alternative to NVIDIA's chips: it sat significantly behind the H100, only trading blows with the older A100. Now the Gaudi 2 chip is just slightly behind the H100 and GH200-96G setups. The H100 is just 9% faster and the GH200-96G just 12% faster than Gaudi 2 in Server throughput benchmarks; that lead extends to 28% in Offline benchmarks. Gaudi 2 outperformed the A100 by close to 2x in both cases.
Lastly, Intel also
Read more on wccftech.com