AMD has announced the official launch of its flagship AI GPU accelerator, the MI300X, which it claims offers up to 60% higher performance than NVIDIA's H100.
The AMD Instinct MI300 class of AI accelerators is another chiplet powerhouse, making use of advanced packaging technologies from TSMC. Today, AMD not only announced the launch of these chips but also shared the first performance benchmarks of the MI300X, which look strong. AMD first drew a comparison on general specifications, pitting its CDNA 3 accelerator against NVIDIA's H100.
In general LLM kernel TFLOPs, the MI300X offers up to 20% higher performance in FlashAttention-2 and Llama 2 70B. From a platform perspective, comparing an 8x MI300X solution to an 8x H100 solution, AMD shows a much larger 40% gain in Llama 2 70B and a 60% gain in Bloom 176B.
AMD mentions that in training performance, the MI300X is on par with the competition (H100) and offers competitive price/performance, while shining in inference workloads.
The driving force behind the latest MI300 accelerators is ROCm 6.0. The software stack has been updated to the latest version with powerful new features, including support for AI workloads such as generative AI and large language models.
The new software stack supports the latest compute formats such as FP16, BF16, and FP8 (including sparsity). The optimizations combine to offer up to a 2.6x speedup in vLLM through optimized inference libraries, a 1.4x speedup in HIP Graph through an optimized runtime, and a 1.3x Flash Attention speedup through optimized kernels. ROCm 6 is expected later this month, alongside the MI300 AI accelerators. It will be interesting to see how ROCm 6 compares against the latest version of NVIDIA's CUDA stack, which is its real competition.
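These low-precision formats trade dynamic range against precision in different ways. As a rough illustration (not ROCm code; the `fp_max` helper is a hypothetical name), the largest finite value of an IEEE-754-style format follows directly from its exponent and mantissa widths:

```python
# Illustrative sketch: largest finite value of an IEEE-754-style
# float with E exponent bits and M mantissa bits is
# (2 - 2^-M) * 2^(2^(E-1) - 1).
def fp_max(exp_bits: int, mant_bits: int) -> float:
    """Largest finite value representable in the format."""
    e_max = 2 ** (exp_bits - 1) - 1          # maximum unbiased exponent
    return (2 - 2.0 ** -mant_bits) * 2.0 ** e_max

print(f"FP16 (E5M10) max: {fp_max(5, 10):.6g}")   # 65504
print(f"BF16 (E8M7)  max: {fp_max(8, 7):.6g}")    # ~3.39e38

# Note: deployed FP8 variants deviate from the plain formula; e.g. the
# OCP E4M3 encoding reserves code points and tops out at 448.
```

BF16 keeps FP32's eight exponent bits (hence its huge range) while giving up mantissa precision, which is why it is favored for training, while FP8 pushes the trade-off further for inference throughput.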
Read more on wccftech.com