Tensorwave has published new benchmarks of the AMD MI300X in LLM inference AI workloads, showing up to 3x higher performance than the NVIDIA H100.
AI cloud provider Tensorwave has showcased the performance of AMD's MI300X accelerator in AI LLM inference benchmarks against the NVIDIA H100. The company is one of many offering cloud instances powered by AMD's latest Instinct accelerators, and it looks like AMD might just have the lead, not only in performance but also in value.
In a blog post, Tensorwave demonstrates how AMD's MI300X, paired with MK1's accelerated AI engine and models, delivers faster, more optimized performance across multiple LLMs (Large Language Models).
The company used the Mixtral 8x7B model and conducted both online and offline tests on AMD and NVIDIA hardware. The test setup included 8 MI300X accelerators, each with a 192 GB memory pool, and 8 NVIDIA H100 SXM5 accelerators, each with an 80 GB memory pool. AMD's setup ran the latest ROCm 6.1.2 driver suite with the MK1 inference engine and ROCm optimizations for vLLM v0.4.0, while NVIDIA's setup ran the CUDA 12.2 driver stack (the latest is CUDA 12.5) with vLLM v0.4.3.
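For readers curious what such a stack looks like in practice, here is a minimal sketch of bringing up Mixtral 8x7B with vLLM's v0.4.x-era offline API and 8-way tensor parallelism. The Hugging Face checkpoint name and engine options are assumptions for illustration; Tensorwave's exact MK1 configuration has not been published, and the same Python code runs on either a ROCm or CUDA build of vLLM.

```python
from vllm import LLM, SamplingParams

# Shard Mixtral 8x7B across all eight accelerators via tensor parallelism.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed checkpoint name
    tensor_parallel_size=8,
    dtype="float16",
)

# Deterministic sampling keeps runs comparable between vendors.
params = SamplingParams(temperature=0.0, max_tokens=256)
print(llm.generate(["What is the capital of France?"], params)[0].outputs[0].text)
```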
In terms of offline performance, the AMD MI300X AI accelerator showcased performance uplifts ranging from 22% all the way up to 194% (almost 3x) over the NVIDIA H100 across batch sizes from 1 to 1024. The MI300X was faster than the H100 at every batch size.
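An offline throughput sweep in the spirit of the test described above could look roughly like the following sketch. The prompt content, output length, and batch sizes are placeholders rather than Tensorwave's actual workload; the point is simply to show how generated tokens per second are measured as the batch size grows.

```python
import time

from vllm import LLM, SamplingParams

# Engine construction as in the sketch above (checkpoint name assumed).
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=8)
params = SamplingParams(temperature=0.0, max_tokens=128)

# Sweep batch sizes and report generated tokens per second at each step.
for batch_size in (1, 8, 64, 256, 1024):
    prompts = ["Summarize the history of the GPU."] * batch_size
    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"batch={batch_size:>4}  {generated / elapsed:,.0f} tokens/s")
```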
For online performance, Tensorwave designed a series of tests to simulate realistic chat applications. The key metrics of interest are: