AMD Instinct GPUs such as the MI250 have received a major boost in AI performance, bringing them closer to NVIDIA's chips.
In a blog post, MosaicML has shown how PyTorch 2.0 and ROCm 5.4 increase the performance of AMD data center GPUs such as the Instinct series without any code changes. The software vendor offers enhanced support for ML and LLM training across a vast range of NVIDIA and AMD solutions supporting 16-bit precision (FP16 / BF16). The recent releases have allowed MosaicML to unlock even better performance from the AMD Instinct accelerators with their LLM Foundry stack.
The highlights of the results were as follows:
While AMD's Instinct MI250 GPU offered a slight edge over NVIDIA's A100 GPUs in FP16 FLOPs (without sparsity), memory capacity, and memory bandwidth, it should be noted that the MI250 scales only up to 4 accelerators per system, whereas the A100 scales up to 8 GPUs in a single system.
Taking a deeper look, both the AMD and NVIDIA hardware were able to launch AI training workloads with LLM Foundry with ease. Performance was evaluated using two metrics: training throughput (Tokens/Sec/GPU) and overall compute performance (TFLOP/Sec/GPU).
AI training throughput was measured across a range of models from 1 billion to 13 billion parameters. Testing showed that the AMD Instinct MI250 delivered 80% of the performance of NVIDIA's A100 40GB and 73% of the performance of the 80GB variant. NVIDIA retained its leadership position in all of the benchmarks, though it should be mentioned that it also had twice as many GPUs running in the tests. Furthermore, further improvements on the training side are expected for AMD Instinct GPUs.
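As a rough illustration of how the two metrics relate, training throughput in tokens/sec can be converted to an approximate TFLOP/sec figure using the widely cited 6N FLOPs-per-token rule of thumb for transformer training. This is a hedged sketch; the function name and example numbers below are illustrative assumptions, not MosaicML's code or measurements.

```python
# Sketch: converting training throughput (Tokens/Sec/GPU) into an
# approximate TFLOP/Sec/GPU figure via the common estimate of
# ~6 * N FLOPs per token for transformer training (forward + backward),
# where N is the parameter count. Attention FLOPs are ignored, which is
# a reasonable approximation at modest sequence lengths.

def train_tflops_per_gpu(tokens_per_sec_per_gpu: float, n_params: float) -> float:
    """Approximate achieved TFLOP/Sec/GPU for LLM training."""
    flops_per_token = 6 * n_params          # rule-of-thumb FLOPs per token
    flops_per_sec = tokens_per_sec_per_gpu * flops_per_token
    return flops_per_sec / 1e12             # convert FLOPs/sec to TFLOP/sec

# Hypothetical example: a 7B-parameter model at 3,000 tokens/sec/GPU
print(train_tflops_per_gpu(3_000, 7e9))  # 126.0 TFLOP/Sec/GPU
```

Dividing this achieved figure by the accelerator's peak FP16/BF16 TFLOPs then gives the hardware utilization commonly used to compare GPUs in such benchmarks.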
Read more on wccftech.com