NVIDIA's H100 GPUs sit at the top of the AI accelerator market, and the company has once again set new records in the MLPerf benchmarks.
In the latest MLPerf results highlighted by NVIDIA, the company says it has established several new records: the Eos supercomputer completed a training benchmark based on a GPT-3 model with 175 billion parameters, trained on one billion tokens, in just 3.9 minutes. This is a huge gain over the previous record, in which the same benchmark was completed in 10.9 minutes, marking a nearly 3x uplift.
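As a quick back-of-the-envelope check of that uplift claim, the two reported times work out to roughly a 2.8x speedup, which is commonly rounded to "nearly 3x". A minimal sketch, using only the figures quoted above:

```python
# Back-of-the-envelope check of the reported improvement (times from the article).
previous_time_min = 10.9  # prior record for the GPT-3 175B MLPerf training benchmark
current_time_min = 3.9    # Eos result in the latest round

speedup = previous_time_min / current_time_min
print(f"Speedup: {speedup:.2f}x")  # ~2.79x, i.e. "nearly 3x"
```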
Now, the figures achieved by the supercomputer are indeed phenomenal, but what is the primary reason behind the achievement?
In simple terms, NVIDIA's cutting-edge Hopper GPU architecture is paired with well-refined software. The Eos supercomputer currently employs 10,752 NVIDIA H100 Tensor Core GPUs, which replaced the older A100s and account for much of the performance bump in the first place. On top of that, software such as NVIDIA's NeMo framework, which aids in LLM training, allowed Team Green to squeeze exceptional performance out of its platform.
Another record mentioned in the post concerns "system scaling": with the help of various software optimizations, the company demonstrated a 93% scaling efficiency. The 10,752 H100 GPUs far surpassed the scale of NVIDIA's AI training submission in June, which used 3,584 Hopper GPUs. Efficient scaling matters immensely in the industry, since achieving higher computational power requires more hardware resources, and without adequate software support the efficiency of the system is compromised.
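For context, scaling efficiency is commonly defined as the measured speedup divided by the ideal, linear speedup expected from adding more GPUs. The sketch below illustrates that definition; the throughput values are hypothetical placeholders, and only the GPU counts (3,584 and 10,752) come from the article.

```python
# Illustrative sketch of scaling efficiency: measured speedup vs. ideal (linear) speedup.
# Throughput values are hypothetical; only the GPU counts are from the article.
def scaling_efficiency(gpus_before: int, gpus_after: int,
                       throughput_before: float, throughput_after: float) -> float:
    """Return measured speedup divided by the ideal (linear) speedup."""
    ideal_speedup = gpus_after / gpus_before
    measured_speedup = throughput_after / throughput_before
    return measured_speedup / ideal_speedup

# With hypothetical relative throughputs, a result of 0.93 corresponds to 93% efficiency.
print(f"{scaling_efficiency(3584, 10752, 1.0, 2.79):.0%}")  # -> 93%
```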
Read more on wccftech.com