NVIDIA has released its official MLPerf Inference v3.1 performance benchmarks, run on the world's fastest AI GPUs such as the Hopper H100, GH200, and L4.
Today, NVIDIA is releasing its first performance results within the MLPerf Inference v3.1 benchmark suite, which covers a wide range of industry-standard benchmarks for AI use cases. The workloads span recommenders, natural language processing, large language models, speech recognition, image classification, medical imaging, and object detection.
The two new benchmarks are DLRM-DCNv2 and GPT-J 6B. The former is a larger, multi-hot dataset representation of real recommenders that uses a new cross-layer algorithm to deliver better recommendations and carries twice the parameter count of the previous version. GPT-J, on the other hand, is a small-scale LLM whose open-source base model was released in 2021; this workload is designed for summarization tasks.
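For context on the cross-layer algorithm mentioned above, a DCNv2-style cross layer computes x_{l+1} = x_0 ⊙ (W·x_l + b) + x_l, explicitly crossing the original input features with the previous layer's output. Below is a minimal PyTorch sketch of the idea; the dimensions and layer count are illustrative, not the MLPerf reference implementation.

```python
import torch
import torch.nn as nn

class CrossLayerV2(nn.Module):
    """One DCNv2-style cross layer: x_{l+1} = x0 * (W @ xl + b) + xl."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # full-rank weight W plus bias b

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        # The element-wise product crosses the raw features with the current
        # representation; the residual (+ xl) preserves lower-order crosses.
        return x0 * self.linear(xl) + xl

class CrossNetwork(nn.Module):
    """A stack of cross layers, as used in DCNv2-style recommenders."""
    def __init__(self, dim: int, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList([CrossLayerV2(dim) for _ in range(num_layers)])

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        x = x0
        for layer in self.layers:
            x = layer(x0, x)
        return x

# Usage: cross a batch of 32 feature vectors of (hypothetical) width 128.
features = torch.randn(32, 128)
crossed = CrossNetwork(dim=128)(features)  # shape: (32, 128)
```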
NVIDIA also showcased a conceptual real-life workload pipeline in which an application chains a range of AI models to fulfill a single query or task. All of the models will be available on the NGC platform.
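To illustrate what such a multi-model pipeline could look like in practice, here is a hedged sketch that chains generic Hugging Face pipelines as stand-ins. The actual stages and NGC models in NVIDIA's demo are not specified here, so every model choice and filename below is an assumption.

```python
from transformers import pipeline

# Stage 1 (assumed): speech recognition turns a spoken query into text.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Stage 2 (assumed): a language model condenses the transcript into a summary.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = asr("user_query.wav")["text"]  # "user_query.wav" is a placeholder
result = summarizer(transcript, max_length=60, min_length=10)
print(result[0]["summary_text"])
```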
In terms of performance benchmarks, the NVIDIA H100 was tested across the entire MLPerf Inference v3.1 suite (Offline scenario) against competing accelerators from Intel (Habana Labs), Qualcomm (Cloud AI 100), and Google (TPUv5e). NVIDIA delivered leadership performance across all workloads.
To make things a little more interesting, the company notes that these results were locked in about a month ago, since MLPerf requires at least a month between submission and the publication of final results. Since then, NVIDIA has developed a new technology known as TensorRT-LLM, which further boosts performance.
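For a sense of how GPT-J inference might be driven through TensorRT-LLM, here is a minimal sketch assuming the library's high-level Python LLM API; the model name, sampling settings, and output fields are illustrative assumptions, not NVIDIA's MLPerf submission code.

```python
# Minimal sketch, assuming TensorRT-LLM's high-level "LLM" Python API.
# Everything below (model choice, sampling settings) is a placeholder,
# not the configuration behind the MLPerf v3.1 results.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="EleutherAI/gpt-j-6b")  # builds/loads a TensorRT engine
params = SamplingParams(temperature=0.0, max_tokens=128)  # greedy decoding

article = "..."  # text to summarize (elided)
outputs = llm.generate([f"Summarize the following article:\n{article}"], params)
print(outputs[0].outputs[0].text)
```

The engine-build step is where optimizations such as kernel fusion and in-flight batching are applied, which is where gains beyond the published v3.1 numbers would come from.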
Read more on wccftech.com