NVIDIA has announced that TensorRT-LLM is coming to Windows soon and will bring a huge AI boost to PCs running RTX GPUs.
Back in September, NVIDIA announced TensorRT-LLM for data centers, where it offered up to an 8x gain on the industry's top AI GPUs such as the Hopper H100 and the Ampere A100. Taking full advantage of the Tensor Core acceleration featured on NVIDIA's GeForce RTX & RTX Pro GPUs, the library will deliver up to 4x faster performance in LLM inference workloads.
As we explained earlier, one of the biggest updates TensorRT-LLM brings is a new scheduler known as in-flight batching, which allows work to enter and exit the GPU independently of other tasks. This lets the GPU dynamically process several smaller queries alongside large, compute-intensive requests. TensorRT-LLM also makes use of optimized open-source models, which allow for higher speedups as batch sizes are increased. Starting today, these optimized open-source models are available for the public to download at developer.nvidia.com.
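To see why in-flight batching helps, consider a toy simulation of the idea (sometimes called continuous batching). The sketch below is a simplified illustration of the scheduling concept only; the names, structure, and step model are assumptions for demonstration and are not the actual TensorRT-LLM API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical LLM request, reduced to a name and a number of
    remaining decode steps (tokens still to generate)."""
    name: str
    tokens_left: int


def run_inflight_batching(pending, max_batch=4):
    """Simulate in-flight (continuous) batching: at every decode step,
    finished requests leave the batch and waiting requests join it,
    instead of the whole batch waiting for its slowest member
    (as in static batching)."""
    queue = deque(pending)
    active = []
    timeline = []  # which requests ran at each decode step
    while queue or active:
        # Admit new work as soon as a slot frees up.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        timeline.append([r.name for r in active])
        for r in active:
            r.tokens_left -= 1  # one decode step for every active request
        # Finished requests exit immediately, freeing their slot.
        active = [r for r in active if r.tokens_left > 0]
    return timeline


# Example: one long request (B) and several short ones, batch size 2.
# Short queries slot in and out around B instead of waiting for it.
steps = run_inflight_batching(
    [Request("A", 1), Request("B", 5), Request("C", 1),
     Request("D", 1), Request("E", 2)],
    max_batch=2,
)
print(len(steps))  # total decode steps with in-flight batching
```

In this toy example the work finishes in 5 steps, whereas static batching with the same batch size of 2 would need 8 (each batch runs as long as its slowest request: 5 + 1 + 2). That gap is the kind of utilization gain the new scheduler targets.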
The added AI acceleration from TensorRT-LLM will help drive everyday productivity tasks such as chat, summarizing documents and web content, and drafting emails and blog posts, and it can also be used to analyze data and generate large amounts of content from what is available to the model.
So how will TensorRT-LLM help consumer PCs running Windows? In a demo, NVIDIA compared an open-source pre-trained LLM such as LLaMa 2 against the same model accelerated with TensorRT-LLM. When a query is passed to LLaMa 2 alone, the model draws only on the large, generalized dataset it was trained on (sources such as Wikipedia), so it lacks up-to-date information from after its training cutoff.
Read more on wccftech.com