NVIDIA has showcased impressive numbers for its GeForce RTX 40 GPUs, including the flagship RTX 4090, running AI models such as Llama & Mistral.
NVIDIA's TensorRT-LLM acceleration for Windows has brought some spectacular performance uplifts to the Windows PC platform. We have already seen impressive gains and new features added to NVIDIA's RTX "AI PC" feature set, and things are getting even better, with the company now showcasing some huge performance figures for its flagship GeForce RTX 4090 GPU.
In a new AI Decoded blog, NVIDIA has shared how its existing GPU lineup outpaces the entire NPU ecosystem, which has only managed to reach 50 TOPS in 2024. NVIDIA's RTX AI GPUs, by contrast, start at several hundred TOPS and go all the way up to 1,321 TOPS on the GeForce RTX 4090, making it the fastest desktop AI solution for running LLMs and more. It's also the fastest gaming graphics card on the planet.
NVIDIA's GeForce RTX GPUs offer up to 24 GB of VRAM, while its professional NVIDIA RTX GPUs offer up to 48 GB of VRAM, making them quite the beasts when it comes to handling LLMs (Large Language Models), as these workloads love large amounts of video memory. NVIDIA's RTX hardware comes not only with dedicated video memory but also with AI-specific acceleration through Tensor Cores (hardware) and the aforementioned TensorRT-LLM (software).
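To see why VRAM capacity matters so much for LLMs, here is a minimal back-of-the-envelope sketch of how much video memory a model's weights roughly require. The function name, the ~20% overhead factor, and the example model sizes are illustrative assumptions, not figures from NVIDIA:

```python
def estimate_llm_vram_gb(num_params_billion: float,
                         bytes_per_weight: float,
                         overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight storage plus ~20% (assumed)
    headroom for KV cache and activations."""
    # 1 billion params at 1 byte each is ~1 GB of weights
    weights_gb = num_params_billion * bytes_per_weight
    return weights_gb * overhead

# A 7B-parameter model at FP16 (2 bytes per weight) lands around 16-17 GB,
# which fits in a 24 GB GeForce RTX card; at 4-bit quantization
# (~0.5 bytes per weight) the same model drops to roughly 4 GB.
print(round(estimate_llm_vram_gb(7, 2.0), 1))
print(round(estimate_llm_vram_gb(7, 0.5), 1))
```

This is why larger VRAM pools (24 GB on GeForce RTX, 48 GB on professional RTX) directly translate into being able to run bigger models, or run the same model at higher precision.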
The rate of token generation across all batch sizes on NVIDIA's GeForce RTX 4090 GPUs is already high, and it improves significantly, by over 4x, when TensorRT-LLM acceleration is enabled.
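Throughput comparisons like this are typically expressed in tokens per second. A small sketch of that arithmetic, using made-up numbers purely for illustration (not NVIDIA's measurements):

```python
def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Standard LLM throughput metric: generated tokens / wall-clock time."""
    return num_tokens / elapsed_s

def speedup(baseline_tps: float, accelerated_tps: float) -> float:
    """How many times faster the accelerated run is."""
    return accelerated_tps / baseline_tps

# Hypothetical example values, assumed for illustration only:
base = tokens_per_second(400, 10.0)   # 40 tok/s without TensorRT-LLM
fast = tokens_per_second(1700, 10.0)  # 170 tok/s with TensorRT-LLM
print(speedup(base, fast))            # 4.25, i.e. an over-4x uplift
```

The "over 4x" figure in the article is this ratio: accelerated tokens/sec divided by baseline tokens/sec, measured at the same batch size.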
NVIDIA is now sharing some new benchmarks using the open-source Jan.ai platform, which has recently integrated TensorRT-LLM into its local chatbot app. The chatbot runs AI models locally on the user's machine.
Read more on wccftech.com