Intel's 4th Gen Xeon CPUs, codenamed Sapphire Rapids, have achieved up to a 10x performance uplift in Stable Diffusion AI inference thanks to AMX.
The recently launched Intel 4th Gen Xeon "Sapphire Rapids" CPUs have seen accelerated adoption in the cloud and data center segment. One of the key areas where Intel has invested extra effort is its hardware feature set for deep learning acceleration, now boosted by the new AMX (Advanced Matrix Extensions) accelerator.
Intel first compares average latency between the current-gen Sapphire Rapids and last-gen Ice Lake CPUs. The 3rd Gen Xeon CPUs take around 45 seconds to run the same Stable Diffusion inference code, while the 4th Gen CPUs take 32.3 seconds, roughly 28% lower latency without any code changes.
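For context, a minimal benchmarking sketch along these lines, assuming the Hugging Face diffusers library; the model checkpoint, prompt, step count, and run count are illustrative, not necessarily those used in Intel's benchmark:

```python
import time
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint; any Stable Diffusion checkpoint works here.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

prompt = "sailing ship in storm by Rembrandt"

def average_latency(pipeline, runs=3):
    # Warm-up run so one-time setup costs don't skew the numbers.
    pipeline(prompt, num_inference_steps=20)
    start = time.time()
    for _ in range(runs):
        pipeline(prompt, num_inference_steps=20)
    return (time.time() - start) / runs

print(f"Average latency: {average_latency(pipe):.1f} s")
```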
So what if Intel were to use an optimized, open-source toolkit for high-performance inference like OpenVINO? The answer is even more speedup! With Optimum Intel and OpenVINO, the Xeon CPUs drop the latency down to 16.7 seconds, a roughly 2x speedup over the unmodified code. Further optimizing the pipeline for a fixed image resolution (a static shape) slashes average latency to just 4.7 seconds, an additional 3.5x speedup.
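A minimal sketch of what this looks like with Optimum Intel, assuming the optimum-intel package is installed with its OpenVINO extras; the model ID, resolution, and prompt are illustrative:

```python
from optimum.intel import OVStableDiffusionPipeline

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
pipe = OVStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", export=True
)

# Statically reshape the model to a fixed batch size and resolution, then
# recompile; this is the "static shape" optimization described above.
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

image = pipe("sailing ship in storm by Rembrandt").images[0]
```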
As you can see, OpenVINO is a simple and efficient way to accelerate Stable Diffusion inference. When combined with a Sapphire Rapids CPU, it delivers almost 10x speedup compared to vanilla inference on Ice Lake Xeons.
If you can't or don't want to use OpenVINO, the rest of this post will show you a series of other optimization techniques. Fasten your seatbelt!
We also enable the bfloat16 data format to leverage the AMX tile matrix multiply unit (TMMU) accelerator present on Sapphire Rapids CPUs.
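As a sketch, one way to run inference in bfloat16 is plain PyTorch CPU autocast, shown below; the actual setup behind these numbers is not spelled out here and may differ (for example, by also using Intel Extension for PyTorch). The model ID and prompt are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

prompt = "sailing ship in storm by Rembrandt"

# Run inference under bfloat16 autocast on CPU; on Sapphire Rapids,
# bfloat16 matrix multiplications can be dispatched to the AMX tiles.
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    image = pipe(prompt).images[0]
```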