An announcement from Stability.ai brings great news for anyone riding the AI image generation hype: Stable Diffusion, an image generation model that runs on consumer-level hardware, will soon be going public.
As you can see from the header image, the pictures being generated by the soon-to-be-released AI model look pretty incredible, especially considering how little GPU power it needs. Development of the image generator has been led by Robin Rombach of LMU Munich's Machine Vision & Learning research group and Patrick Esser of video editing software company Runway.
The announcement notes that the AI model runs on "under 10GB of VRAM on consumer GPUs." Essentially, you can run it on a 10GB Nvidia GeForce RTX 3080, an AMD Radeon RX 6700, or potentially something less powerful, though the announcement says nothing about minimum graphics requirements. That stands in contrast to many AI generation models, which tend to be hosted on servers because they take several Nvidia A100 GPUs to run.
Stable Diffusion is trained on Stability AI's 4,000 A100 Ezra-1 AI ultracluster, with more than 10,000 beta testers generating 1.7 million images per day to explore this approach.
The core dataset for Stable Diffusion comes from the upcoming CLIP-based AI model LAION-Aesthetics, which filters images based on how "beautiful" they are. I'm not exactly sure how beauty has been defined in this instance, however. LAION-Aesthetics selects and reworks images from LAION 5B's massive database, which was created to address the issue that datasets, such as the billions of image and text pairs used by Dall-E, are typically not openly available.
Read more on pcgamer.com