The VRAM capacity debate is a hot topic in the PC gaming space, and the consensus is that in 2025 you need more than 8GB of VRAM for high-end 1440p and 4K gaming. VRAM is also a crucial resource for running AI locally, and the demand for more memory is growing alongside the arrival of more complex models.

The powerful Stable Diffusion 3.5 text-to-image model uses 18GB of VRAM, which limits its use on the GeForce RTX 50 Series to the flagship GeForce RTX 5090. Well, not anymore, as NVIDIA has collaborated with Stability AI to quantize the model to FP8, reducing the VRAM requirement by 40% to 11GB.
Alongside optimizations with TensorRT to double performance, this now means that five GeForce RTX 50 Series GPUs (the RTX 5060 Ti 16GB, RTX 5070, RTX 5070 Ti, RTX 5080, and RTX 5090) can run the model locally.
According to NVIDIA, the TensorRT optimizations for Stable Diffusion 3.5 allow the model to take full advantage of the Tensor Cores inside GeForce RTX hardware. The FP8 TensorRT version delivers a 2.3X performance boost over running the model in BF16 with PyTorch, while using 40% less memory.
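To see where savings like this come from, here's a rough back-of-envelope sketch of the weight-memory math behind FP8 quantization. The parameter count is based on Stability AI's stated figure of roughly 8 billion parameters for Stable Diffusion 3.5 Large; the function and numbers here are illustrative, not NVIDIA's actual measurements.

```python
# Back-of-envelope sketch (not NVIDIA's published numbers) of why
# quantizing weights from BF16 to FP8 shrinks a model's VRAM footprint.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # BF16 = 16 bits/param, FP8 = 8 bits/param

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed just for the model weights, in gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Stable Diffusion 3.5 Large has roughly 8 billion parameters.
params = 8e9
bf16 = weight_memory_gb(params, "bf16")
fp8 = weight_memory_gb(params, "fp8")

print(f"BF16 weights: {bf16:.1f} GB, FP8 weights: {fp8:.1f} GB")
print(f"Weight memory saved: {1 - fp8 / bf16:.0%}")
```

Note that halving the weight precision halves weight memory (a 50% saving), yet the quoted total reduction is 40%; that gap is expected, since activations, the text encoders, and the VAE still consume VRAM that quantizing the main model's weights doesn't touch.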
The new optimized models are available via Stability AI's Hugging Face portal. In July, NVIDIA and Stability AI plan to release the model as an NVIDIA NIM microservice for creators looking to deploy it in a wide range of apps.
Clever optimizations like this for new large AI models are fantastic to see. With the GeForce RTX 50 Series supporting FP4, we'll probably see more of these types of updates that will allow powerful generative AI tools to run locally on more GPUs, especially with NVIDIA announcing that there are now over 100 million RTX AI PCs worldwide.
