NVIDIA's new Blackwell AI GPU architecture may be slowly rolling out, but its Hopper H100 and new H200 AI GPUs continue to get even stronger with new optimizations in the CUDA stack.
The H200 and H100 AI GPUs offer leading performance against the competition in every single test, including the latest benchmarks like the 46.7 billion parameter "Mixtral 8x7B" mixture-of-experts LLM.
NVIDIA's monster HGX H200 system, packing 8 x Hopper H200 GPUs connected by NVSwitch, posts strong performance gains in Llama 2 70B, with token generation speeds of 34,864 tokens/second (offline) and 32,790 tokens/second (server) in the 1,000W configuration, and 31,303 tokens/second (offline) and 30,128 tokens/second (server) in the 700W configuration.
This is a huge 50% performance uplift for the H200 AI GPU over the existing H100, while the H100 itself still offers better AI performance in Llama 2 than AMD's new Instinct MI300X AI accelerator. The added performance is courtesy of software optimizations made by NVIDIA that apply to both Hopper chips (the H100 and the new H200), with the H200 packing around 80% more HBM memory (and it's HBM3E versus HBM3 on the H100) and roughly 40% higher memory bandwidth.
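As a rough sanity check on those figures, the short sketch below works out the memory and bandwidth deltas (assuming the publicly listed SXM specs of 80 GB HBM3 at 3.35 TB/s for the H100 and 141 GB HBM3E at 4.8 TB/s for the H200, which are not stated in this article) along with the 1,000W versus 700W gap from the Llama 2 70B numbers above.

```python
# Rough sanity check of the figures above. The Hopper specs here are the
# publicly listed SXM numbers (an assumption, not from this article):
# H100: 80 GB HBM3, 3.35 TB/s; H200: 141 GB HBM3E, 4.8 TB/s.

h100_hbm_gb, h100_bw_tbs = 80, 3.35
h200_hbm_gb, h200_bw_tbs = 141, 4.8

print(f"HBM capacity uplift: {(h200_hbm_gb / h100_hbm_gb - 1) * 100:.0f}%")  # ~76%, the "80% more" cited
print(f"Bandwidth uplift:    {(h200_bw_tbs / h100_bw_tbs - 1) * 100:.0f}%")  # ~43%, the "40% higher" cited

# 1,000W vs 700W HGX H200 gap in Llama 2 70B, using the tokens/second figures above
for scenario, w1000, w700 in [("offline", 34_864, 31_303), ("server", 32_790, 30_128)]:
    print(f"Llama 2 70B {scenario}: 1,000W is {(w1000 / w700 - 1) * 100:.1f}% faster than 700W")
```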
Moving on to Mixtral 8x7B on a multi-GPU test server, the NVIDIA Hopper H200 and H100 deliver up to 59,022 and 52,416 tokens/second of output, respectively.
Stable Diffusion XL also gets the full-stack improvement, with performance boosts of up to 27% on the Hopper H100 and H200 AI GPUs. Remember, this is just Hopper H200 and H100; NVIDIA's next-generation Blackwell B200 AI GPUs are only just beginning to hit the market.