NVIDIA has just published some juicy benchmarks of its new Blackwell AI GPUs in MLPerf v4.1 AI training workloads, where the new Blackwell chips are up to 2.2x faster than Hopper. Check it out:
The new Blackwell AI GPUs set all seven per-accelerator records using NVIDIA's Nyx AI supercomputer, which packs DGX B200 systems. Nyx is 2.2x faster than the Hopper H100 in Llama 2 70B fine-tuning and 2x faster in GPT-3 175B pre-training, and it demolished the rest of the workloads in the MLPerf Training v4.1 suite as well.
NVIDIA explains: "The first Blackwell training submission to the MLCommons Consortium - which creates standardized, unbiased and rigorously peer-reviewed testing for industry participants - highlights how the architecture is advancing generative AI training performance. For instance, the architecture includes new kernels that make more efficient use of Tensor Cores. Kernels are optimized, purpose-built math operations like matrix-multiplies that are at the heart of many deep learning algorithms".
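NVIDIA doesn't publish these kernels, but to give a flavor of what "purpose-built math operations" means here, below is a minimal CUDA sketch of a matrix-multiply kernel that drives the Tensor Cores through the public WMMA API. Everything in it (the 16x16x16 tile shape, FP16 inputs with FP32 accumulation, the wmma_gemm name) is an illustrative assumption, not Blackwell's actual production kernel.

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Illustrative sketch only: one warp computes one 16x16 tile of C = A * B
// on the Tensor Cores via the public WMMA API. Real production kernels
// (cuBLAS, CUTLASS, NVIDIA's new Blackwell kernels) are far more elaborate.
// Assumes M, N, K are multiples of 16; A is row-major (M x K), B is
// column-major (K x N), C is row-major (M x N); launch with blockDim.x a
// multiple of 32 so each warp maps cleanly onto one output tile.
__global__ void wmma_gemm(const half *A, const half *B, float *C,
                          int M, int N, int K) {
    // Identify which 16x16 output tile this warp owns.
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;
    if (warpM * 16 >= M || warpN * 16 >= N) return;

    // Fragments live in registers and map onto Tensor Core operands.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    // March along K, issuing one 16x16x16 Tensor Core multiply-accumulate
    // per step: acc += A_tile * B_tile.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + warpM * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + warpN * 16 * K + k, K);
        wmma::mma_sync(acc, a_frag, b_frag, acc);
    }

    // Write the finished tile back to global memory.
    wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, acc, N,
                            wmma::mem_row_major);
}
```

The per-architecture tuning of exactly this kind of kernel (tile shapes, data movement, numeric formats) is where NVIDIA says the generative AI training gains come from.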
- Read more: Analyst says NVIDIA Blackwell GPU production volume will hit 750K to 800K units by Q1 2025
- Read more: NVIDIA CEO: Blackwell is in full production, as planned, and demand for Blackwell is 'insane'
"Blackwell's higher per-GPU compute throughput and significantly larger and faster high bandwidth memory allows it to run the GPT-3 175B benchmark on fewer GPUs while achieving excellent per-GPU performance. Taking advantage of higher-bandwidth HBM3e memory, just 64 Blackwell GPUs were run in the GPT-3 LLM benchmark without compromising per-GPU performance. The same benchmark run using Hopper needed 256 GPUs to achieve the same performance".