Microsoft Azure upgraded to NVIDIA GB300 'Blackwell Ultra' with 4600 GPUs connected together

Microsoft announces its first at-scale production cluster with NVIDIA GB300 'Blackwell Ultra' GPUs, with 4600 GPUs connected together.

TL;DR: Microsoft has deployed its first large-scale Azure cluster featuring over 4,600 NVIDIA GB300 "Blackwell Ultra" GPUs, which it says can cut AI model training times from months to weeks. The system is built to support multitrillion-parameter models, combining high in-rack NVLink bandwidth with InfiniBand interconnects between racks.

Microsoft has announced that its first at-scale production cluster of NVIDIA's new GB300 "Blackwell Ultra" GPUs has been installed.

The new large-scale production cluster packs over 4,600 GPUs in NVIDIA GB300 NVL72 rack-scale systems, connected through NVIDIA's next-generation InfiniBand interconnect fabric. Microsoft says the deployment positions it to scale to hundreds of thousands of Blackwell Ultra GPUs across its datacenters worldwide, all dedicated to one kind of workload: AI.

Microsoft says its new Azure cluster powered by NVIDIA GB300 NVL72 "Blackwell Ultra" systems can reduce training times from months to weeks, opening the way to training models with hundreds of trillions of parameters. The new Microsoft Azure ND GB300 v6 VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI workloads.

Each rack hosts 18 VMs and a total of 72 GPUs. Here are some of the per-rack specification highlights:

  • 72 NVIDIA Blackwell Ultra GPUs (with 36 NVIDIA Grace CPUs).
  • 800 gigabits per second (Gbps) per GPU of cross-rack scale-out bandwidth via next-generation NVIDIA Quantum-X800 InfiniBand (2x that of GB200 NVL72).
  • 130 terabytes (TB) per second of NVIDIA NVLink bandwidth within rack.
  • 37TB of fast memory.
  • Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance.
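
As a rough sanity check on the figures above, the rack-level numbers can be broken down per GPU with a few lines of Python. This is only an illustrative sketch: the constant and function names are made up for this example, the divisions assume resources are spread evenly across the 72 GPUs, decimal units are assumed (1 TB = 1,000 GB), and the article does not say how the 37 TB of fast memory is split between GPU and CPU memory, so the per-GPU memory figure is just an average.

```python
import math

# Rack-level figures as quoted in the article for one GB300 NVL72 rack.
GPUS_PER_RACK = 72
VMS_PER_RACK = 18
FP4_PFLOPS_PER_RACK = 1440        # "up to" figure, so an upper bound
NVLINK_TB_S_PER_RACK = 130        # terabytes per second within the rack
FAST_MEMORY_TB_PER_RACK = 37
SCALE_OUT_GBITS_S_PER_GPU = 800   # gigabits per second, per GPU
CLUSTER_GPUS = 4600               # article says "over 4,600", so a lower bound


def per_gpu_breakdown():
    """Derive per-GPU and per-rack figures, assuming an even split (an assumption)."""
    return {
        "gpus_per_vm": GPUS_PER_RACK // VMS_PER_RACK,
        "fp4_pflops_per_gpu": FP4_PFLOPS_PER_RACK / GPUS_PER_RACK,
        "nvlink_tb_s_per_gpu": NVLINK_TB_S_PER_RACK / GPUS_PER_RACK,
        # Simple average; the GPU/CPU memory split is not given in the article.
        "fast_memory_gb_per_gpu": FAST_MEMORY_TB_PER_RACK * 1000 / GPUS_PER_RACK,
        # 8 bits per byte: gigabits per second -> gigabytes per second.
        "scale_out_gbytes_s_per_gpu": SCALE_OUT_GBITS_S_PER_GPU / 8,
        "scale_out_tbits_s_per_rack": SCALE_OUT_GBITS_S_PER_GPU * GPUS_PER_RACK / 1000,
        # Smallest number of full 72-GPU racks that holds at least 4,600 GPUs.
        "min_racks": math.ceil(CLUSTER_GPUS / GPUS_PER_RACK),
    }


if __name__ == "__main__":
    for name, value in per_gpu_breakdown().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions, each VM maps to 4 GPUs, each GPU accounts for up to 20 PFLOPS of FP4 compute, and a cluster of just over 4,600 GPUs corresponds to at least 64 racks.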

Ian Buck, Vice President of Hyperscale and High-performance Computing at NVIDIA, said: "Microsoft Azure's launch of the NVIDIA GB300 NVL72 supercluster is an exciting step in the advancement of frontier AI. This co-engineered system delivers the world's first at-scale GB300 production cluster, providing the supercomputing engine needed for OpenAI to serve multitrillion-parameter models. This sets the definitive new standard for accelerated computing."

Nidhi Chappell, Corporate Vice President of Microsoft Azure AI Infrastructure, said: "Delivering the industry's first at-scale NVIDIA GB300 NVL72 production cluster for frontier AI is an achievement that goes beyond powerful silicon - it reflects Microsoft Azure and NVIDIA's shared commitment to optimize all parts of the modern AI data center. Our collaboration helps ensure customers like OpenAI can deploy next-generation infrastructure at unprecedented scale and speed."