NVIDIA's next-gen Blackwell AI GPU: multi-die design, 208 billion transistors, 192GB HBM3E

NVIDIA unveils its next-gen Blackwell B200 AI GPU: 208 billion transistors, multi-die design, 192GB of ultra-fast HBM3E memory with 8TB/sec bandwidth.


NVIDIA has just revealed its next-gen Blackwell GPU architecture with a few new announcements: the B100 and B200 GPUs and the GB200 Superchip, and they're all mega-exciting.


The new NVIDIA B200 AI GPU features a whopping 208 billion transistors, made on TSMC's 4NP process node. It also has 192GB of ultra-fast HBM3E memory with 8TB/sec of memory bandwidth. NVIDIA is not using a single GPU die here, but a multi-die design: two dies joined in one package, with only a thin seam between them, a first for NVIDIA.

Thanks to NV-HBI (NVIDIA High Bandwidth Interface), the two Blackwell GPU dies act as a single chip, linked by 10TB/sec of die-to-die bandwidth. There are no memory locality issues and no cache issues: software simply sees one giant GPU doing its (AI) thing at blistering speeds.
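
For a rough sense of why that works, here's a back-of-the-envelope sketch using the figures above (our framing, not NVIDIA's): the die-to-die link actually carries more bandwidth than the GPU's entire memory subsystem, so crossing the die boundary isn't the bottleneck.

```python
# Rough intuition for why the two dies can behave as one chip:
# the NV-HBI die-to-die link (10 TB/s) carries more bandwidth than
# the GPU's entire HBM3E memory subsystem (8 TB/s).
# Figures are the ones quoted in this article.
nv_hbi_tb_s = 10.0   # NV-HBI die-to-die bandwidth
hbm3e_tb_s = 8.0     # total HBM3E memory bandwidth

print(f"NV-HBI vs. HBM3E bandwidth: {nv_hbi_tb_s / hbm3e_tb_s:.2f}x")  # 1.25x
```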

NVIDIA's new B200 AI GPU offers 20 petaflops of AI performance from a single GPU, compared to just 4 petaflops of AI performance from the current H100 AI GPU. Impressive. Note: NVIDIA quotes the B200 number using its new FP4 format, while the H100 number uses FP8, which means the B200 has around 2.5x the theoretical FP8 compute of the H100. Still, very impressive.
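
Here's the quick math behind that 2.5x figure, a sketch assuming the usual rule of thumb that halving precision roughly doubles throughput:

```python
# Back-of-the-envelope math behind the "2.5x" note above: NVIDIA
# quotes B200 at FP4 and H100 at FP8. Halving precision roughly
# doubles throughput, so the FP4 figure is divided by two before
# comparing like-for-like at FP8.
b200_fp4_pflops = 20.0   # B200, quoted at FP4
h100_fp8_pflops = 4.0    # H100, quoted at FP8

b200_fp8_pflops = b200_fp4_pflops / 2
print(f"B200 at FP8: ~{b200_fp8_pflops:.0f} PFLOPS")                           # ~10
print(f"Speedup over H100 at FP8: ~{b200_fp8_pflops / h100_fp8_pflops:.1f}x")  # ~2.5x
```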


Each of the B200's two GPU dies is a full reticle-size chip paired with 4 x HBM3E stacks of 24GB each, with every stack delivering 1TB/sec of memory bandwidth over a 1024-bit memory interface. The total of 192GB of HBM3E memory with 8TB/sec of memory bandwidth is a huge upgrade over the H100 AI GPU, which had 6 x HBM3 stacks of 16GB each (the H200 later kicked that up to 24GB per stack).
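
If you want to check how those per-stack figures roll up to the quoted totals:

```python
# How the per-stack numbers in the article roll up to the quoted
# totals: two dies, four HBM3E stacks per die, 24GB and 1TB/s per stack.
dies = 2
stacks_per_die = 4
gb_per_stack = 24
tb_s_per_stack = 1.0

capacity_gb = dies * stacks_per_die * gb_per_stack       # 192 GB
bandwidth_tb_s = dies * stacks_per_die * tb_s_per_stack  # 8 TB/s

print(f"{capacity_gb} GB of HBM3E, {bandwidth_tb_s:.0f} TB/s total bandwidth")
```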

NVIDIA is also using an all-new NVLink design that delivers 1.8TB/sec of bi-directional bandwidth per GPU and supports NVLink domains of up to 576 GPUs. The new NVLink switch chip that ties it all together packs 50 billion transistors, manufactured by TSMC on the same 4NP process node.
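
A couple of hedged numbers you can derive from those figures (assuming the usual bidirectional convention, and treating the domain total as a naive per-GPU sum rather than a real topology measurement):

```python
# Hedged arithmetic on the quoted NVLink figures: 1.8 TB/s is the
# bidirectional number, so each direction carries half. The domain
# total is a naive per-GPU sum across all 576 GPUs, not a claim
# about any particular switch topology's bisection bandwidth.
per_gpu_bidir_tb_s = 1.8
domain_gpus = 576

per_direction_tb_s = per_gpu_bidir_tb_s / 2                        # 0.9 TB/s
naive_domain_total_pb_s = per_gpu_bidir_tb_s * domain_gpus / 1000  # ~1.04 PB/s

print(f"Per direction: {per_direction_tb_s} TB/s")
print(f"Naive 576-GPU domain total: ~{naive_domain_total_pb_s:.2f} PB/s")
```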