NVIDIA's new Blackwell AI servers reportedly ran into overheating issues and an architectural flaw last year, and it seems these problems haven't gone away, leaving big customers (paying big bucks) stranded and moving back to Hopper AI servers.

In a new report from The Information, we're learning that big customers are experiencing overheating and glitching issues with the first significant shipments of NVIDIA's new GB200 AI servers, with the big problem being the "way chips connect". Big customers like Amazon, Google, Meta, and Microsoft have reportedly cut down their orders because of the issues.
Back in October 2024, NVIDIA CEO Jensen Huang said "we had a design flaw in Blackwell", noting that it was "100% NVIDIA's fault" and had nothing to do with the rumored issues with TSMC's new CoWoS advanced packaging. A few months later, in December 2024, we reported that NVIDIA GB200 AI server mass production and peak shipments could be delayed until Q2 or even Q3 2025... and here we are with more issues.
- Read more: NVIDIA CEO: 'design flaw in Blackwell, 100% NVIDIA's fault' not TSMC's fault
- Read more: Morgan Stanley: NVIDIA 2U air-cooled MGX GB200 NVL2 has 'thermal issues'
- Read more: NVIDIA GB200 AI server mass production, peak shipments could be delayed
- Read more: NVIDIA Blackwell AI GPUs 'encountering major issues' redesign required, big delays
It seems that cloud service providers (CSPs) are now delaying the move to Blackwell-based GB200 AI servers and going back to the solid Hopper AI GPU servers... I'm sure this story will continue to build as more comments (hopefully from NVIDIA soon) pile on.