NVIDIA's upcoming GB200 NVL72 AI server has some big development challenges ahead of it, most of which stem from its insane 132kW TDP requirement, which makes it the highest-power-consuming server in history.
In a new post on Medium, analyst and insider Ming-Chi Kuo said that NVIDIA has halted development of its GB200 NVL36x2 AI server (the dual-rack, 72-GPU version), which you can read more about in the links below. As for the NVL72, Kuo says the "biggest challenges" in its development stem from the 132kW thermal design point (TDP), with NVIDIA and its supply chain needing more time to solve "unprecedented technology issues".
Kuo points out that the TDP "refers to average power consumption during continuous operation. If poor design leads to peak power consumption (electrical design point (EDP), as NVIDIA calls it) exceeding TDP, two or more sidecars may be required. This would not only increase cooling design complexity and production difficulties but also negate NVL72's data center space-saving advantage".
Another issue Kuo raises is the design of the sidecar itself: the approaching temperature has to be controlled stably within 5-10°C, and relaxing this standard "could affect system stability". Kuo also notes that the high power consumption challenge doesn't just affect the sidecar, but all components and the system design. The analyst added that his latest supply chain survey indicates that NVL72 mass production could be delayed until 2H 2025 (compared to NVIDIA's optimistic target of 1H 2025).
- Read more: NVIDIA 'halting developing' of GB200 NVL36x2 AI servers
- Read more: Taiwan preps for GB200 NVL36 AI servers in September, NVL72 in October
- Read more: NVIDIA hits major roadblocks with Blackwell AI GPU: revised B200A coming
- Read more: NVIDIA's new Blackwell AI GPUs have 'major issues' which requires redesign
- Read more: NVIDIA's next-gen Blackwell AI GPUs delayed, 'design flaws' are to blame
- Read more: NVIDIA to make $210B revenue from Blackwell GB200 AI servers in 2025 alone
- Read more: NVIDIA places new orders with TSMC for more GB200, B100, B200 AI chips
- Read more: Foxconn is the sole supplier of NVLink switches for next-gen GB200 AI servers
- Read more: NVIDIA GB200 AI servers led by Foxconn with 40% and Quanta with 30%
- Read more: NVIDIA's next-gen GB200 AI servers to ship in 'small quantities' in Q4 2024
- Read more: NVIDIA's new GB200 Superchip costs up to $70,000: full B200 NVL72 AI costs $3M
Kuo's key points from the post, in his own words:
- The biggest challenges in NVL72 development mainly stem from the 132kW thermal design point (TDP) requirement, which makes it the highest-power-consuming server in history. NVIDIA and its supply chain need more time to solve unprecedented technology issues.
- It's important to note that TDP refers to average power consumption during continuous operation. If poor design leads to peak power consumption (electrical design point (EDP), as NVIDIA calls it) exceeding TDP, two or more sidecars may be required. This would not only increase cooling design complexity and production difficulties but also negate NVL72's data center space-saving advantage.
- Another design challenge for the sidecar is controlling the approaching temp stably within 5-10°C. Relaxing this standard could affect system stability.
- It's worth noting that the high power consumption challenge mentioned above involves not only the sidecar, but all components and system design.
- My latest supply chain survey indicates that NVL72 mass production may be delayed until 2H25 (versus NVIDIA's optimistic target of 1H25).