NVIDIA's new Hopper H200 AI GPU tested: 3x faster GenAI with TensorRT-LLM in MLPerf 4.0 results

NVIDIA's beefed-up Hopper H200 AI GPU delivers huge 3x performance gains in MLPerf 4.0 AI benchmarks with new TensorRT-LLM optimizations.


NVIDIA might have just announced its next-generation Blackwell B200 AI GPU, but the beefed-up Hopper H200 AI GPU is smashing performance records in the very latest MLPerf 4.0 results.


NVIDIA has been optimizing TensorRT-LLM non-stop since it released the AI software suite last year, and those optimizations translate into major performance increases from the MLPerf 3.1 results to MLPerf 4.0, further amplifying Hopper's AI performance.
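For context, here's a minimal sketch of running inference through TensorRT-LLM's high-level Python LLM API (available in recent releases); the checkpoint and generation settings are placeholders for illustration, not NVIDIA's MLPerf submission configuration:

```python
# Minimal TensorRT-LLM inference sketch using the high-level LLM API
# from recent releases. The checkpoint and settings below are
# illustrative assumptions, not NVIDIA's MLPerf setup.
from tensorrt_llm import LLM, SamplingParams

# Engine build/optimization happens when the model is loaded;
# the Hugging Face model ID here is a placeholder.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

prompts = ["What does TensorRT-LLM optimize?"]
params = SamplingParams(max_tokens=64)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```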


Using these new TensorRT-LLM optimizations, NVIDIA has pulled a huge 2.4x performance leap out of its current H100 AI GPU from MLPerf Inference 3.1 to 4.0 in the GPT-J offline scenario. In the GPT-J server scenario, the H100 saw an even bigger 2.9x increase from MLPerf 3.1 to 4.0.

Moving on to the beefed-up H200 AI GPU and the Llama 2 70B benchmark, NVIDIA's new H200 AI GPU and TensorRT-LLM set a new MLPerf record.


NVIDIA's new H200 Tensor Core GPU is a drop-in upgrade that delivers an instant performance boost over H100, packing 141GB of HBM3E (versus 80GB of HBM3 on H100) and up to 4.8TB/sec of memory bandwidth (versus 3.35TB/sec on H100). This greatly boosts AI GPU performance, as Hopper H200 gets supercharged with the world's fastest AI memory: HBM3E.
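A quick back-of-the-envelope check on those spec numbers (only the figures quoted above are used):

```python
# Sanity check on the H200-vs-H100 memory specs quoted above.
h100_mem_gb, h200_mem_gb = 80, 141    # HBM3 vs HBM3E capacity
h100_bw_tbs, h200_bw_tbs = 3.35, 4.8  # memory bandwidth, TB/sec

print(f"Capacity uplift:  {h200_mem_gb / h100_mem_gb:.2f}x")   # ~1.76x
print(f"Bandwidth uplift: {h200_bw_tbs / h100_bw_tbs:.2f}x")   # ~1.43x
```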

H200 is up to 45% faster than H100 in Llama 2 70B inference in MLPerf, setting a new record. NVIDIA also lined up Intel's Gaudi2 AI accelerator, which the Hopper H100 AI GPU demolishes, with the beefed-up H200 AI GPU extending that same AI dominance even further.

H100 provides 20,556 tokens per second in the server scenario and 21,806 tokens per second offline. Meanwhile, the new H200 dominates with 29,526 tokens per second in the server scenario and a huge 31,712 tokens per second offline. This is a gigantic leap above what Intel's Gaudi2 pumps out, at just 6,287 and 8,035 tokens per second for server and offline, respectively.
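Running those quoted throughput figures through a quick script shows where the "up to 45%" claim comes from (H200 offline versus H100 offline):

```python
# Relative Llama 2 70B throughput from the MLPerf figures quoted above
# (tokens per second, server / offline).
results = {
    "H100":   (20_556, 21_806),
    "H200":   (29_526, 31_712),
    "Gaudi2": (6_287, 8_035),
}

h100_server, h100_offline = results["H100"]
for gpu, (server, offline) in results.items():
    print(f"{gpu:>6}: {server / h100_server:.2f}x server, "
          f"{offline / h100_offline:.2f}x offline vs H100")
# H200 works out to ~1.44x server and ~1.45x offline,
# i.e. "up to 45% faster" than H100.
```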

Not only that, but NVIDIA's beastly HGX H200 system, packing 8 x H200 AI GPUs, demolished the Stable Diffusion XL benchmark with 13.8 queries per second in the server scenario and 13.7 samples per second offline.

NVIDIA's new Hopper H200 AI GPUs have a 700W base TDP, with custom designs drawing up to 1000W. NVIDIA's new Blackwell B100 sits at 700W stock, while B200 AI GPUs come in 1000W and 1200W designs depending on the deployment.

NEWS SOURCE: wccftech.com
