The day is finally here: NVIDIA has unleashed its next-gen Hopper GPU architecture. NVIDIA announced the first Hopper-based GPU -- the new NVIDIA H100 GPU -- at its GPU Technology Conference (GTC 2022).
NVIDIA's new Hopper GPU architecture is named after pioneering US computer scientist Grace Hopper, and succeeds the Ampere GPU architecture. The new NVIDIA H100 GPU packs 80 billion transistors, making it the world's largest and most powerful accelerator.
NVIDIA and TSMC (Taiwan Semiconductor Manufacturing Company) worked together on the TSMC 4N process node, tweaking it for the Hopper-based H100 GPU. There's PCIe 5.0 support, plus next-gen ultra-fast HBM3 memory technology with an absolutely blistering 3TB/sec (3000GB/sec) of bandwidth. Insanity.
- Read more: NVIDIA can sustain the world's internet traffic with 20 x H100 GPUs
- Read more: NVIDIA Hopper GPU is up to 40x faster with new DPX instructions
- Read more: NVIDIA announces new DGX H100 system: 8 x Hopper-based H100 GPUs
- Read more: NVIDIA is turning data centers into 'AI factories' with Hopper GPU
- Read more: NVIDIA Eos: the world's fastest AI supercomputer, 4608 x DGX H100 GPUs
The skinny on NVIDIA's next-gen Hopper GPU:
- TSMC 4N process node: The new Hopper GPU architecture is the first to be built on TSMC's cutting-edge 4N process node, which NVIDIA says was "designed for NVIDIA's accelerated compute needs". The new H100 GPU architecture "features major advances to accelerate AI, HPC, memory bandwidth, interconnect and communication, including nearly 5 terabytes per second of external connectivity. H100 is the first GPU to support PCIe Gen5 and the first to utilize HBM3, enabling 3TB/s of memory bandwidth".
- 80 freaking billion transistors: There are a huge 80 billion transistors on the NVIDIA H100 GPU. Compare that to the 54 billion transistors of the Ampere-based NVIDIA A100 GPU... or the 21 billion transistors of the Volta-based NVIDIA Tesla V100... or the 15 billion transistors on the Pascal-based NVIDIA Tesla P100 GPU. NVIDIA has come a long way, cramming ever more transistors onto chips built on ever-shrinking process nodes.
- NVIDIA NVLink technology goes 4th-Gen: The new Hopper-based NVIDIA H100 GPU has the very latest 4th-Generation NVLink technology, which super-speeds the largest AI models in the world. NVLink can be combined with a new external NVLink Switch, which extends NVLink as a scale-up network beyond the server. You'll be able to connect an insane 256 x H100 GPUs at 9x higher bandwidth versus the previous generation using NVIDIA HDR Quantum InfiniBand.
- Move over Michael Bay, what the hell is that Transformer Engine: NVIDIA's new Hopper GPU architecture also features a new "Transformer Engine", built for the Transformer, which the company calls the "standard model choice for natural language processing" and "one of the most important deep learning models ever invented. The H100 accelerator's Transformer Engine is built to speed up these networks as much as 6x versus the previous generation without losing accuracy".
- CIA Who? NVIDIA H100 has Confidential Computing tech: Alright, something else new is the Confidential Computing side of the H100... NVIDIA says the H100 is the world's first accelerator with confidential computing capabilities that can "protect AI models and customer data while they are being processed. Customers can also apply confidential computing to federated learning for privacy-sensitive industries like healthcare and financial services, as well as on shared cloud infrastructures".
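The Transformer networks that the new Transformer Engine accelerates are all built around one core operation: scaled dot-product attention. As a rough illustration of what the hardware is speeding up (this is a generic NumPy sketch, not NVIDIA's implementation), the heart of a Transformer layer looks something like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer op: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (seq, seq) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens, 8-dimensional embeddings (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

In a real large language model this operation runs across thousands of tokens and dozens of layers, which is why dedicated hardware that can juggle precision on the fly pays off so handsomely.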
NVIDIA founder and CEO Jensen Huang said: "Data centers are becoming AI factories -- processing and defining mountains of data to produce intelligence. NVIDIA H100 is the engine of the world's AI infrastructure that enterprises use to accelerate their AI-driven business".
NVIDIA H100 Technology Breakthroughs
The NVIDIA H100 GPU sets a new standard in accelerating large-scale AI and HPC, delivering six breakthrough innovations:
- World's Most Advanced Chip - Built with 80 billion transistors using a cutting-edge TSMC 4N process designed for NVIDIA's accelerated compute needs, H100 features major advances to accelerate AI, HPC, memory bandwidth, interconnect and communication, including nearly 5 terabytes per second of external connectivity. H100 is the first GPU to support PCIe Gen5 and the first to utilize HBM3, enabling 3TB/s of memory bandwidth. Twenty H100 GPUs can sustain the equivalent of the entire world's internet traffic, making it possible for customers to deliver advanced recommender systems and large language models running inference on data in real-time.
- New Transformer Engine - Now the standard model choice for natural language processing, the Transformer is one of the most important deep learning models ever invented. The H100 accelerator's Transformer Engine is built to speed up these networks as much as 6x versus the previous generation without losing accuracy.
- 2nd-Generation Secure Multi-Instance GPU - MIG technology allows a single GPU to be partitioned into seven smaller, fully isolated instances to handle different types of jobs. The Hopper architecture extends MIG capabilities by up to 7x over the previous generation by offering secure multitenant configurations in cloud environments across each GPU instance.
- Confidential Computing - H100 is the world's first accelerator with confidential computing capabilities to protect AI models and customer data while they are being processed. Customers can also apply confidential computing to federated learning for privacy-sensitive industries like healthcare and financial services, as well as on shared cloud infrastructures.
- 4th-Generation NVIDIA NVLink - To accelerate the largest AI models, NVLink combines with a new external NVLink Switch to extend NVLink as a scale-up network beyond the server, connecting up to 256 H100 GPUs at 9x higher bandwidth versus the previous generation using NVIDIA HDR Quantum InfiniBand.
- DPX Instructions - New DPX instructions accelerate dynamic programming - used in a broad range of algorithms, including route optimization and genomics - by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs. This includes the Floyd-Warshall algorithm to find optimal routes for autonomous robot fleets in dynamic warehouse environments, and the Smith-Waterman algorithm used in sequence alignment for DNA and protein classification and folding.
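To make the DPX point concrete, here's what the Floyd-Warshall dynamic program mentioned above actually looks like in plain Python (a textbook sketch for illustration only; the DPX instructions accelerate the inner min/add recurrence in hardware):

```python
def floyd_warshall(dist):
    """All-pairs shortest paths via the classic O(V^3) dynamic program:
    for each intermediate node k, relax dist[i][j] through k."""
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input matrix isn't mutated
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

INF = float("inf")
# 4-node directed graph as an adjacency matrix; INF means no direct edge
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(floyd_warshall(graph))
# [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```

That triple-nested min/add recurrence is exactly the kind of memory-bound dynamic programming that has historically run on CPUs, which is where NVIDIA's claimed 40x speedup comes from.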