NVIDIA announced its Grace CPU and Grace CPU Superchip earlier this year at GTC 2022, and the company has now unveiled more details on the Grace CPU, Orin SoC, and NVLink chip interconnects at Hot Chips 34.
The new NVIDIA Grace CPU is the first CPU from NVIDIA, packing 72 Arm v9.0 cores that support SVE2 and multiple virtualization extensions including Nested Virtualization and S-EL2. NVIDIA is fabricating its Grace CPU on TSMC's new 4N process node: an optimized version of TSMC's 5nm process node, made exclusively for NVIDIA... just like its new Hopper H100 GPU.
NVIDIA designed its new Grace CPU to be used in conjunction with its NVLink-C2C (Chip-To-Chip) interconnect, which links the chips that make up each Superchip and avoids the bottlenecks you'd get with regular cross-socket configurations.
NVIDIA's new NVLink-C2C interconnect delivers 900GB/sec of raw bi-directional bandwidth (the same bandwidth a GPU-to-GPU NVLink switch on Hopper H100 has) while sipping power at only 1.3 pJ/bit: 5x more energy efficient than PCIe.
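To put the 1.3 pJ/bit figure in perspective, here is a rough back-of-the-envelope sketch in Python that turns the quoted bandwidth and energy-per-bit numbers into watts. It assumes 1GB = 10^9 bytes and that the link runs flat out at its full raw rate; the PCIe energy figure is only what the article's "5x" claim would imply, not a number NVIDIA is quoted as giving here.

```python
# Figures quoted in the article for NVLink-C2C.
BANDWIDTH_GB_PER_S = 900    # raw bi-directional bandwidth, GB/s
ENERGY_PJ_PER_BIT = 1.3     # energy per transferred bit, picojoules
EFFICIENCY_VS_PCIE = 5      # "5x more efficient" claim from the article


def link_power_watts(bandwidth_gb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power of a link at full rate. Assumption: 1 GB = 1e9 bytes."""
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * energy_pj_per_bit * 1e-12


nvlink_c2c_watts = link_power_watts(BANDWIDTH_GB_PER_S, ENERGY_PJ_PER_BIT)

# Implied by the 5x claim only; an assumption, not a measured PCIe figure.
implied_pcie_pj_per_bit = ENERGY_PJ_PER_BIT * EFFICIENCY_VS_PCIE

print(f"NVLink-C2C at full rate: {nvlink_c2c_watts:.2f} W")
print(f"Implied PCIe energy cost: {implied_pcie_pj_per_bit:.1f} pJ/bit")
```

Under those assumptions the interconnect itself works out to a little over 9W at full tilt, which is small next to the power budget of the chips it connects.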
The new NVIDIA Grace CPU has a scalable coherency fabric with a distributed cache design, which NVIDIA says delivers up to 3.225TB/sec of bisection bandwidth. The design is scalable beyond 72 cores (144 cores on the Superchip), with 117MB of L3 cache and support for Arm Memory Partitioning and Monitoring (MPAM).
NVIDIA's new Grace CPU has a unified memory architecture with shared page tables, so that two NVIDIA Grace + Hopper Superchips can be interconnected through an NVSwitch, and a Grace CPU on one Superchip can communicate directly with the GPU on the other... it can even access that GPU's VRAM at native NVLink speeds.
When it comes to memory, NVIDIA is using up to 512GB of LPDDR5X memory across up to 32 channels, delivering up to 546GB/sec of memory bandwidth. NVIDIA is using LPDDR5X on its new Grace CPUs because it offers the best balance of bandwidth, cost, and power consumption.
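As a quick sanity check on those memory numbers, the sketch below divides the quoted bandwidth across the quoted channel count and doubles it for the two-chip Superchip. It assumes bandwidth is spread evenly across all 32 channels and that the Superchip total is simply twice the single-CPU figure; neither assumption comes from NVIDIA's presentation.

```python
# Figures quoted in the article for a single Grace CPU.
MEMORY_BANDWIDTH_GB_PER_S = 546   # "up to 546GB/sec"
MEMORY_CHANNELS = 32              # "up to 32 channels"
CPUS_PER_SUPERCHIP = 2            # 144 cores = 2 x 72-core Grace CPUs

# Assumption: bandwidth is split evenly across every channel.
per_channel_gb_per_s = MEMORY_BANDWIDTH_GB_PER_S / MEMORY_CHANNELS

# Assumption: the Superchip total is just the sum of its two CPUs.
superchip_gb_per_s = MEMORY_BANDWIDTH_GB_PER_S * CPUS_PER_SUPERCHIP

print(f"Per channel: {per_channel_gb_per_s:.2f} GB/s")
print(f"Superchip total: {superchip_gb_per_s} GB/s "
      f"({superchip_gb_per_s / 1000:.2f} TB/s)")
```

That works out to roughly 17GB/sec per channel, and about 1.09TB/sec for two CPUs, which is consistent with the "up to 1TB/sec" NVIDIA quotes for the Grace CPU Superchip.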
NVIDIA notes that the new Grace CPU has up to 68 lanes of PCIe 5.0, including 4 x PCIe 5.0 x16 links with 128GB/sec of bi-directional bandwidth per x16 connection. There are also up to 12 lanes of coherent NVLink, and remember: up to 900GB/sec of raw bi-directional bandwidth is flowing through NVLink-C2C.
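The lane and bandwidth figures above are easy to cross-check with a few lines of Python. This is only arithmetic on the numbers quoted in the article; what the lanes left over after the four x16 links are used for is not stated here, so the sketch just counts them.

```python
# Figures quoted in the article.
TOTAL_PCIE_LANES = 68         # "up to 68 lanes of PCIe 5.0"
X16_LINKS = 4                 # "4 x PCIe 5.0 x16 links"
LANES_PER_X16 = 16
GB_PER_S_PER_X16 = 128        # bi-directional, per x16 connection
NVLINK_C2C_GB_PER_S = 900     # raw bi-directional bandwidth

lanes_in_x16_links = X16_LINKS * LANES_PER_X16
remaining_lanes = TOTAL_PCIE_LANES - lanes_in_x16_links
aggregate_x16_gb_per_s = X16_LINKS * GB_PER_S_PER_X16
c2c_vs_one_x16 = NVLINK_C2C_GB_PER_S / GB_PER_S_PER_X16

print(f"Lanes in x16 links: {lanes_in_x16_links}")
print(f"Lanes left over: {remaining_lanes}")
print(f"Aggregate across x16 links: {aggregate_x16_gb_per_s} GB/s")
print(f"NVLink-C2C vs one x16 link: {c2c_vs_one_x16:.2f}x")
```

The four x16 links account for 64 of the 68 lanes and 512GB/sec combined, while a single NVLink-C2C connection is about 7x one x16 link, which lines up with the "7X faster than PCIe Gen 5" figure NVIDIA quotes for its coherent interface.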
NVIDIA's new Grace CPU Superchip is optimized for single-core performance, offering up to 1TB/sec of memory bandwidth and a 500W TDP for the 144-core dual-chip configuration. NVIDIA underlines that its new Grace CPUs are highly specialized processors built for workloads like next-generation NLP models with over 1 trillion parameters.
In these situations, when the NVIDIA Grace CPU is teamed with an NVIDIA Hopper H100 GPU, NVIDIA says the Grace CPU-powered system is 10x faster than the best x86 CPU-based NVIDIA DGX systems.
NVIDIA Grace Hopper Superchip details:
- CPU+GPU designed for giant-scale AI and HPC
- New 900 gigabytes per second (GB/s) coherent interface, 7X faster than PCIe Gen 5
- 30X higher aggregate system memory bandwidth to GPU compared to DGX A100
- Runs all NVIDIA software stacks and platforms, including NVIDIA HPC, NVIDIA AI, and NVIDIA Omniverse
NVIDIA Grace CPU Superchip details:
- High-performance CPU for HPC and cloud computing
- Superchip design with up to 144 Arm v9 CPU cores
- World's first LPDDR5x with ECC Memory, 1TB/s total bandwidth
- SPECrate2017_int_base over 740 (estimated)
- 900 GB/s coherent interface, 7X faster than PCIe Gen 5
- 2X the packaging density of DIMM-based solutions
- 2X the performance per watt of today's leading CPU
- Runs all NVIDIA software stacks and platforms, including RTX, HPC, AI, and Omniverse