NVIDIA Grace CPU: 72 Arm v9.0 cores, TSMC 4N, PCIe 5.0, 500W power

NVIDIA's new Grace CPU detailed: 72 Arm v9.0 cores per chip, 117MB of L3 cache, 68 x PCIe 5.0 lanes, TSMC 4N process node, and 500W TDP.

Anthony Garreffa

Gaming Editor

Published Aug 23, 2022 8:29 PM CDT
Updated Sep 17, 2022 3:52 PM CDT

NVIDIA announced its Grace CPU and Grace CPU Superchip earlier this year at GTC 2022. At Hot Chips 34, the company has now unveiled new details on Grace, its Orin SoC, and its NVLink chip interconnects.

Grace is NVIDIA's first data center CPU. It packs 72 Arm v9.0 cores that support SVE2 and several virtualization extensions, including nested virtualization and S-EL2. NVIDIA is fabricating Grace on TSMC's 4N process, an optimized version of TSMC's 5nm process made exclusively for NVIDIA and the same node used for its Hopper H100 GPU.

NVIDIA designed Grace to be used with its NVLink-C2C (chip-to-chip) interconnect, which links chips together into Superchips and avoids the bottlenecks of conventional cross-socket configurations.

NVLink-C2C delivers 900GB/sec of raw bidirectional bandwidth (the same as the GPU-to-GPU NVLink bandwidth of a Hopper H100) while consuming only 1.3 pJ/bit, which NVIDIA says makes it 5x more energy efficient than PCIe.
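
To put 1.3 pJ/bit in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the full 900GB/sec is in use and that 1GB = 10^9 bytes. The "PCIe" figure simply multiplies the energy per bit by NVIDIA's 5x claim; it is not a measured PCIe number.

```python
# Back-of-the-envelope: link power = bits moved per second x energy per bit.
# Inputs are the figures quoted in the article. The PCIe-style number is only
# what NVIDIA's "5x more efficient" claim implies, not a measured value.

C2C_BANDWIDTH_GB_S = 900      # raw bidirectional bandwidth, GB/sec
C2C_ENERGY_PJ_PER_BIT = 1.3   # NVLink-C2C energy per bit, picojoules
PCIE_EFFICIENCY_RATIO = 5     # NVIDIA's "5x more efficient than PCIe" claim


def link_power_watts(bandwidth_gb_s: float, energy_pj_per_bit: float) -> float:
    """Power (W) to move bandwidth_gb_s gigabytes/sec at the given pJ/bit."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8
    return bits_per_second * energy_pj_per_bit * 1e-12


c2c_watts = link_power_watts(C2C_BANDWIDTH_GB_S, C2C_ENERGY_PJ_PER_BIT)
implied_pcie_watts = link_power_watts(
    C2C_BANDWIDTH_GB_S, C2C_ENERGY_PJ_PER_BIT * PCIE_EFFICIENCY_RATIO
)

print(f"NVLink-C2C at full speed: ~{c2c_watts:.1f} W")
print(f"Same traffic at 5x the energy per bit: ~{implied_pcie_watts:.1f} W")
```

At full load that works out to roughly 9.4W for the link, versus roughly 47W for the same traffic at five times the energy per bit.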

Grace uses a scalable coherency fabric with a distributed cache design that delivers up to 3.225TB/sec of bisection bandwidth. The design scales beyond 72 cores (144 cores on the Superchip), with 117MB of L3 cache per chip and support for Arm Memory Partitioning and Monitoring (MPAM).
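
A quick sketch of what those totals mean per core, assuming fabric bandwidth and cache capacity are split evenly across all 72 cores. Real workloads won't be that tidy, so this is only a sizing intuition:

```python
# Average per-core share of Grace's on-die resources, assuming an even split
# across all cores. Real traffic is never perfectly balanced.

CORES = 72
BISECTION_TB_S = 3.225   # coherency fabric bisection bandwidth, TB/sec
L3_MB = 117              # total L3 cache per chip, MB

bisection_per_core_gb_s = BISECTION_TB_S * 1000 / CORES
l3_per_core_mb = L3_MB / CORES

print(f"Bisection bandwidth per core: ~{bisection_per_core_gb_s:.1f} GB/sec")
print(f"L3 capacity per core: ~{l3_per_core_mb:.3f} MB")
```

That is roughly 44.8GB/sec of fabric bandwidth and about 1.6MB of L3 per core on average.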

Grace also has a unified memory architecture with shared page tables. When two Grace Hopper Superchips are connected through an NVSwitch, the Grace CPU on one Superchip can communicate directly with the GPU on the other, and can even access that GPU's memory at native NVLink speeds.

For memory, Grace supports up to 512GB of LPDDR5X across up to 32 channels, delivering up to 546GB/sec of bandwidth. NVIDIA says it chose LPDDR5X for Grace because it offers the best balance of bandwidth, cost, and power consumption.
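
The 546GB/sec figure lines up with simple per-channel math. The Python sketch below assumes LPDDR5X running at 8533 MT/s on 16-bit channels; NVIDIA did not state those two parameters here, so treat them as assumptions:

```python
# Reconstructing Grace's quoted 546GB/sec from per-channel LPDDR5X math.
# ASSUMPTIONS (not from NVIDIA's figures): 8533 MT/s data rate, 16-bit channels.

DATA_RATE_MT_S = 8533    # assumed LPDDR5X transfer rate, mega-transfers/sec
CHANNEL_WIDTH_BITS = 16  # assumed LPDDR5X channel width
CHANNELS = 32            # channel count quoted for Grace


def channel_bandwidth_gb_s(data_rate_mt_s: int, width_bits: int) -> float:
    """Peak bandwidth of one channel in GB/sec (1 GB = 1e9 bytes)."""
    bytes_per_transfer = width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


per_channel = channel_bandwidth_gb_s(DATA_RATE_MT_S, CHANNEL_WIDTH_BITS)
total = per_channel * CHANNELS

print(f"Per channel: ~{per_channel:.2f} GB/sec")
print(f"{CHANNELS} channels: ~{total:.0f} GB/sec")
```

Each channel works out to about 17.07GB/sec, and 32 of them land at roughly 546GB/sec, which suggests (but does not confirm) that Grace runs its LPDDR5X at around 8533 MT/s.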

NVIDIA notes that Grace has up to 68 lanes of PCIe 5.0, including four PCIe 5.0 x16 links with 128GB/sec of bidirectional bandwidth each. There are also up to 12 lanes of coherent NVLink, and remember: NVLink-C2C carries up to 900GB/sec of raw bidirectional bandwidth on top of that.
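
Both the 128GB/sec per x16 figure and the "7X faster than PCIe Gen 5" claim in NVIDIA's spec list fall out of standard PCIe 5.0 arithmetic (32 GT/s per lane, 128b/130b encoding). A rough Python sketch:

```python
# PCIe 5.0 bandwidth per x16 link, and how NVLink-C2C compares.
# PCIe 5.0 signals at 32 GT/s per lane and uses 128b/130b encoding.

GT_PER_S_PER_LANE = 32
LANES_PER_LINK = 16
X16_LINKS = 4
TOTAL_LANES = 68
NVLINK_C2C_GB_S = 900  # raw bidirectional, from the article


def pcie_x16_bidirectional_gb_s(include_encoding: bool) -> float:
    """Bidirectional bandwidth of one PCIe 5.0 x16 link in GB/sec."""
    per_direction = GT_PER_S_PER_LANE * LANES_PER_LINK / 8  # 64 GB/sec raw
    if include_encoding:
        per_direction *= 128 / 130  # remove 128b/130b encoding overhead
    return 2 * per_direction


raw = pcie_x16_bidirectional_gb_s(include_encoding=False)
effective = pcie_x16_bidirectional_gb_s(include_encoding=True)
leftover_lanes = TOTAL_LANES - X16_LINKS * LANES_PER_LINK

print(f"x16 raw: {raw:.0f} GB/sec, after encoding: ~{effective:.1f} GB/sec")
print(f"Lanes outside the four x16 links: {leftover_lanes}")
print(f"NVLink-C2C vs one PCIe 5.0 x16: ~{NVLINK_C2C_GB_S / raw:.1f}x")
```

The 128GB/sec figure is the raw signaling rate (about 126GB/sec after encoding overhead), the four x16 links account for 64 of the 68 lanes, and 900GB/sec divided by 128GB/sec gives roughly 7x, which appears to be where NVIDIA's comparison comes from.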

NVIDIA's Grace CPU Superchip is optimized for single-core performance and offers up to 1TB/sec of memory bandwidth, with a 500W TDP for the 144-core dual-chip configuration. NVIDIA stresses that Grace is a highly specialized processor built for workloads such as next-generation NLP models with more than 1 trillion parameters.

For these workloads, NVIDIA says a system pairing the Grace CPU with a Hopper H100 GPU is 10x faster than the fastest x86 CPU-based NVIDIA DGX systems.
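
Scaling the per-chip numbers to the dual-chip Superchip is mostly multiplication. This sketch assumes the Superchip is simply two full Grace chips, each with the 546GB/sec of memory bandwidth quoted for Grace. Since the 500W figure covers the whole 144-core configuration, the per-core wattage is only a crude module-level average:

```python
# Scaling per-chip Grace figures to the two-chip Grace CPU Superchip.
# Assumes the Superchip is simply two full Grace chips. The 500W TDP is quoted
# for the whole dual-chip configuration, so watts per core is a crude average.

CHIPS = 2
CORES_PER_CHIP = 72
MEM_BW_PER_CHIP_GB_S = 546
L3_PER_CHIP_MB = 117
SUPERCHIP_TDP_W = 500

cores = CHIPS * CORES_PER_CHIP
mem_bw_gb_s = CHIPS * MEM_BW_PER_CHIP_GB_S
l3_mb = CHIPS * L3_PER_CHIP_MB
watts_per_core = SUPERCHIP_TDP_W / cores

print(f"{cores} cores, {l3_mb}MB L3")
print(f"Memory bandwidth: {mem_bw_gb_s} GB/sec (~{mem_bw_gb_s / 1000:.1f} TB/sec)")
print(f"Module power per core: ~{watts_per_core:.2f} W")
```

Two chips give 144 cores, 234MB of L3, and about 1,092GB/sec of memory bandwidth, consistent with NVIDIA's "up to 1TB/sec" figure, at roughly 3.5W of module power per core.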

NVIDIA Grace Superchip details:

Grace Hopper Superchip: CPU+GPU designed for giant-scale AI and HPC
New 900 gigabytes per second (GB/s) coherent interface, 7X faster than PCIe Gen 5
30X higher aggregate system memory bandwidth to GPU compared to DGX A100
Runs all NVIDIA software stacks and platforms, including NVIDIA HPC, NVIDIA AI, and NVIDIA Omniverse

Grace CPU Superchip: high-performance CPU for HPC and cloud computing
Superchip design with up to 144 Arm v9 CPU cores
World's first LPDDR5X with ECC memory, 1TB/s total bandwidth
SPECrate2017_int_base over 740 (estimated)
900 GB/s coherent interface, 7X faster than PCIe Gen 5
2X the packaging density of DIMM-based solutions
2X the performance per watt of today's leading CPU
Runs all NVIDIA software stacks and platforms, including RTX, HPC, AI, and Omniverse
