NVIDIA Grace CPU: 72 Arm v9.0 cores, TSMC 4N, PCIe 5.0, 500W power

NVIDIA's new Grace CPU detailed: 72 Arm v9.0 cores and 117MB of L3 cache per chip, 68 x PCIe 5.0 lanes, TSMC 4N process node, and a 500W TDP for the dual-chip Superchip.

NVIDIA announced its Grace CPU and Grace CPU Superchip earlier this year at GTC 2022, and at Hot Chips 34 the company has now unveiled new details on Grace, its Orin SoC, and its NVLINK chip interconnects.

Grace is NVIDIA's first data center CPU, packing 72 Arm v9.0 cores that support SVE2 and several virtualization extensions, including Nested Virtualization and S-EL2. NVIDIA is fabricating Grace on TSMC's 4N process node, an optimized version of TSMC's 5nm process made exclusively for NVIDIA, just like its Hopper H100 GPU.

NVIDIA designed Grace to be used with its NVLINK-C2C (Chip-to-Chip) interconnect, which joins two chips into a single Superchip and avoids the bottlenecks you'd get with regular cross-socket configurations.

NVLINK-C2C delivers 900GB/sec of raw bi-directional bandwidth (the same total NVLINK bandwidth a Hopper H100 GPU has for GPU-to-GPU links) while consuming only 1.3 pJ/bit, which NVIDIA says is 5x more energy efficient than PCIe.
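
As a rough illustration of what 1.3 pJ/bit means at this bandwidth, the sketch below simply multiplies the two quoted figures. It assumes (our assumption, not NVIDIA's) that GB means 10^9 bytes and that the energy figure applies uniformly to the full 900GB/sec of raw bi-directional traffic; the PCIe comparison just scales the energy per bit by the quoted 5x.

```python
# Back-of-the-envelope NVLINK-C2C link power from the quoted figures.
# Assumptions (ours, not NVIDIA's): "GB" is decimal (10**9 bytes), and the
# 1.3 pJ/bit figure applies to the full 900 GB/s of raw bi-directional traffic.

BITS_PER_BYTE = 8
PICO = 1e-12  # joules per picojoule


def link_power_watts(bandwidth_gb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power (W) for a link moving bandwidth_gb_per_s at energy_pj_per_bit."""
    bits_per_second = bandwidth_gb_per_s * 1e9 * BITS_PER_BYTE
    return bits_per_second * energy_pj_per_bit * PICO


c2c_watts = link_power_watts(900, 1.3)
# "5x more efficient than PCIe" -> about 5 * 1.3 = 6.5 pJ/bit for the same traffic
pcie_equiv_watts = link_power_watts(900, 1.3 * 5)

print(f"NVLINK-C2C at full load:              {c2c_watts:.2f} W")
print(f"Same traffic at 5x the energy per bit: {pcie_equiv_watts:.2f} W")
```

Under those assumptions, a fully loaded NVLINK-C2C link would draw roughly 9.4W, versus about 47W at five times the energy per bit. Actual power depends on how heavily the link is used, so treat this as an estimate at full load.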

Grace uses NVIDIA's Scalable Coherency Fabric with a distributed cache design, delivering up to 3.225TB/sec of bisection bandwidth. The design scales beyond 72 cores (144 cores on the Superchip), and each chip has 117MB of L3 cache and supports Arm Memory Partitioning and Monitoring (MPAM).

Grace has a unified memory architecture with shared page tables. Two Grace Hopper Superchips can be connected through an NVSwitch, and a Grace CPU on one Superchip can then communicate directly with the Hopper GPU on the other, even accessing that GPU's memory at native NVLINK speeds.

For memory, Grace uses up to 512GB of LPDDR5X across up to 32 channels, delivering up to 546GB/sec of bandwidth. NVIDIA chose LPDDR5X because it offers the best balance of bandwidth, cost, and power consumption.
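
The 546GB/sec figure can be reproduced from the channel count. This sketch assumes 16-bit-wide LPDDR5X channels running at 8533 MT/s; neither value is given in the article, so treat both as assumptions. Doubling the result for the two chips on the Superchip lands close to the up to 1TB/sec NVIDIA quotes for the Superchip.

```python
# Sanity check of the quoted 546 GB/s LPDDR5X bandwidth.
# Assumptions (not stated in the article): each LPDDR5X channel is 16 bits
# wide and runs at 8533 MT/s, and "GB" means 10**9 bytes.


def peak_bandwidth_gb_per_s(channels: int, bits_per_channel: int,
                            mega_transfers_per_s: float) -> float:
    """Peak bandwidth = total bus width in bytes x transfers per second."""
    bus_width_bytes = channels * bits_per_channel / 8
    return bus_width_bytes * mega_transfers_per_s * 1e6 / 1e9


grace = peak_bandwidth_gb_per_s(channels=32, bits_per_channel=16,
                                mega_transfers_per_s=8533)
superchip = 2 * grace  # two Grace chips on the Grace CPU Superchip

print(f"One Grace CPU:    {grace:.1f} GB/s")
print(f"Grace Superchip:  {superchip:.1f} GB/s")
print(f"Per core (144):   {superchip / 144:.1f} GB/s")
```

With these inputs one Grace CPU comes out at about 546.1GB/sec and the Superchip at about 1092GB/sec, or roughly 7.6GB/sec per core across 144 cores.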

NVIDIA notes that Grace has up to 68 lanes of PCIe 5.0, including 4 x16 links with 128GB/sec of bi-directional bandwidth per x16 link. There are also up to 12 lanes of coherent NVLINK, and remember: up to 900GB/sec of raw bi-directional bandwidth flows through NVLINK-C2C.
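
The 128GB/sec per x16 figure follows from the PCIe 5.0 signalling rate. This sketch assumes 32 GT/s per lane and 128b/130b encoding, and ignores packet-level protocol overhead, so usable throughput will be somewhat lower in practice. It also compares the result with NVLINK-C2C's quoted 900GB/sec.

```python
# PCIe 5.0 link bandwidth from the signalling rate, compared with NVLINK-C2C.
# Assumptions: 32 GT/s per lane, 128b/130b line encoding, decimal GB, and
# protocol overhead beyond the line encoding is ignored.

PCIE5_GT_PER_S = 32
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(lanes: int, bidirectional: bool = True, raw: bool = True) -> float:
    """Bandwidth in GB/s; raw=True reports the rate before 128b/130b encoding."""
    per_direction = PCIE5_GT_PER_S * lanes / 8  # one bit per transfer per lane
    if not raw:
        per_direction *= ENCODING_EFFICIENCY
    return per_direction * (2 if bidirectional else 1)


x16_raw = pcie_gb_per_s(16)                  # the quoted 128 GB/s per x16 link
x16_payload = pcie_gb_per_s(16, raw=False)   # after 128b/130b encoding
nvlink_c2c = 900                             # GB/s raw bi-directional, as quoted

print(f"PCIe 5.0 x16, raw bi-directional:     {x16_raw:.1f} GB/s")
print(f"PCIe 5.0 x16, after 128b/130b coding: {x16_payload:.1f} GB/s")
print(f"Four x16 links combined (raw):        {4 * x16_raw:.1f} GB/s")
print(f"NVLINK-C2C vs one x16 link:           {nvlink_c2c / x16_raw:.2f}x")
```

A single x16 link works out to 128GB/sec raw (about 126GB/sec after encoding), which puts NVLINK-C2C at roughly 7x one x16 link, consistent with the "7X faster than PCIe Gen 5" claim in NVIDIA's spec list.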

The Grace CPU Superchip is optimized for single-core performance, offering up to 1TB/sec of memory bandwidth and a 500W TDP for the 144-core dual-chip configuration. NVIDIA stresses that Grace is a highly specialized processor built for workloads such as next-generation NLP models with over 1 trillion parameters.

In these workloads, NVIDIA says a Grace CPU paired with a Hopper H100 GPU is 10x faster than today's top x86 CPU-based NVIDIA DGX systems.

NVIDIA Grace Hopper Superchip details:

  • CPU+GPU designed for giant-scale AI and HPC
  • New 900 gigabytes per second (GB/s) coherent interface, 7X faster than PCIe Gen 5
  • 30X higher aggregate system memory bandwidth to GPU compared to DGX A100
  • Runs all NVIDIA software stacks and platforms, including NVIDIA HPC, NVIDIA AI, and NVIDIA Omniverse

NVIDIA Grace CPU Superchip details:

  • High-performance CPU for HPC and cloud computing
  • Superchip design with up to 144 Arm v9 CPU cores
  • World's first LPDDR5X with ECC memory, 1TB/s total bandwidth
  • SPECrate2017_int_base over 740 (estimated)
  • 900 GB/s coherent interface, 7X faster than PCIe Gen 5
  • 2X the packaging density of DIMM-based solutions
  • 2X the performance per watt of today's leading CPU
  • Runs all NVIDIA software stacks and platforms, including RTX, HPC, AI, and Omniverse

Anthony joined the TweakTown team in 2010 and has since reviewed hundreds of graphics cards. Anthony is a long-time PC enthusiast with a passionate hatred of games built around consoles. He has been FPS gaming since the pre-Quake days, when you were insulted if you used a mouse to aim, and has been addicted to gaming and hardware ever since. Working in IT retail for 10 years gave him great experience with custom-built PCs. His addiction to GPU tech is unwavering, and he has recently taken a keen interest in artificial intelligence (AI) hardware.
