date 2022-08-22

Intel's next-gen Ponte Vecchio GPUs on the new Sapphire Rapids HBM server platform: up to 2.5x faster than NVIDIA's Ampere-based A100 GPU.

Intel has shared a little more performance data at Hot Chips 34 on its upcoming Ponte Vecchio GPUs running on the Sapphire Rapids HBM server platform.

Intel Fellow and Chief GPU Compute Architect Hong Jiang detailed the next-gen Ponte Vecchio GPU during a presentation. It will arrive in 3 configurations, from a single OAM up to an x4 subsystem with Xe Links, in a single or dual-socket Intel Sapphire Rapids platform.

Performance-wise, Intel said a 2-Stack Ponte Vecchio GPU configuration on a single OAM is capable of up to 52 TFLOPs of FP64/FP32 compute, 419 TFLOPs of TF32 (XMX Float 32), 839 TFLOPs of BF16/FP16 and 1678 TFLOPs of INT8 horsepower. Intel also listed the maximum size and peak bandwidth for each level of the memory hierarchy: the Register File on Intel's new Ponte Vecchio GPU is 64MB with a huge 419TB/sec of bandwidth.
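The peak figures roughly double each time the data type narrows. A minimal Python sketch of that arithmetic, using only the numbers quoted in this article (the dictionary and variable names are ours, purely for illustration):

```python
# Peak throughput in TFLOPs for a 2-Stack Ponte Vecchio on a single OAM,
# as quoted in the article. All names here are illustrative only.
peak_tflops = {
    "FP64/FP32": 52,
    "TF32": 419,
    "BF16/FP16": 839,
    "INT8": 1678,
}

baseline = peak_tflops["FP64/FP32"]

# Speedup of each format relative to the FP64/FP32 figure.
speedup_vs_fp32 = {name: value / baseline for name, value in peak_tflops.items()}

# Ratio between each format and the one listed just before it.
names = list(peak_tflops)
step_ratio = {
    names[i]: peak_tflops[names[i]] / peak_tflops[names[i - 1]]
    for i in range(1, len(names))
}

for name in names:
    print(f"{name:>10}: {peak_tflops[name]:>5} TFLOPs, "
          f"{speedup_vs_fp32[name]:.1f}x vs FP64/FP32")
```

On these numbers, the TF32, BF16/FP16 and INT8 figures land at roughly 8x, 16x and 32x the FP64/FP32 rate.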

Intel Ponte Vecchio GPUs have 64MB of L1 cache with 105TB/sec of bandwidth (4:1) and 408MB of L2 cache in total with 13TB/sec of bandwidth (8:1), while the HBM memory pool goes up to a great 128GB with 4.2TB/sec of bandwidth (4:1). Intel says the chunkier L2 cache offers some huge performance gains in workloads including its 2D-FFT and DNN cases.
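The ratios in parentheses appear to describe how much bandwidth drops from one level of the memory hierarchy to the next; that reading is our assumption, not something Intel spelled out here. A quick Python sketch using the figures quoted above (the names are illustrative):

```python
# (level, capacity, peak bandwidth in TB/sec) as quoted in the article.
hierarchy = [
    ("Register File", "64MB", 419.0),
    ("L1 cache", "64MB", 105.0),
    ("L2 cache", "408MB", 13.0),
    ("HBM", "128GB", 4.2),
]

def bandwidth_step_downs(levels):
    """Return (level name, previous level's bandwidth divided by this level's)."""
    return [
        (levels[i][0], levels[i - 1][2] / levels[i][2])
        for i in range(1, len(levels))
    ]

for name, ratio in bandwidth_step_downs(hierarchy):
    print(f"{name}: about {ratio:.1f}:1 versus the level above")
```

The first two steps come out close to the quoted 4:1 and 8:1. The L2-to-HBM step works out nearer 3:1 with the rounded figures given here, so the quoted 4:1 may rest on unrounded numbers or a different basis.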

The company compared its new Ponte Vecchio GPU against the NVIDIA Ampere A100 running CUDA and SYCL. In miniBUDE, a computational workload that predicts the binding energy of a ligand with its target, the new Ponte Vecchio GPU finished the test 2x faster than the NVIDIA Ampere A100 GPU.

Intel also says that in ExaSMR (Small Modular Reactors for large nuclear reactor designs), the Ponte Vecchio GPU beats the NVIDIA Ampere A100 GPU by 1.5x. Not bad.

The company outlined its new Ponte Vecchio GPU specs: 128 Xe GPU cores, 128 RT units, HBM2e memory, PCIe 5.0 support, up to 408MB of L2 cache in 2 Stacks, and EMIB interconnect. Multiple dies are used to make the Ponte Vecchio GPU, built on Intel's in-house Intel 7 process and TSMC's N7 and N5 process nodes.

Intel Ponte Vecchio GPU chiplets + process nodes used:

Intel 7nm

TSMC 7nm

Foveros 3D Packaging

EMIB

10nm Enhanced SuperFin

Rambo Cache

HBM2

Intel has an impressive 47 tiles on the Ponte Vecchio GPU: