Intel is not being shy about its Xe GPU architecture: it hosted a virtual Architecture Day detailing a mountain of information on its technologies, including the huge Arctic Sound GPU.
The new Arctic Sound GPU is a multi-chip module (MCM) design with 4 tiles and a whopping 16,384 cores, with each added tile contributing a large chunk of performance. Running at 1.3GHz, the 4-tile Arctic Sound Xe GPU was benchmarked at an insane 42 TFLOPS of FP32 compute, with near-perfect scaling: 3.993x the throughput of a single tile.
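The 42 TFLOPS figure lines up with the usual back-of-the-envelope peak calculation. The sketch below assumes, as is conventional for GPU peak figures (Intel did not spell out its formula), that each FP32 core retires one fused multiply-add per clock, counted as 2 floating-point operations. The function names are purely illustrative.

```python
def peak_fp32_tflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumption: each FP32 core performs one fused multiply-add (2 FLOPs) per clock.
    """
    return cores * flops_per_core_per_clock * clock_ghz * 1e9 / 1e12


def scaling_efficiency(speedup: float, tiles: int) -> float:
    """Measured speed-up over one tile divided by the ideal (linear) speed-up."""
    return speedup / tiles


print(f"1-tile theoretical peak: {peak_fp32_tflops(4_096, 1.3):.1f} TFLOPS")
print(f"4-tile theoretical peak: {peak_fp32_tflops(16_384, 1.3):.1f} TFLOPS")
print(f"3.993x over 4 tiles = {scaling_efficiency(3.993, 4):.1%} of linear scaling")
```

On these assumptions a 4-tile part at 1.3GHz tops out at roughly 42.6 TFLOPS on paper, so the benchmarked 42 TFLOPS sits very close to the theoretical ceiling.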
Raja Koduri, the chief architect and senior vice president of Intel's discrete graphics division, explained that the company is working on a 1-tile version with 512 EUs and 4,096 cores, a 2-tile version with 8,192 cores, and a 4-tile version that was teased at Architecture Day with a blistering 16,384 cores.
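The core counts follow directly from the tile count. 512 EUs and 4,096 cores per tile implies 8 FP32 lanes ("cores") per execution unit. The sketch below assumes that ratio, and the 512-EU tile, hold for every configuration. Intel only gave the EU count for the 1-tile version, so the 2- and 4-tile EU numbers are derived here, not quoted.

```python
# Assumption: every tile has 512 EUs, and each EU has 8 FP32 lanes ("cores"),
# which is what 512 EUs -> 4,096 cores on a single tile implies.
FP32_LANES_PER_EU = 8
EUS_PER_TILE = 512


def tile_config(tiles: int) -> tuple[int, int]:
    """Return (execution units, FP32 cores) for a given number of tiles."""
    eus = tiles * EUS_PER_TILE
    return eus, eus * FP32_LANES_PER_EU


for tiles in (1, 2, 4):
    eus, cores = tile_config(tiles)
    print(f"{tiles}-tile: {eus:,} EUs, {cores:,} FP32 cores")
```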
The 1-tile Arctic Sound Xe GPU is quite the powerhouse, even at this early stage: Intel said a single tile can transcode 10 separate streams of 4K60 HEVC content. If media throughput scales as well as the FP32 compute did, a 4-tile Arctic Sound Xe GPU should blast through around 40 streams at once. Wow.
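To put the 10-stream figure in perspective, here is a rough sketch of the pixel rate involved and of the 40-stream extrapolation. It assumes "4K" means 3840x2160. It also applies the FP32 scaling factor to media work, which is a purely hypothetical step: Intel demonstrated near-linear scaling only for compute, not for transcoding.

```python
import math

# Assumption: "4K60" = 3840x2160 at 60 frames per second.
WIDTH, HEIGHT, FPS = 3840, 2160, 60


def pixel_rate(streams: int) -> int:
    """Pixels per second that must be transcoded for N 4K60 streams."""
    return streams * WIDTH * HEIGHT * FPS


def projected_streams(streams_per_tile: int, speedup: float) -> int:
    """Whole streams a multi-tile part could sustain if media throughput
    scaled by `speedup` (hypothetical: only FP32 scaling was shown)."""
    return math.floor(streams_per_tile * speedup)


print(f"1 tile, 10 streams: {pixel_rate(10) / 1e9:.2f} Gpixel/s")
print(f"4 tiles, perfect scaling: {projected_streams(10, 4.0)} streams")
print(f"4 tiles, 3.993x scaling: {projected_streams(10, 3.993)} streams")
```

Even at the measured 3.993x compute scaling, the projection lands just shy of 40 whole streams, which is why "around 40" is the safer way to read the extrapolation.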
Intel explains: "We've leveraged Intel's unique packaging innovations for an industry-first, multi-tiled, highly scalable and high-performance architecture. This is Xe-HP. Let's take a look at what it can do. Xe-HP was created to be a media supercomputer on a PCIe card. Here you'll see us transcoding a 4K video in real time, up to 60 frames per second, on a single stream, but we didn't stop there."
"By utilizing our industry-leading media IP and creating the densest media architecture on a GPU, with FFmpeg we can transcode up to 10 full streams of high-quality HEVC 4K video at 60 frames per second on a single tile, and you can see the FFmpeg output on screen displaying the progression of the real-time transcoding of each frame."
"By optimizing for bitrate efficiency and stream density, customers are able to realize real-world TCO improvements for delivery of video content that scale along with media. We place compute throughput at the forefront of the Xe architecture, increasing the total number of execution units by over 100x when compared to Xe-LP. Viewing this through the lens of FP32 performance, Xe-HP covers a dynamic range of compute throughput with near-linear scalability from one tile to four tiles, and tracking the most FP32 peak performance placed onto a single GPU package when measured by the clpeak benchmark."
Intel concluded, with a major tease: "This unique combination of compute and media performance provides customers the flexibility to design for their most demanding applications, and we've only just begun".