Ever wonder how the PlayStation 5 Pro works? Sony gives us a hefty 37-minute technical breakdown on everything you'd want to know about the new mid-gen upgrade.
Sony just released a new PS5 Pro video specifically for enthusiasts. The 37-minute "technical seminar" is hosted by console architect Mark Cerny, who demystifies Sony's new $700 console.
One of the more fascinating developments was the lengths to which Sony went to enable its proprietary PlayStation Spectral Super Resolution (PSSR) neural network. PSSR is Sony's own first-party, console-ready analog of DLSS, FSR, or XeSS. It's an AI-based upscaling solution that utilizes Convolutional Neural Networks (CNNs), but to enable PSSR, Sony first had to create what it calls a "fully fused network."
Okay, so let's go over what that is before we move on. A fully fused network essentially means that every layer of the CNN (the Convolutional Neural Network, or the series of calculations that goes into upscaling an image without harming image quality) is processed on-chip, and only the final result is written back to system memory. Keeping the intermediate data on-chip means the pipeline wastes as little memory bandwidth as possible.
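To make the bandwidth argument concrete, here is a rough Python sketch comparing system-memory traffic for a layer-by-layer network against a fully fused one. The stage sizes and layer count are hypothetical placeholders, not PSSR's actual network, so treat the output as an illustration of the idea rather than a measurement.

```python
# Hypothetical figures throughout: the layer count and per-pixel sizes below
# are made up for illustration and are not PSSR's real network shape.
PIXELS_4K = 3840 * 2160

# Bytes per pixel held at each stage: input, three hidden layers, output.
STAGE_BYTES = [4, 16, 16, 16, 4]


def unfused_traffic(pixels, stage_bytes):
    """System-memory traffic when every layer reads its input from memory
    and writes its output back."""
    total = 0
    for before, after in zip(stage_bytes, stage_bytes[1:]):
        total += pixels * before  # layer reads its input
        total += pixels * after   # layer writes its output
    return total


def fused_traffic(pixels, stage_bytes):
    """System-memory traffic when only the first read and the final write
    leave the chip."""
    return pixels * (stage_bytes[0] + stage_bytes[-1])


if __name__ == "__main__":
    unfused = unfused_traffic(PIXELS_4K, STAGE_BYTES)
    fused = fused_traffic(PIXELS_4K, STAGE_BYTES)
    print(f"unfused: {unfused / 1e6:.1f} MB per frame")
    print(f"fused:   {fused / 1e6:.1f} MB per frame")
    print(f"ratio:   {unfused / fused:.1f}x")
```

With these made-up stage sizes, the fused version touches system memory only for the first read and last write, which is where the savings come from.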
Cerny explains that Sony had to solve an on-chip memory problem before it could enable PSSR, and that doing so required a custom GPU redesign specifically for the PS5 Pro.
"What we really want is a fully fused network. That's the holy grail of neural network implementation.
"With a fully fused network, you're reading the input image from a game at the very start, processing all the layers of the CNN internally on-chip, and then writing the results back to system memory at the very end.
"There's two problems that we need to solve, though. The first relates to the amount of on-chip memory required.
"There's 8 million pixels in a 4K image. If each pixel needs 16 bytes, that's about 128 megabytes. In terms of on-chip memory, that's a lot. Luckily we don't have to process the whole scene at once. We can sub-divide this screen and take just a piece of it at a time through the neural network.
"The difficulty comes in that as we process a tile, bad data from the edges creeps in. So we have to throw out part of our results. The smaller the tile, the higher the portion of data that has to be discarded. There are effective limits to how small we can make the tile. And correspondingly, there's a certain amount of fast, on-chip memory that's key if we are to achieve that goal of a fully-fused network.
"The other problem we need to solve relates to the bandwidth of that on-chip memory. Our targets are incredibly high, we'd like many many terabytes per second. When you think in those terms, everything seems small.
"For example, we could increase the size of the GPU's L2 cache and try to use that for the on-chip memory, but unfortunately, the L2 bandwidth is just a few terabytes per second.
"This memory problem was the starting point for our custom design. From there it's been almost a four-year journey."
To solve the memory problem, Sony modified and boosted the PlayStation 5 Pro's GPU, increasing its Work Group Processors (WGPs) by 67% over the base PS5, going from 18 to 30 WGPs.
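Cerny's memory arithmetic and the tiling trade-off he describes can be sketched in a few lines of Python. The 16 bytes per pixel comes from his own example; the 8-pixel contaminated border and the tile sizes are hypothetical values chosen only to show the trend, and the function names are ours.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Memory needed to hold per-pixel data for a whole frame at once."""
    return width * height * bytes_per_pixel


def discarded_fraction(tile, halo):
    """Fraction of a square tile's results thrown away when `halo` pixels
    along every edge are contaminated by missing neighbours.

    `halo` is an assumed value; the seminar does not give a number.
    """
    if tile <= 2 * halo:
        return 1.0  # nothing usable survives
    keep = tile - 2 * halo
    return 1.0 - (keep * keep) / (tile * tile)


if __name__ == "__main__":
    # Whole 4K frame at 16 bytes per pixel (Cerny rounds this to ~128 MB).
    print(f"{frame_bytes(3840, 2160, 16) / 1e6:.1f} MB for a full 4K frame")

    # Smaller tiles waste a larger share of their output.
    for tile in (32, 64, 128, 256):
        waste = discarded_fraction(tile, halo=8)
        print(f"tile {tile:>3}px: {waste:.1%} of results discarded")
```

The loop shows why there is a floor on tile size: as the tile shrinks toward twice the border width, almost all of the computed pixels have to be thrown out.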
The PS5 Pro architect also confirms that Sony is using the WGP Vector Registers as that fast on-chip memory, thus avoiding any need for additional custom AI/ML-specific hardware.
"What we're doing on PS5 Pro is using the Vector Registers in the Work Group Processors as that RAM.
"Each WGP has four sets of registers, each 128KB in size, and over a terabyte per second. 30 WGPs therefore give us 15MB of memory and a combined bandwidth of 200 terabytes per second, which is to say, several hundred times faster than system memory.
"Of course the roadmap RDNA architecture and instruction set required some modifications to take better advantage of that register RAM.
"We ended up adding 44 new Shader Instructions, those instructions take that freer approach to register RAM access and also implement the math needed for the CNNs, which is primarily done in 8-bit precision.
"These instructions are specifically designed to operate in a takeover mode where each WGP processes the CNN for a single screen tile."
Mark Cerny also reiterated how Sony modified the PS5 Pro's GPU in a recent interview with IGN:
IGN: As for the AI upscaler that you're using for PSSR - is that a discrete piece of hardware or is it built into the GPU itself?
Cerny: "We needed hardware that had this very high performance for machine learning. And so we went in and modified the shader core to make that happen.
"Specifically, as far as what you touch on the software side, there are 44 new machine learning instructions that take a freer approach to register RAM access. Effectively, you're using the register RAM as RAM, and also implement the math needed for the CNNs.
"To put that differently, we enhanced the GPU. But we didn't add a tensor unit or something to it."
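The register figures Cerny quotes in the seminar (four 128KB register sets per WGP, 30 WGPs, roughly 200 terabytes per second combined) can be checked with some quick arithmetic. The sketch below is ours, not Sony's: the 576 GB/s system memory bandwidth is Sony's published PS5 Pro spec rather than something stated in the seminar quotes, and the tile-size estimate assumes Cerny's illustrative 16 bytes per pixel and ignores any registers needed for weights or scratch data.

```python
import math

KB = 1024
MB = 1024 * KB

# Figures quoted in the seminar.
WGP_COUNT = 30
REGISTER_SETS_PER_WGP = 4
BYTES_PER_REGISTER_SET = 128 * KB
COMBINED_BANDWIDTH_TBPS = 200.0

# Assumption: PS5 Pro system memory bandwidth from Sony's published spec.
SYSTEM_MEMORY_TBPS = 0.576

# Assumption: Cerny's illustrative per-pixel footprint.
BYTES_PER_PIXEL = 16

per_wgp_bytes = REGISTER_SETS_PER_WGP * BYTES_PER_REGISTER_SET
total_bytes = WGP_COUNT * per_wgp_bytes

# Bandwidth each register set would need to supply to reach the total.
per_set_tbps = COMBINED_BANDWIDTH_TBPS / (WGP_COUNT * REGISTER_SETS_PER_WGP)

# How much faster the register RAM is than system memory.
speedup = COMBINED_BANDWIDTH_TBPS / SYSTEM_MEMORY_TBPS

# Upper bound on a square tile one WGP could hold if every register byte
# stored pixel data (it would not in practice).
max_tile_side = math.isqrt(per_wgp_bytes // BYTES_PER_PIXEL)

if __name__ == "__main__":
    print(f"per WGP:  {per_wgp_bytes // KB} KB")
    print(f"total:    {total_bytes / MB:.0f} MB")
    print(f"per set:  {per_set_tbps:.2f} TB/s")
    print(f"speedup:  {speedup:.0f}x over system memory")
    print(f"max tile: {max_tile_side} x {max_tile_side} pixels per WGP")
```

The totals line up with the quote: 30 WGPs at 512KB each is exactly 15MB, each register set lands at "over a terabyte per second," and the ratio to system memory falls in the "several hundred times" range.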