AMD's target was a 40% IPC increase over its previous-generation CPUs, but Zen came in at an average of 52% higher than those chips. An increase of that size will probably never be seen again on either side of the aisle.
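To put those percentages in perspective, performance scales roughly as IPC times clock speed. The short sketch below compares AMD's 40% target with the reported 52% at equal clocks. It is a simplification that ignores memory and I/O bottlenecks, and the function name `relative_performance` is just for illustration.

```python
def relative_performance(ipc_gain: float, clock_ratio: float = 1.0) -> float:
    """Speedup over the old chip, assuming performance ~ IPC x frequency.

    ipc_gain is fractional (0.52 means +52% IPC); clock_ratio is new/old clock.
    """
    return (1.0 + ipc_gain) * clock_ratio


target = relative_performance(0.40)    # AMD's stated goal at equal clocks
achieved = relative_performance(0.52)  # the reported average result

# How far the achieved result overshoots the target, as a percentage.
overshoot_pct = (achieved / target - 1.0) * 100.0
print(f"target {target:.2f}x, achieved {achieved:.2f}x, "
      f"{overshoot_pct:.1f}% beyond target")
```

In other words, at the same clock speed Zen landed roughly 8.6% beyond what AMD itself had aimed for.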
Zen's new core puts heavy emphasis on learning from the mistakes of Bulldozer and its successor microarchitectures. One great new addition is a micro-op cache (similar to the one Intel introduced with Sandy Bridge). The cache hierarchy and branch prediction also receive big upgrades that raise overall performance.
The core itself gained SMT (simultaneous multithreading), and AMD reduced the branch misprediction penalty by three cycles. The micro-op cache, larger schedulers, larger retire resources, an improved FPU, and larger queues all help Ryzen as well.
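To illustrate why a micro-op cache helps, here is a toy model, not AMD's design: decoded micro-ops are cached by fetch address, so a hot loop only pays the cost of x86 decoding once. The capacity, the LRU replacement policy, and the `_decode` stand-in are all hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict


class MicroOpCache:
    """Toy micro-op cache: maps a fetch address to its decoded micro-ops."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[int, list[str]] = OrderedDict()
        self.decodes = 0  # how many times the (expensive) decoders ran

    def _decode(self, addr: int) -> list[str]:
        # Hypothetical stand-in for the x86 instruction decoders.
        self.decodes += 1
        return [f"uop@{addr:#x}"]

    def fetch(self, addr: int) -> list[str]:
        if addr in self.entries:
            self.entries.move_to_end(addr)  # hit: bypass the decoders
            return self.entries[addr]
        uops = self._decode(addr)           # miss: decode, then fill
        self.entries[addr] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return uops


cache = MicroOpCache(capacity=64)
loop_body = [0x1000 + 16 * i for i in range(8)]  # an 8-block hot loop
for _ in range(100):                             # run the loop 100 times
    for addr in loop_body:
        cache.fetch(addr)
print(cache.decodes)
```

Across 800 fetches the decoders only run 8 times; every later iteration is served from the cache, which saves both decode latency and the power the decoders would otherwise burn.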
The cache system moved to a write-back L1 design, the L2 and L3 caches were made faster, the L1/L2 prefetchers were improved, and load latency into the floating-point units was reduced by two cycles. AMD estimates that L1 and L2 bandwidth doubled over previous generations while L3 bandwidth improved 5x.
Power efficiency was also improved with more aggressive clock gating, the write-back L1 cache, the micro-op cache, the stack engine, and an overall focus on power optimization.
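The write-back L1 helps both bandwidth and power. A minimal sketch, assuming a one-line cache and ignoring everything else about the real hierarchy, counts how many writes reach L2: a write-through cache forwards every store, while a write-back cache only writes a dirty line out when it is evicted (plus a final flush). The function names are hypothetical.

```python
from __future__ import annotations


def l2_writes_write_through(stores: list[int]) -> int:
    # Every store is forwarded to L2 immediately.
    return len(stores)


def l2_writes_write_back(stores: list[int], line_size: int = 64) -> int:
    # Stores only mark the line dirty; a dirty line is written to L2 when a
    # store to a different line evicts it (one-line cache for simplicity),
    # plus one final flush at the end.
    writes = 0
    current_line = None
    dirty = False
    for addr in stores:
        line = addr // line_size
        if line != current_line:
            if dirty:
                writes += 1  # write back the evicted dirty line
            current_line = line
        dirty = True
    if dirty:
        writes += 1  # final flush of the last dirty line
    return writes


# 1,000 stores that all land in the same 64-byte line (e.g. a loop counter).
stores = [0x2000 + (i % 8) * 8 for i in range(1000)]
print(l2_writes_write_through(stores), l2_writes_write_back(stores))
```

For this access pattern the write-through model sends 1,000 writes to L2 while the write-back model sends one, which is the kind of traffic (and switching power) a write-back L1 avoids.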
Here we can see how AMD has organized the core to maximize efficiency. Branch prediction is critical to both overall processor performance and power efficiency, since every mispredicted branch throws away work the core has already done speculatively. AMD's Neural Net Prediction is a fancy way of saying that its branch prediction is smarter and more effective than it was in Bulldozer.
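AMD doesn't detail the internals here, but its "neural net" predictor is widely described as perceptron-based. For a feel of the general technique, below is a minimal textbook perceptron branch predictor in the style of Jiménez and Lin, not AMD's implementation. Each branch hashes to a weight vector, the prediction is the sign of its dot product with recent global history, and weights are trained on mispredictions or low-confidence outputs. Table size and history length are arbitrary choices.

```python
from __future__ import annotations


class PerceptronPredictor:
    """Textbook perceptron branch predictor (Jimenez & Lin style, not AMD's)."""

    def __init__(self, num_perceptrons: int = 64, history_len: int = 16) -> None:
        # Each row: a bias weight followed by one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        self.history = [1] * history_len  # +1 = taken, -1 = not taken
        # Training threshold from the original perceptron predictor paper.
        self.threshold = int(1.93 * history_len + 14)

    def _output(self, pc: int) -> tuple[list[int], int]:
        w = self.weights[pc % len(self.weights)]  # select a perceptron by PC
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))
        return w, y

    def predict(self, pc: int) -> bool:
        _, y = self._output(pc)
        return y >= 0  # non-negative output means "taken"

    def update(self, pc: int, taken: bool) -> None:
        w, y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the output's magnitude (confidence)
        # is at or below the threshold. Real designs also saturate weights.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        # Shift the actual outcome into the global history register.
        self.history = self.history[1:] + [t]


predictor = PerceptronPredictor()
branch_pc = 0x400  # hypothetical branch address
late_misses = 0
for n in range(2000):
    taken = n % 2 == 0  # strictly alternating taken / not-taken pattern
    if n >= 1500 and predictor.predict(branch_pc) != taken:
        late_misses += 1
    predictor.update(branch_pc, taken)
print(late_misses)
```

A plain two-bit counter gets roughly half of an alternating pattern wrong, while the perceptron learns to correlate the outcome with the previous one and stops mispredicting once trained.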
One highlight of this "Neural Net" is the 512-entry indirect target array, which handles dynamic indirect branches. Another change is that a TLB was added to the branch-predictor block in the pipeline, which provides physical addresses earlier and speeds up prefetching.
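An indirect branch (a jump table or a virtual call, for example) can go to many different addresses, so the predictor has to guess the target itself, not just taken/not-taken. A common approach, sketched here on the assumption that AMD's 512-entry array works broadly along these lines, indexes the target table by mixing the branch address with recent path history, so one branch can map to different targets in different contexts. The hash and the history folding are hypothetical.

```python
from __future__ import annotations


class IndirectTargetArray:
    """Toy indirect-branch target predictor indexed by PC and path history."""

    def __init__(self, entries: int = 512) -> None:
        self.entries = entries
        self.targets: list[int | None] = [None] * entries
        self.path_history = 0  # folded bits from the last few branch targets

    def _index(self, pc: int) -> int:
        # Hypothetical hash: XOR the branch address with the path history.
        return (pc ^ self.path_history) % self.entries

    def predict(self, pc: int) -> int | None:
        return self.targets[self._index(pc)]

    def update(self, pc: int, actual_target: int) -> None:
        self.targets[self._index(pc)] = actual_target
        # Shift in bits of the new target; the 12-bit mask keeps only the
        # most recent few targets, so the history repeats for repeating paths.
        self.path_history = ((self.path_history << 4) ^ (actual_target >> 4)) & 0xFFF


ita = IndirectTargetArray()
call_site = 0x500                # hypothetical virtual-call address
targets = [0x1000, 0x2000] * 50  # receiver type alternates on every call
misses = 0
for i, target in enumerate(targets):
    if i >= 10 and ita.predict(call_site) != target:
        misses += 1
    ita.update(call_site, target)
print(misses)
```

A table indexed by the call-site address alone would hold only the last target and miss on every call here; adding path history lets the two contexts land in separate entries.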
Ryzen offers double the L2 cache bandwidth and over 4x the L3 bandwidth compared to Bulldozer. The L2 cache's arrays and macros make it much more area efficient, while the L3 cache's four gated clock regions make it more power efficient. The L3 cache runs at the same clock speed as the fastest core.
Shadow tag macros improve efficiency by reducing probe traffic into the L2 caches. When misses reach a certain point, the stage-2 tag and state activate. All cores in the CPU Complex (CCX) see the same average latency to the L3 cache, because data is interleaved across the four L3 slices by address rather than being resident in any single core's slice.
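The benefit of shadow tags can be shown with a toy probe filter. If a duplicate copy of each core's L2 tags is kept next to the L3, a coherence probe only has to disturb the L2s that actually hold the line instead of broadcasting to all four. This is a minimal sketch under that assumption; the class and method names are hypothetical, and the model ignores cache state and capacity.

```python
from __future__ import annotations


class ShadowTagDirectory:
    """Duplicate L2 tags kept beside the L3 to filter coherence probes."""

    def __init__(self, num_cores: int = 4, line_size: int = 64) -> None:
        self.line_size = line_size
        self.shadow: list[set[int]] = [set() for _ in range(num_cores)]
        self.l2_probes = 0  # probes that actually reached an L2

    def l2_fill(self, core: int, addr: int) -> None:
        self.shadow[core].add(addr // self.line_size)

    def l2_evict(self, core: int, addr: int) -> None:
        self.shadow[core].discard(addr // self.line_size)

    def probe(self, addr: int) -> list[int]:
        """Return the cores whose L2 must be probed for this address."""
        line = addr // self.line_size
        hits = [c for c, tags in enumerate(self.shadow) if line in tags]
        self.l2_probes += len(hits)
        return hits


tags = ShadowTagDirectory()
tags.l2_fill(core=2, addr=0x8040)
print(tags.probe(0x8040), tags.probe(0x9000))  # [2] []
print(tags.l2_probes)  # 1 probe instead of 8 broadcast probes
```

Two probes that would have hit all four L2s (eight lookups) reduce to a single L2 lookup, which is the probe-traffic saving the shadow tags are there to provide.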
AMD's problem with Bulldozer and its successor architectures was the heavy sharing of resources between cores. Bulldozer shared parts of the pipeline, the L2 cache, and the FP units between each pair of cores. Zen abandons that sharing model, and each core now has its own dedicated resources. AMD moved to two-way SMT (simultaneous multithreading) to get 16 threads out of 8 cores. The two threads share the resources of each core: they compete for most of them, while some are statically partitioned. Based on the benchmark results I have seen, even with lower single-threaded performance, Ryzen can still beat Intel's Broadwell-E in multi-threaded performance at the same frequency, so whatever AMD has done is working well for them.
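The difference between competitively shared and statically partitioned resources is easy to see with a toy queue. In this simplified sketch with a hypothetical 8-entry structure, a stalled thread can fill an entire shared queue and starve its sibling, while static partitioning reserves half the entries for each thread. Which specific Zen structures use which policy is not modeled.

```python
from __future__ import annotations


class SmtQueue:
    """Toy queue shared by two SMT threads under one of two policies."""

    def __init__(self, size: int, static_partition: bool) -> None:
        self.size = size
        self.static_partition = static_partition
        self.used = [0, 0]  # entries currently held by thread 0 and thread 1

    def try_allocate(self, thread: int) -> bool:
        if self.static_partition:
            # Each thread may only use its own half of the entries.
            ok = self.used[thread] < self.size // 2
        else:
            # Competitive sharing: any free entry is up for grabs.
            ok = sum(self.used) < self.size
        if ok:
            self.used[thread] += 1
        return ok


for static in (False, True):
    q = SmtQueue(size=8, static_partition=static)
    # Thread 0 is stalled (e.g. on a cache miss) and keeps allocating.
    for _ in range(8):
        q.try_allocate(0)
    print("static" if static else "shared", q.try_allocate(1))
```

The trade-off: competitive sharing lets a single busy thread use the whole structure, while static partitioning guarantees the second thread some room even when its sibling stalls.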
Power efficiency is a major concern for modern CPUs, and AMD invested heavily in reducing power consumption. Where its previous designs used a single level of clock gating, Zen uses two, and AMD reports a 14% drop in power overhead.
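Why a second level of clock gating helps can be shown with a toy model. Fine-grained gates stop the clock to individual idle blocks, but the clock spine feeding a whole region keeps toggling unless a coarser gate can shut the region off when every block in it is idle. All power numbers below are hypothetical units chosen for illustration, not AMD figures.

```python
from __future__ import annotations

# Hypothetical per-cycle clock power costs, in arbitrary units.
SPINE_POWER = 4.0  # clock distribution feeding one region
BLOCK_POWER = 1.0  # local clock load of one block inside the region


def clock_power(activity: list[list[bool]], two_level: bool) -> float:
    """Total clock power over all cycles for one region.

    activity[cycle][block] is True if that block needs a clock that cycle.
    """
    total = 0.0
    for cycle in activity:
        active_blocks = sum(cycle)
        # Fine-grained gating: idle blocks never pay their local clock power.
        total += BLOCK_POWER * active_blocks
        # The region's spine is gated only with the second, coarser level,
        # and only when every block in the region is idle.
        if not (two_level and active_blocks == 0):
            total += SPINE_POWER
    return total


# A region with 4 blocks that is completely idle every other cycle.
busy = [True, False, True, False]
idle = [False] * 4
trace = [busy, idle] * 50

one = clock_power(trace, two_level=False)
two = clock_power(trace, two_level=True)
print(one, two)
```

The saving depends entirely on how often whole regions go idle, which is why the real-world figure (AMD's 14%) is much smaller than what a contrived trace like this produces.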
AMD's SenseMI technology is no joke: with over 1,300 critical path monitors, 48 power monitors, 20 thermal diodes, and nine droop detectors, the CPU can monitor itself quite well. Rumour had it that Intel replaced the FIVR (fully integrated voltage regulator) of Haswell/Broadwell with LDOs in Skylake; AMD skipped the FIVR approach altogether and went straight to LDOs. LDOs (low-dropout regulators) are linear regulators, and unlike a FIVR, they make a lot more sense to integrate for core power regulation.
Here we see AMD touting its linear regulators.
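One property of any linear regulator is worth keeping in mind. An LDO passes the full load current from input to output and burns the voltage difference as heat, so its efficiency is roughly V_out/V_in (plus a small loss to quiescent current). That makes it efficient when a core's requested voltage sits close to the input rail. The function name and the voltages below are hypothetical examples.

```python
def ldo_efficiency(v_in: float, v_out: float, i_load: float,
                   i_quiescent: float = 0.0) -> float:
    """Efficiency of a linear regulator: P_out / P_in.

    The pass element carries the load current, so the input draws
    i_load + i_quiescent at v_in while only v_out * i_load reaches the load.
    """
    if v_out > v_in:
        raise ValueError("a linear regulator can only step voltage down")
    p_out = v_out * i_load
    p_in = v_in * (i_load + i_quiescent)
    return p_out / p_in


# Hypothetical operating points: a 1.35 V shared rail feeding cores that
# request different voltages.
for v_core in (1.30, 1.10, 0.90):
    eff = ldo_efficiency(v_in=1.35, v_out=v_core, i_load=10.0)
    print(f"{v_core:.2f} V core -> {eff:.1%} efficient")
```

The further a core's voltage drops below the rail, the more of the input power turns into heat in the regulator, so the benefit comes from keeping the rail close to what the cores need.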
SenseMI's 1,000+ sensors help determine Precision Boost and XFR (Extended Frequency Range) clock levels, so we know AMD didn't just set one fixed profile: the profile adapts to the current conditions of the CPU.
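To make "the profile adapts" concrete, here is a minimal sketch of a sensor-driven boost controller. Each control interval, it picks the highest clock whose estimated power stays within a limit, and it falls back to base clock when temperature is too high. Everything here is an assumption for illustration: the 25 MHz step, the limits, the linear power estimate, and the function name. It is not AMD's actual algorithm.

```python
STEP_MHZ = 25  # assumed boost granularity


def pick_boost_clock(base_mhz: int, max_mhz: int, temp_c: float,
                     watts_per_ghz: float, power_limit_w: float,
                     temp_limit_c: float = 75.0) -> int:
    """Highest clock, in STEP_MHZ steps above base, that fits the limits.

    watts_per_ghz is a hypothetical linear estimate of package power per GHz
    for the current workload, as if derived from on-die power monitors.
    """
    if temp_c >= temp_limit_c:
        return base_mhz  # no thermal headroom: hold the base clock
    best = base_mhz
    freq = base_mhz + STEP_MHZ
    while freq <= max_mhz and watts_per_ghz * freq / 1000 <= power_limit_w:
        best = freq
        freq += STEP_MHZ
    return best


# Same chip, three different conditions reported by the sensors.
print(pick_boost_clock(3600, 4000, temp_c=50, watts_per_ghz=20.0, power_limit_w=95.0))
print(pick_boost_clock(3600, 4000, temp_c=50, watts_per_ghz=25.0, power_limit_w=95.0))
print(pick_boost_clock(3600, 4000, temp_c=80, watts_per_ghz=20.0, power_limit_w=95.0))
```

A light load runs at full boost, a heavier one settles a few steps lower, and a hot chip holds base clock: one set of rules, different outcomes depending on what the sensors report.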