Introduction and Hardware Modes
Threadripper launched a little over a month ago, and since then we have had time to play around with AMD's new beast of a CPU. Threadripper CPUs connect two dies together with an Infinity Fabric link that operates at over 100GB/s bi-directionally. This die-to-die connection is what makes it possible for Threadripper CPUs to carry so many cores and threads at an affordable price point, but it also introduces some variability when it comes to performance.
AMD counters this with additional hardware modes, which can easily be accessed through Windows or the BIOS, and which can have a significant impact on performance depending on what tasks the CPU is handling. We will test Threadripper's hardware modes, specifically the memory access modes (Local and Distributed), SMT, and the unique Legacy Mode.
AMD has given us a decent amount of information on the memory access and Legacy modes, and we all know what SMT does (two threads per core). AMD actually provides two basic modes of operation: a Creator Mode (default) and a Gaming Mode. Today we will evaluate them and all the other possible hardware modes.
Hardware Modes
Threadripper allows you to configure memory access in either Distributed mode (Uniform Memory Access/UMA) or Local mode (Non-Uniform Memory Access/NUMA). UMA mode evenly distributes memory transactions across all memory channels, which provides higher bandwidth but also higher latency. NUMA mode lets each die prioritize the memory attached to its own memory controller, so cores work with the DIMMs physically nearest to them, and while this results in slightly lower bandwidth, it also lowers latency.
Memory access modes can have a big impact, especially when switching between gaming and content creation. In Local mode, AMD says the OS is given a hint to keep an application on one die until that die is full, which should increase performance for those types of loads.
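To make the difference between the two memory modes concrete, here is a deliberately simplified Python sketch. It is a toy model, not how the memory controller actually maps addresses: the interleave granularity, the even 16GB-per-die split of our 32GB kit, and the function names are all assumptions made for illustration.

```python
# Toy model only: real memory controllers use more complex address hashing.
DIES = 2
CHANNELS_PER_DIE = 2             # quad channel in total across both dies
INTERLEAVE_BYTES = 256           # assumed granularity, illustration only
DIE_MEMORY_BYTES = 16 * 2**30    # assumed: 32GB kit split evenly per die


def die_for_address(addr: int, mode: str) -> int:
    """Return which die's memory controller serves a physical address."""
    if mode == "distributed":
        # Consecutive chunks rotate across all four channels, so about
        # half of all accesses end up on the other die.
        channel = (addr // INTERLEAVE_BYTES) % (DIES * CHANNELS_PER_DIE)
        return channel // CHANNELS_PER_DIE
    if mode == "local":
        # Each die owns one contiguous range of memory.
        return min(addr // DIE_MEMORY_BYTES, DIES - 1)
    raise ValueError(f"unknown mode: {mode}")


def far_fraction(addresses, running_on_die: int, mode: str) -> float:
    """Share of accesses that have to cross the die-to-die link."""
    far = sum(1 for a in addresses if die_for_address(a, mode) != running_on_die)
    return far / len(addresses)


# Walk 1 MiB of memory in 64-byte steps from a thread running on die 0.
addresses = range(0, 1 << 20, 64)
print(far_fraction(addresses, 0, "distributed"))  # 0.5
print(far_fraction(addresses, 0, "local"))        # 0.0
```

In this model a small application pinned to one die never touches far memory in Local mode, while in Distributed mode half of its accesses do, which is the trade AMD describes: all channels are used (bandwidth) at the cost of more trips across the fabric (latency).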
Simultaneous Multi-Threading (SMT) is what produces 32 threads from 16 cores. It allows the parallel assignment of two threads to a single core. SMT is excellent when you need more threads, but when the workload only uses about half or less of your available threads, it can help a lot to turn it off and run one thread per core. For many years it was known that SMT, or what Intel calls Hyper-Threading (HT), wasn't helpful in gaming and could even be counterproductive. We also know from the Ryzen launch that turning SMT off can produce gains in games and some applications, so we will investigate that here as well.
You can toggle SMT, memory modes, and a Legacy Compatibility mode inside the Ryzen Master Utility. All these modes require a restart, but with the speed of NVMe drives and three M.2 ports, a restart should be the least of your worries (except for Windows Update). Legacy compatibility mode does something different; it disables one of the two dies but maintains quad channel memory support, and puts the CPU into NUMA memory access mode.
AMD found that this mode was best for many games, and it puts the Threadripper 1950X on par with the 1800X when it comes to performance. It also helps to avoid issues with games or programs that cannot handle this many cores.
AMD did put out its internal results from UMA vs. NUMA memory access modes at 3200MHz, and we can see that while latency goes down in NUMA mode, so does bandwidth. AMD also made a chart of a few games and the settings each one prefers: NUMA over UMA (lower memory latency), SMT ON vs. SMT OFF (higher core counts), or Legacy Mode (lower core-to-core latency).
Let's investigate!
Test Setup
- CPU: AMD Ryzen Threadripper 1950X
- Motherboard: ASRock X399 Professional Gaming (BIOS 1.3)
- Cooler: Enermax Liqtech TR4 360mm
- Memory: G.Skill TridentZ RGB 3200MHz 4x8GB (at stock)
- Video Card: NVIDIA GeForce GTX 1080 Ti Founder's Edition
- Storage - Boot Drive: Samsung 950 Pro and 960 Pro
- Storage - USB Drive: Corsair Voyager GS 64GB
- Case: Corsair Obsidian 900D
- OS: Microsoft Windows 10
- Monitor: ASUS PA328 ProArt 32" 4K
- Keyboard: Corsair K70 LUX
- Mouse: Corsair M65 PRO RGB
- Headset: Corsair VOID RGB Wireless
The CPU was run in out-of-the-box configuration, meaning I didn't increase memory speeds or alter CPU speeds or settings past what the UEFI sets at default. The CPU was tested in five different modes. The first mode is 1950X Legacy Mode SMT ON/NUMA, which results in 8C/16T, uses local memory access (quad channel), and leaves SMT on (Gamer Mode in Ryzen Master).
The second mode is 1950X SMT OFF/UMA, which is 16C/16T with distributed memory access. The third mode is 1950X SMT OFF/NUMA, which is 16C/16T with local memory access.
The fourth mode is 1950X SMT ON/NUMA, which is 16C/32T with local memory access. The fifth mode is 1950X SMT ON/UMA, which is 16C/32T with distributed memory access, and it's also the default mode the CPU operates in (Creator Mode in Ryzen Master).
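For quick reference, the five configurations can be written down as data. The short Python sketch below is only a bookkeeping aid; the `Mode` class and its field names are my own invention for illustration and have nothing to do with how Ryzen Master stores its profiles.

```python
from dataclasses import dataclass

TOTAL_CORES = 16  # Threadripper 1950X: two dies with 8 cores each


@dataclass(frozen=True)
class Mode:
    name: str
    legacy: bool   # Legacy mode disables one of the two dies
    smt: bool      # two threads per core when enabled
    memory: str    # "NUMA" (local) or "UMA" (distributed)

    @property
    def cores(self) -> int:
        return TOTAL_CORES // 2 if self.legacy else TOTAL_CORES

    @property
    def threads(self) -> int:
        return self.cores * (2 if self.smt else 1)


MODES = [
    Mode("Legacy Mode SMT ON/NUMA", legacy=True, smt=True, memory="NUMA"),
    Mode("SMT OFF/UMA", legacy=False, smt=False, memory="UMA"),
    Mode("SMT OFF/NUMA", legacy=False, smt=False, memory="NUMA"),
    Mode("SMT ON/NUMA", legacy=False, smt=True, memory="NUMA"),
    Mode("SMT ON/UMA", legacy=False, smt=True, memory="UMA"),  # default
]

for m in MODES:
    print(f"1950X {m.name:<24} {m.cores}C/{m.threads}T  {m.memory}")
```

Running it prints the same core and thread counts listed above: 8C/16T for the Legacy configuration, 16C/16T for both SMT OFF configurations, and 16C/32T for both SMT ON configurations.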
CINEBENCH, wPrime, and AIDA64
UMA and NUMA modes have very little impact in the rendering benchmark CINEBENCH R15; the only surprise is that we find single core performance drops slightly with SMT OFF and UMA. I didn't expect to find any deviation in wPrime with UMA and NUMA modes, but I did find that with SMT OFF, NUMA was a bit faster than UMA. The opposite is true with SMT ON. However, the differences with SMT ON are within the margin of error.
AIDA64's FPU tests seem to not care at all about memory access, and they don't care about threads, but they do scale perfectly with core count, as that is the only difference between Legacy mode and the other four modes.
AIDA64's memory latency results show that NUMA has significantly lower latency, but bandwidth results show that UMA provides less bandwidth than NUMA, except in memory writes (this is the opposite of what we predicted).
Perhaps it's our out-of-the-box testing method, where the memory runs at 2133MHz, or perhaps a newer AIDA64 version behaves differently (its developers do make major changes), but we didn't find that UMA provided higher memory bandwidth, although we did find that NUMA provided significantly lower latency.
Handbrake Video Transcoding, ScienceMark, and SuperPI
When I initially tested the 1950X, I noticed that HandBrake wasn't taking advantage of all threads and cores the way it might on a CPU with a much lower core count. That, of course, depends on what you do in HandBrake; changing from one codec to another, from one resolution to another, or just tuning quality and the like will all change how performance scales. We test using the "Normal Profile," and in our UHD tests we take a 4K video and turn it into a 1080P video, while in our 720P tests we change the codec of a 720P video.
HandBrake in both UHD and 720P performs best on the 1950X with SMT disabled and the distributed (UMA) memory mode. Disabling half the CPU cores also doesn't cut performance in half, but rather reduces it by roughly 25%.
ScienceMark showed that SMT OFF/NUMA mode had the best memory performance and overall score in the benchmark. SMT OFF/NUMA is the exact opposite of the default configuration (SMT ON/UMA).
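That roughly 25% figure says something about how well this particular transcode scales. As a back-of-the-envelope check, the Python sketch below turns it into a speedup number and fits a simple Amdahl's law model to it. The model is my own simplification (it ignores the memory mode and clock speed changes that come with Legacy mode), and the function names are made up for this example.

```python
def speedup_from_doubling(loss_when_halved: float) -> float:
    """Speedup of the full chip over the half-disabled chip."""
    return 1.0 / (1.0 - loss_when_halved)


def amdahl_parallel_fraction(speedup: float, core_ratio: float = 2.0) -> float:
    """Solve speedup = 1 / ((1 - p) + p / core_ratio) for p.

    p is the share of the half-chip's run time that benefits from the
    extra cores.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / core_ratio)


s = speedup_from_doubling(0.25)   # ~25% slower with half the cores
p = amdahl_parallel_fraction(s)
print(f"{s:.2f}x from doubling cores, scalable share ~{p:.2f}")
# prints: 1.33x from doubling cores, scalable share ~0.50
```

In other words, doubling the cores from 8 to 16 buys about a third more performance, which under this simple model means only about half of the 8-core encode time is spent on work that the extra cores can speed up.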
SuperPI 32M did best with NUMA mode (lower latency) and with SMT ON, even better than with SMT OFF in the same NUMA memory mode. However, in UMA memory mode, the CPU did better in SuperPI with SMT OFF.
UNIGINE and 3DMark
NUMA mode seemed to increase the overall score in Fire Strike, and while Legacy Mode did heavily increase the GPU score, it wasn't enough to make up for the decrease in Physics score. In Cloud Gate, Legacy mode increased the GPU score by so much that it won on overall score.
The two 3DMark benchmarks act like different types of games and reveal that in some cases, AMD's Legacy mode (Gamer Profile) really can increase performance, but in other cases, it doesn't. UNIGINE once again revealed that removing that die to die latency provides better gaming performance, even with the loss of half the CPU's cores. UNIGINE also seemed to prefer NUMA over UMA.
Resident Evil, Tomb Raider, GTA:V, Ashes of the Singularity
Resident Evil 6 really likes the Legacy Mode within the Gamer Profile in Ryzen Master, and NUMA mode also helps it. We see that GTA:V likes the Gamer Profile as well. Rise of the Tomb Raider shows that the Gamer Profile (with Legacy mode) does well at lower resolutions; at 4K, however, both average and minimum frame rates were slightly higher with SMT OFF and NUMA mode.
Finally, we have Ashes of the Singularity: Escalation, and to be clear we used v2.4, which is newer than v2.3 used for the launch review. In fact, I couldn't even get Ashes to work with Gaming Profile (Legacy mode) with the previous version, but now it does work fine. However, while Ashes loves cores, it seems to dislike NUMA mode, and I even saw reverse scaling with SMT OFF and NUMA mode.
All of this means that your game might or might not like hardware mode changes, but the good news is that Gamer Mode is almost always either clearly better or clearly worse than the default mode, so it's easy to pinpoint which one to use.
Take Away
We all know that SMT can have a major impact on heavily threaded applications, and it's a godsend, but on the flip side, when applications aren't that heavily threaded it can be beneficial to turn it off. The benchmarks with heavily threaded applications show its benefits, and other applications such as some games and even Handbrake in certain configurations do better with SMT off. The results from UMA/NUMA do in fact show the benefits of AMD adding in a second option to make the memory more easily accessible to physically local cores, and the impact on memory latency is huge compared to the difference in memory bandwidth.
There are still applications that benefit from UMA, such as HandBrake and other content creation software, and even some games prefer distributed UMA mode over local NUMA mode. AMD's Gaming Profile with their Legacy Mode really does make a big difference as well, and we did see its benefits in games.
Threadripper employs multiple dies to create a single CPU, and in doing so introduces us to something we aren't used to, the die to die interconnect. The die to die interconnect does run over 100GB/s bi-directionally, but it also imposes a latency penalty. Threadripper's size and interconnects create a new dilemma we haven't really had to face in the past. AMD clocks near memory latency at 78ns, while far memory latency is 133ns, and that difference is huge.
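To put AMD's two latency figures in perspective, here is a small worked calculation in Python. The 78ns and 133ns values are the ones AMD quotes; the weighted-average model and the assumption that an evenly distributed workload hits far memory half of the time are my own simplifications.

```python
NEAR_NS = 78.0   # AMD's quoted near-memory latency
FAR_NS = 133.0   # AMD's quoted far-memory latency


def average_latency(far_share: float) -> float:
    """Expected latency when far_share of accesses cross to the other die."""
    return (1.0 - far_share) * NEAR_NS + far_share * FAR_NS


penalty = FAR_NS / NEAR_NS - 1.0
print(f"far memory penalty: {penalty:.1%}")             # 70.5%
print(f"all local:   {average_latency(0.0):.1f} ns")    # 78.0 ns
print(f"50/50 split: {average_latency(0.5):.1f} ns")    # 105.5 ns
```

A far access costs roughly 70% more than a near one, and a workload that splits its accesses evenly between both dies would average about 105ns under this model, which is why keeping latency-sensitive applications such as games on local memory can pay off.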
However, while we are faced with this new dilemma, it's one that we will become more familiar with as time progresses and it becomes harder to keep shrinking cores. Multiple-die CPUs are going to be part of the future of HEDT platforms, and I have to say that AMD has done a great job of tackling the dilemma. AMD thought outside the box and gave the consumer the tools to tune their CPU to the maximum.
AMD took the lead and created an easy to use software application which can not only control SMT, but also memory access, and offers a special mode for gaming. AMD's Ryzen Master program is more useful than most people thought, and the difference it can make in real-world performance can't be ignored.
The good news here is that AMD has created, unlocked, and shown us the tools to enhance the Threadripper experience. If you are a crazy tweaker, then it's up to you to play around and see what modes and combinations your programs enjoy the most.