On-Die Interconnect Enhancements and Die Configurations
As core counts increased past 12 and more cache was added, a more efficient method of connecting all the cores became necessary. The E5-2600 v3 processors use a new ring-style interconnect with bi-directional, buffered connections for both rings.
This die configuration chart shows how all cores can communicate with each other. The dual rings with bi-directional interconnects give the cores a more direct link, which allows faster core-to-core communication. The interconnects are also buffered to improve performance.
Integrated Voltage Regulators
Integrated voltage regulators (IVR) simplify platform power design, reducing platform complexity by cutting the number of voltage rails and integrating their control.
IVR also enables finer voltage and frequency granularity and faster transitions between power states, and it reduces board area to enable form factor optimizations.
Turbo and AVX Improvements
Intel Turbo Boost Technology 2.0 automatically allows processor cores to run faster than the rated and AVX frequencies when they are operating below power, current, and temperature specification limits.
The frequency change with AVX workloads happens when the core detects an AVX instruction; these instructions draw more current, and a higher voltage is needed to sustain them.
The core signals the Power Control Unit (PCU) to provide more voltage. The core slows during execution of the AVX instructions in order to stay within TDP limits, which may cause the frequency to drop. The size of that frequency drop depends on the workload.
The PCU signals when the voltage has been adjusted, and the cores return to full execution speed. The PCU returns to the regular (non-AVX) operating mode 1 ms after the AVX instructions have completed.
Turbo state limiting decreases timing variability and power. Some HPC software requires limited thread-to-thread variability, and some cluster designers are concerned about turbo power surges; to avoid both, some disable turbo altogether. Turbo state limiting uniformly caps the maximum number of turbo states for all cores. This provides a predictable range of thread variability and power risk while still allowing some turbo performance benefit.
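As a rough illustration of what a uniform cap does, the sketch below clamps a turbo table to a fixed number of turbo states above the base frequency. The frequency table, the function names, and the 100 MHz step per turbo state are assumptions made for illustration; they are not taken from any E5-2600 v3 specification.

```python
def cap_turbo_states(base_mhz, turbo_mhz_by_active_cores, max_states, step_mhz=100):
    """Clamp every turbo frequency to base_mhz + max_states * step_mhz."""
    ceiling = base_mhz + max_states * step_mhz
    return {cores: min(freq, ceiling)
            for cores, freq in turbo_mhz_by_active_cores.items()}


def spread_mhz(table):
    """Difference between the fastest and slowest entry in a table."""
    return max(table.values()) - min(table.values())


# Hypothetical table: active core count -> maximum turbo frequency in MHz.
example_table = {1: 3600, 2: 3600, 4: 3400, 18: 2800}

# Allow at most eight turbo states above a hypothetical 2300 MHz base.
capped = cap_turbo_states(2300, example_table, max_states=8)
```

With the cap in place, the gap between the lightly loaded and fully loaded cases shrinks, which is the predictability the feature buys in exchange for some peak turbo frequency.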
Cluster on Die (COD) Mode
Cluster on Die (COD) is supported on one-socket and two-socket SKUs with two home agents (10+ cores).
COD reduces coherence traffic and cache-to-cache transfer latencies. It is best used for highly NUMA (non-uniform memory access) optimized workloads, where latency is more important than sharing across caching agents.
Each home agent has ~14KB of cache, organized as eight ways by 256 sets and two sectors wide. It stores an eight-bit presence vector that tracks which caching agents potentially own a copy of a cache line. Entries are allocated on cache-to-cache transfers and track hit-M, hit-E, and hit-S lines, which are the hotly contested cache lines.
The result is lower cache-to-cache transfer latencies, and reduced directory updates and reads of hotly contested lines. Snoop traffic is also reduced by sending directed snoops, rather than broadcasting them.
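The figures above can be checked with a little arithmetic, and the presence vector lends itself to a small sketch of a directed snoop. The code below is a toy model only: the assumption that each entry covers two cache lines (one per sector), the agent numbering, and the function names are all illustrative and do not describe the actual hardware layout.

```python
WAYS = 8
SETS = 256
SECTORS_PER_ENTRY = 2   # assumption: "two-sector wide" means two lines per entry
PRESENCE_BITS = 8       # one bit per caching agent

entries = WAYS * SETS                         # number of entries in the cache
tracked_lines = entries * SECTORS_PER_ENTRY   # lines covered, under the assumption above
approx_bytes_per_entry = 14 * 1024 / entries  # ~14KB spread over all entries


def directed_snoop_targets(presence_vector, requester):
    """Agents whose presence bit is set, excluding the requester."""
    return [agent for agent in range(PRESENCE_BITS)
            if (presence_vector >> agent) & 1 and agent != requester]


def broadcast_snoop_targets(requester):
    """Every agent except the requester, as a broadcast would reach."""
    return [agent for agent in range(PRESENCE_BITS) if agent != requester]
```

In this model, a line held by agents 2 and 5 and requested by agent 2 needs a single snoop to agent 5, where a broadcast would reach all seven other agents.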
Virtualization (VT-x) Features
The new virtualization features include lower VM entry/exit latency, which reduces VMM overhead and increases overall virtualization performance.
VM control structure (VMCS) shadowing enables efficient nested VMM usages, such as manageability and VM protection. Extended Page Table (EPT) accessed/dirty bits enable efficient live migration and help software-managed fault-tolerant solutions. Cache usage can now be monitored on a per-VM basis with Intel Cache Monitoring Technology (CMT); this utilization data allows VM software to make better decisions on workload scheduling and migration.
Advanced Vector Extensions (AVX) 2.0
Advanced Vector Extensions (AVX) has also been updated to AVX2. The original AVX already provided 256-bit floating-point SIMD instructions; AVX2 extends most integer SIMD instructions to 256 bits as well. Compared with 128-bit SIMD, this allows up to twice the amount of packed data to be processed with a single instruction.
AVX2 increases parallelism and throughput in SIMD calculations and reduces register load. This can be useful for compute-intensive work in multimedia, scientific, and financial applications, image and signal processing, and cryptology workloads.
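To make "twice the amount of packed data" concrete, the short sketch below counts how many elements of a given width fit in a 128-bit versus a 256-bit register. The `lanes` helper is a hypothetical name used only for this arithmetic.

```python
def lanes(register_bits, element_bits):
    """Number of packed elements that fit in one SIMD register."""
    return register_bits // element_bits


# Element widths in bits: single precision, double precision, 32-bit integer.
for name, width in [("float32", 32), ("float64", 64), ("int32", 32)]:
    print(name, lanes(128, width), "->", lanes(256, width))
```

Each 256-bit register holds eight 32-bit values or four 64-bit values, double what a 128-bit register holds.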
Power Efficiency Improvements
Per-Core P-States (PCPS) allow cores to run at individual frequencies and voltages. Energy-efficient turbo mode (EET) monitors stall behavior and increases throughput. Uncore frequency scaling (UFS) has also changed: in Nehalem, cores could turbo up but the uncore remained at a fixed frequency; in Sandy Bridge, core and uncore turbo up and down together.
With Haswell-EP, the cores and the uncore are now treated independently. Core-bound applications can drive core frequency higher without needing to increase uncore frequency. LLC/memory-bound applications can drive uncore frequency higher without burning core power.
Intel Cache Monitoring Technology (CMT)
When many VMs are running in a system, the cache can be thrashed by what is now called a "noisy neighbor": a VM that starts running a heavy workload with high cache usage. Its demand on the cache degrades the performance of normally behaving VMs that share the same cores and cache.
Today, VMs that require heavy system use are often moved to systems that have the resources to support them, so other VMs can continue without being adversely affected by the noisy neighbor.
With Intel Cache Monitoring Technology, the processor provides the data needed to detect this, so the VM can be moved when needed. The system cache can also be partitioned so that the noisy neighbor has a lesser impact on the VMs around it.
Intel Cache Monitoring Technology (CMT) enables monitoring of last-level cache occupancy on a per-thread/app/VM basis, enabling measurement of application cache sensitivity, profiling, fingerprinting, chargeback models, detection of cache-starved apps/VMs, detection of "noisy neighbors" (which hog the LLC) and advanced cache-aware scheduling policies. The CMT feature is supported on all Xeon E5 v3 SKUs and is enumerated via CPUID.
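As a hedged sketch of what "enumerated via CPUID" looks like from software: Intel documents the monitoring capability as bit 12 of EBX in CPUID leaf 07H (sub-leaf 0), and Linux kernels that support the feature report it through flags such as `cqm`, `cqm_llc`, and `cqm_occup_llc` in `/proc/cpuinfo`. The helper names below are hypothetical, and the flag list assumes a kernel recent enough to expose these names.

```python
CMT_FLAGS = {"cqm", "cqm_llc", "cqm_occup_llc"}


def cmt_flags(cpuinfo_text):
    """Return the CMT-related flags found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return sorted(CMT_FLAGS & set(value.split()))
    return []


def has_monitoring_bit(leaf7_ebx):
    """Check bit 12 of the EBX value returned by CPUID leaf 07H, sub-leaf 0."""
    return bool((leaf7_ebx >> 12) & 1)
```

On a Linux host, passing the contents of `/proc/cpuinfo` to `cmt_flags` gives a quick indication of whether the kernel sees the feature; an empty list can also mean the kernel is too old to report it.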
The DDR4 Difference
The move to DDR4 has many benefits. First, the supply voltage drops from 1.5V in DDR3 to 1.2V in DDR4, which lowers power. There is also a smaller page size (1024 -> 512) for x4 devices. Together, these can show a savings of ~2W per DIMM at the wall.
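A back-of-the-envelope sketch of why the voltage change matters: if dynamic power is assumed to scale with the square of the supply voltage (a common first-order approximation, not a figure from this article), the move from 1.5V to 1.2V alone cuts that component by roughly a third. The DIMM count in the second helper is a hypothetical input; only the ~2W per DIMM figure comes from the text above.

```python
def dynamic_power_ratio(v_new, v_old):
    """Relative dynamic power, assuming power scales with voltage squared."""
    return (v_new / v_old) ** 2


def wall_savings_watts(dimm_count, watts_per_dimm=2.0):
    """Total savings at the wall using the ~2W per DIMM figure."""
    return dimm_count * watts_per_dimm


ratio = dynamic_power_ratio(1.2, 1.5)
print(f"DDR4 dynamic power is about {ratio:.0%} of DDR3 under this assumption")
print(f"Eight DIMMs save about {wall_savings_watts(8):.0f} W at the wall")
```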
Improved RAS enables better command/address parity error recovery. DDR4 also offers higher bandwidth and higher DIMM frequencies when multiple DIMMs per channel are installed.
When four DIMMs are installed per CPU (one per channel), higher DIMM speeds can be maintained. If eight DIMMs are installed, the frequency will drop to the next rated frequency. Larger-capacity DDR4 DIMMs will be available, so populating just the four channels will support a larger amount of RAM at a faster speed.
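The population rule described above can be sketched as a small lookup. The speed list below is illustrative only (standard DDR4 speed grades, not a validated table for any particular board or DIMM type), and the assumption that each extra DIMM per channel steps down exactly one grade is a simplification of the article's description.

```python
import math

# Illustrative speed grades in MT/s, fastest first.
RATED_SPEEDS = [2133, 1866, 1600]


def dimms_per_channel(dimms_per_cpu, channels=4):
    """DIMMs on the most heavily populated channel, assuming even population."""
    if dimms_per_cpu < 1:
        raise ValueError("at least one DIMM is required")
    return math.ceil(dimms_per_cpu / channels)


def expected_speed(dimms_per_cpu, channels=4, speeds=RATED_SPEEDS):
    """Step down one rated speed for each DIMM per channel beyond the first."""
    dpc = dimms_per_channel(dimms_per_cpu, channels)
    return speeds[min(dpc - 1, len(speeds) - 1)]
```

Under these assumptions, four DIMMs per CPU run at the top grade and eight DIMMs drop to the next one; the memory population guide for the specific platform remains the authority on real speeds.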