Introduction and Quick Specs
Intel is expanding its NVMe PCIe portfolio yet again with the launch of the Data Center P3608 SSD, Intel's first Gen 3 x8 lane NVMe PCIe SSD. The DC P3608 is two NVMe SSDs in one. The drive can be RAIDed into one super high-performance NVMe volume, or used as two separate high-performance NVMe SSDs. That gives it the advantage of occupying a single PCIe slot while its two SSDs can still transfer data simultaneously and independently of one another.
Intel is aiming its newest SSD at meeting the need for high-performance storage in today's datacenter. The DC P3608 leverages the power of NVMe over the PCIe interface to deliver ultra-low latencies by moving data closer to the CPU and reducing CPU overhead with a streamlined command stack. The DC P3608 is about more than just a new level of performance; it's also about higher capacity. The DC P3608 is available with twice the maximum capacity of Intel's previous NVMe datacenter SSDs. To make upgrading easy, the DC P3608 Series SSDs can be deployed right out of the box because the drive uses industry standard form factors and works with industry standard NVMe software and drivers.
Sporting an endurance rating of three drive writes per day for five years, the DC P3608 is a perfect drop-in solution for high-performance mixed workload environments. The DC P3608 is available in three capacities: 1.6TB, 3.2TB, and 4TB, with endurance ratings of 8.76, 17.52, and 21.90 Petabytes Written (PBW), respectively. The 1.6TB model we are testing today delivers the best random performance of the three capacity points, the 3.2TB model the best mixed performance, and the 4TB model the highest density and bandwidth. The main performance differentiator between the three capacities is the amount of overprovisioning employed.
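The endurance figures follow directly from the drive-writes-per-day rating. Here is a minimal sketch of that arithmetic; the function name is our own, and we assume decimal units (1 PB = 1,000 TB) and a 365-day year, which is what reproduces the quoted numbers.

```python
# Endurance arithmetic: PBW = capacity (TB) x drive writes per day x days / 1000.
# Assumptions: decimal units (1 PB = 1000 TB) and a 365-day year, leap days ignored.

DWPD = 3             # drive writes per day, from the endurance rating
YEARS = 5            # rated service period
DAYS_PER_YEAR = 365  # assumption


def petabytes_written(capacity_tb: float) -> float:
    """Total rated writes in petabytes for a capacity given in terabytes."""
    return capacity_tb * DWPD * DAYS_PER_YEAR * YEARS / 1000


for capacity_tb in (1.6, 3.2, 4.0):
    print(f"{capacity_tb} TB -> {petabytes_written(capacity_tb):.2f} PBW")
```

Running this for the three capacities gives 8.76, 17.52, and 21.90 PBW, matching the ratings above.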
The DC P3608 is Intel's first NVMe SSD with dual controller architecture. A single DC P3608 can simultaneously process commands with separate queues, dynamically distributing IO evenly over multi-core Xeon processors. The DC P3608 can also be aggregated into a single ultra-high performance RAID 0 volume via Intel's enterprise Rapid Storage Technology (RSTe) 4.3 for NVMe virtualized controller technology. This new RSTe driver has us excited because it should deliver far better performance than creating a RAID volume with MS software RAID or Windows Server Storage Spaces.
Intel states the DC P3608 is in production now and shipping to customers in high volume.
Quick Specs
The first thing that jumps out at us is the DC P3608's incredible sequential read speed of 5000 MB/s, the highest sequential read speed for any SSD we've reviewed. Next is the drive's 850,000 IOPS 4K random read performance, which is again the highest for any SSD we've reviewed to date. All three capacities deliver this incredible read performance. As mentioned, the 1.6TB model we have on the bench today delivers the best random write performance of the three capacity points, delivering up to 150,000 IOPS 4K random write performance. The DC P3608 employs Intel's High Endurance Technology (HET) NAND Flash. HET NAND Flash coupled with an advanced LDPC (Low-Density Parity Check) ECC engine enables up to 21.9 PB of endurance.
Intel DC P3608 1.6TB Enterprise PCIe NVMe SSD Photos and Specifications
Typically, we will disassemble the test subject to get a look at the drive's internals. Intel's DC PCIe AIC SSDs are one of the rare exceptions to this practice. In the case of the DC P3608 AIC, the drive's heat sink and protective back plate/heat shield are attached with a thermal adhesive rather than a thermal pad or thermal paste. We know from experience that disassembling this drive involves the risk of damaging components, and we are unwilling to take that risk.
We do know what is under the covers, so we will give you a list of the major components that reside on the 1.6TB DC P3608's PCB. There are two Intel CH29AE41AB1 18-channel controllers, 18 Intel 128GB 20nm HET flash packages, 10 Micron 512MB DDR3 1600MHz DRAM packages, and a PLX PEX8718-AB80BI G 16-lane Gen 3 PCIe switch.
The entire front half of the drive's PCB is covered with a full-length solid aluminum heat sink. All four of the drive's capacitors are visible. We also see the drive's eight-lane PCIe Gen 3 edge connector. There is a sheet aluminum face plate emblazoned with an Intel logo and trademark swoop. The model of the drive is advertised here as well.
The back of the drive is covered with a protective sheet aluminum cover plate. This cover plate is attached with screws and thermal adhesive. The thermal adhesive is the reason we chose not to remove the back plate for our photos.
From this angle, we can see there are two separate heat sinks, in addition to the full-length heat sink. These separate heat sinks are mounted on the drive's dual controllers to provide more efficient cooling.
From this angle, we can see the honeycomb perforated full-height bracket. The drive ships with both a half-height and a full-height bracket. Intel specs an airflow rate of 400 LFM from the back of the drive and out through this bracket to provide adequate cooling. If you look closely, you can see four LED drive status indicators through the openings in the bracket.
From this angle, we can see the channels in the heat sink where airflow is directed through to the perforated mounting bracket.
There is a manufacturer label affixed to the top edge of the drive's main heat sink. The label lists the drive's model, capacity, serial number, shipping firmware, and a warning that the warranty will be voided if any screw or label is removed.
Specifications and Features Intel DC P3608 1.6TB Enterprise PCIe NVMe SSD
We are going to go over the 1.6TB DC P3608's specifications; for other capacities, please refer to the above sheet. The Intel DC P3608 1.6TB Enterprise PCIe NVMe SSD comes in a half-height, half-length, low-profile Add-in Card (AIC) form factor.
Features include a PCIe Gen 3.0 x8 interface and Intel 20nm HET (High Endurance Technology) MLC NAND flash memory. End-to-end data protection features XOR parity protection and advanced LDPC ECC bit correction on all internal and external memories in the data path for protection at every layer. Enhanced power loss management includes protection from unplanned power loss, called PLI (Power Loss Imminent), which utilizes a proprietary combination of hardware, firmware algorithms, built-in self-test, and robust validation.
DC P3608 1.6TB sustained performance (RAID 0 volume) specs:
- Sequential 128KB QD128 Read (up to): 5000 MB/s
- Sequential 128KB QD128 Write (up to): 2000 MB/s
- Random 4KB QD256 Read (up to): 850,000 IOPS
- Random 4KB QD256 Write (up to): 150,000 IOPS
- Seq. Latency R/W: 20/20us (typical)
Reliability: MTBF one million device hours, UBER: 1E-17, Silent bit error rate of 1E-25, End-to-end data protection, Enhanced power-loss data protection, SMART monitoring and T-10 DIF Protection.
Endurance: PBW (Petabytes Written): 1.6TB = 8.76 PBW equivalent to three drive writes per day for five years.
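Because the random specifications are quoted in IOPS and the sequential ones in MB/s, it can help to put them on the same scale. The sketch below is our own illustration and assumes that a 4KB transfer is 4,096 bytes and that 1 MB/s is 1,000,000 bytes per second.

```python
# Convert an IOPS figure into the bandwidth it implies.
# Assumptions: a "4KB" transfer is 4096 bytes; 1 MB = 1,000,000 bytes.

def iops_to_mb_per_s(iops: int, block_bytes: int) -> float:
    """Bandwidth in MB/s implied by an IOPS rate at a fixed transfer size."""
    return iops * block_bytes / 1_000_000


SPEC_4K_READ_IOPS = 850_000   # quoted random 4KB read spec
SPEC_4K_WRITE_IOPS = 150_000  # quoted random 4KB write spec

print(f"4KB random read:  {iops_to_mb_per_s(SPEC_4K_READ_IOPS, 4096):.1f} MB/s")
print(f"4KB random write: {iops_to_mb_per_s(SPEC_4K_WRITE_IOPS, 4096):.1f} MB/s")
```

Under these assumptions, the 850,000 IOPS random read specification works out to roughly 3,480 MB/s, or about 70 percent of the 5,000 MB/s sequential read specification.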
Test System Setup and Testing Methodology
Jon's Enterprise SSD Review Test System Specifications
- Motherboard: ASRock Rack EPC612D8A-TB (Intel C612 chipset)
- CPU: Intel Xeon E5-2698 V3
- Cooler: Supermicro Air Cooling
- Memory: Samsung 64GB DDR4 ECC 2133MHz
- Video Card: Onboard Video
- Power Supply: Seasonic Platinum 1000 Watt
- OS: Microsoft Windows Server 2012 R2
- Drivers: Microsoft AHCI
- Drivers: Intel RSTe 4.3 for NVMe
We would like to thank ASRock Rack, Crucial, Intel, Samsung, Seagate, and Seasonic for making our test system possible.
TweakTown's Enterprise SSD testing methodology replicates enterprise environments as closely as possible. Our test systems use strictly enterprise-based hardware: enterprise chipsets, Intel Xeon processors, ECC DRAM, and standard air-cooling. Storage drivers are Windows standard drivers, except as otherwise required for the test device to operate as designed.
TweakTown strictly adheres to industry-accepted Enterprise Solid State Storage testing procedures. Each test we perform repeats the same sequence of the following four steps:
- Secure Erase SSD
- Write entire capacity of SSD a minimum of 2x with 128KB sequential write data, seamlessly transition to next step
- Precondition SSD at maximum QD measured (QD32 for SATA, QD256 for PCIe) with the test specific workload for a sufficient amount of time to reach a constant steady-state, seamlessly transition to next step
- Run test specific workload for 5-minutes at each measured Queue Depth, record results
We chart workload preconditioning IOPS or MB/s and latency for each specific test. We plot workload preconditioning using scatter charts with each recorded 1-second data point represented on the chart, allowing us to see some of the performance variability exhibited by our test subjects. We chart workloads using line charts plotting average workload IOPS or MB/s and latency at each measured QD. Utilizing line charts provides a good visual perspective of the test subject's performance curve.
To summarize, we test with Enterprise hardware, Windows Server Operating System, and we strictly adhere to industry-accepted Enterprise SSD testing procedures. Our goal is to provide results that are consistent, reliable, and repeatable.
Intel's given performance specifications are for a RAID 0 volume on the drive. As mentioned, the DC P3608 is two SSDs on one card. If utilized as two separate volumes, these performance figures will be cut approximately in half. With this in mind, we will be testing the DC P3608 two ways, as a 1.6TB RAID 0 volume and as an 800GB single volume. Until now, the only way to RAID PCIe SSDs was to use MS disk striping or to create a Storage Spaces virtual volume with Windows Server OS. These methods do not allow the user to configure stripe sizes. Stripe sizes play a key role in creating an array that will deliver maximum performance based on workload applications. Intel has changed that with the introduction of RSTe 4.3 for NVMe. This new driver/control panel allows NVMe SSDs to be configured in an array with various stripe sizes to suit the end-user's particular needs.
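To see why stripe size matters, consider how a conventional round-robin RAID 0 layout maps volume offsets onto member drives. The sketch below assumes that textbook layout; we have not confirmed that RSTe lays data out this way, and the function names are our own.

```python
# Textbook RAID 0 address mapping (an assumed layout, not confirmed for RSTe):
# stripes are dealt out to the member drives in round-robin order.

KIB = 1024


def locate(offset_bytes, stripe_bytes=128 * KIB, members=2):
    """Map a volume byte offset to (member drive index, byte offset on that member)."""
    stripe_index = offset_bytes // stripe_bytes
    member = stripe_index % members
    member_offset = (stripe_index // members) * stripe_bytes + offset_bytes % stripe_bytes
    return member, member_offset


def members_touched(offset_bytes, length_bytes, stripe_bytes=128 * KIB, members=2):
    """Return the set of member drives that a single I/O request lands on."""
    first = offset_bytes // stripe_bytes
    last = (offset_bytes + length_bytes - 1) // stripe_bytes
    # After `members` consecutive stripes, every drive has already been hit.
    last = min(last, first + members - 1)
    return {stripe % members for stripe in range(first, last + 1)}


# An aligned 128 KB request with 128 KB stripes stays on one drive...
print(members_touched(0, 128 * KIB, stripe_bytes=128 * KIB))
# ...while the same request with 4 KB stripes is split across both drives.
print(members_touched(0, 128 * KIB, stripe_bytes=4 * KIB))
```

Under this layout, large stripes keep each request on a single drive and rely on many outstanding requests to keep both drives busy, while small stripes split even a single large request across both drives.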
Here is how we created a RAID 0 array on the DC P3608 for this review:
Select NVMe Devices Controller. Select RAID 0.
Select both NVMe SSDs that comprise the DC P3608. The standard setting is for a 95% array allocation, but we chose 100% allocation because we want to test the entire LBA span of our RAID volume.
We chose 128 KB stripes for our volume, which is the default, but you can choose stripe sizes from 4 KB - 128 KB.
Our drive now shows up as a RAID volume.
A quick check of Device Manager lists an RSTe Virtual Controller after creating a RAID 0 volume on the drive. Before getting into our steady-state testing, we decided to run a couple of quick benches on our empty RAID 0 volume to verify we were getting speeds that met or exceeded Intel's specifications:
984,000 4K Random Read IOPS; that exceeds Intel's 850,000 IOPS specification by quite a bit, which is what we would expect to see from an empty FOB volume.
7,100 MB/s Sequential Read speed also exceeds Intel's 5,000 MB/s specification by a large margin; again, this is what we would expect to see from an empty FOB volume. Let's get into the review and see what sustained performance (steady-state) looks like.
Benchmarks - 4K Random Write/Read
4K Random Write/Read
We precondition the drive for 16,000 seconds, or 4.44 hours, receiving performance data every second. We plot this data to observe the test subject's descent into steady state. We plot both IOPS and Latency. We plot IOPS (represented by blue scatter) in thousands and Latency (represented by orange scatter) in milliseconds. We observe steady state achieved at 3,000 seconds of preconditioning.
With our configuration, we exceeded Intel's 4K random write specification of 150,000 IOPS. At QD64, we hit the specified 150,000 IOPS. We saw the highest performance at QD128. At QD256, the DC P3608 is delivering 156,000 IOPS. As a separate (one of two) 800GB non-RAID volume, the DC P3608 is generating 79,000 4K random write IOPS at QD256. In RAID mode, the DC P3608 easily outperforms the Samsung XS1715.
The 4K random read performance of the competing drives is much closer than their 4K random write performance. The Samsung XS1715 displays better random read performance from QD8-QD64. At QD128, the DC P3608 takes a small lead over the XS1715. At QD256, the DC P3608 generates 857,000 IOPS, exceeding specification and delivering 105,000 IOPS more than the Samsung XS1715.
Conclusion: The DC P3608 in RAID mode delivers higher 4K random performance than Samsung's XS1715.
Benchmarks - 8K Random Write/Read
8K Random Write/Read
We precondition the drive for 16,000 seconds, or 4.44 hours, receiving performance data every second. We plot this data to observe the test subject's descent into steady state. We plot both IOPS and Latency. We plot IOPS (represented by blue scatter) in thousands and Latency (represented by orange scatter) in milliseconds. We observe steady state at 3,200 seconds of preconditioning.
8K random is a more demanding workload than 4K. The DC P3608 in RAID mode is exceeding Intel's 8K random write 60,000 IOPS specification across the board. At QD256, the DC P3608 is delivering 45% more performance than the XS1715. The XS1715 gets hammered particularly hard, and its 8K random write performance decreases as the queue depth increases. In non-RAID mode, a single 800GB volume on the DC P3608 is performing at about 50% of the RAID 0 volume.
Intel specs the DC P3608 as capable of up to 500,000 8K random read IOPS. We hit 497,000 with our system configuration. The XS1715 is delivering slightly better performance than the DC P3608 at QD8-QD32. From QD32-256, the DC P3608 in RAID mode takes charge. At QD256, the DC P3608 is pushing out over 100,000 more IOPS than the XS1715.
Conclusion: The DC P3608 in RAID mode delivers higher 8K random performance than Samsung's XS1715.
Benchmarks - 128K Sequential Write/Read
128K Sequential Write/Read
We precondition the drive for 6,500 seconds, or 1.8 hours, receiving performance data every second. A sequential steady state is achievable in a much shorter span of time than a random steady state. We plot both MB/s and Latency. We plot MB/s using blue scatter and Latency using orange scatter. We observe that the DC P3608 achieves steady state at 0 seconds of preconditioning, indicating that the previous 2x LBA fill phase achieved a sequential steady state.
Sequential write performance of the DC P3608 in RAID mode comes in slightly below the up to 2000 MB/s specification given by Intel. The peak for us was 1,970 MB/s at QD16. From QD16 - 256, the DC P3608 follows a slight downward trend ending up at 1,863 MB/s at QD256. As a single non-aggregated 800GB volume, the drive follows a less pronounced downward trend as QD increases. The opposite is true for the XS1715. It follows a more conventional trend, peaking at 1400 MB/s at QD256.
When we first saw the specifications given for the DC P3608, two figures immediately grabbed our attention: the drive's random 4K read rating of 850K IOPS, and its sequential read speed of 5GB/s. The 850K random 4K read spec has proven to be spot on, and as you can see, the drive's 5GB/s sequential read specification is as well.
At QD128, the DC P3608 is 75% faster than Samsung's venerable XS1715. As we pointed out earlier, striping makes a big difference in a RAID volume. Intel's RSTe for NVMe driver and control panel gives us the ability to customize stripe sizes. We point this out again because we chose 128 KB stripes for our volume. 128 KB is the largest stripe size, and it is well suited to deliver the best sequential speed, although typically at the sacrifice of some random performance. In RAID 0 mode, with 128 KB stripes, the DC P3608 is delivering over double the sequential read performance of a single non-aggregated 800GB volume.
Conclusion: At this point, it's clear that the DC P3608 in RAID 0 mode is much faster than the competition.
Mixed Workload Benchmarks - Email Server
Email Server
We precondition the drive for 16,000 seconds, or 4.44 hours, receiving performance data every second. We plot this data to observe the test subject's descent into steady state. We plot both IOPS and Latency. We plot IOPS (represented by blue scatter) in thousands and Latency (represented by orange scatter) in milliseconds. Steady state is achieved at about 4,500 seconds.
An Email Server workload is a demanding 8K test with a 50 percent R/W distribution. This application gives a good indication of how well a drive will perform in a write heavy workload environment.
The DC P3608 in RAID mode is delivering astounding performance in this brutal test. At QD256, the DC P3608 1.6TB is 34% faster than the Samsung XS1715, which is a powerhouse itself. To give you an idea of how good this is, 125,000 IOPS in this test is roughly 10x higher than a typical SATA-based enterprise SSD.
Conclusion: Intel states that the DC P3608 can deliver 10x the performance of a SATA-based SSD, and we find this to be true.
Mixed Workload Benchmarks - OLTP/Database
OLTP/Database
We precondition the drive for 16,000 seconds, or 4.44 hours, receiving performance data every second. We plot this data to observe the test subject's descent into steady state. We plot both IOPS and Latency. We plot IOPS (represented by blue scatter) in thousands and Latency (represented by orange scatter) in milliseconds. Steady state is achieved at 4,500 seconds.
An On-Line Transaction Processing (OLTP) / Database workload is a demanding 8K test with a 66/33 percent R/W distribution. OLTP is online processing of financial transactions such as credit cards and high-frequency trading in the financial sector. Database workloads are challenging for any storage solution.
The DC P3608 1.6TB dominates this test too. The DC P3608 1.6TB again outperforms the XS1715 by 34% at QD256. Intel specs the DC P3608 for up to 160,000 IOPS of random 8K performance with a 70/30 read/write distribution. We do not test at 8K 70/30, but this test is very close to that. We believe that this test, which is a little more demanding than 8K 70/30, indicates that the DC P3608 1.6TB would indeed achieve the specified 8K 70/30 random performance of 160K IOPS.
Conclusion: The DC P3608 1.6TB R0 handles demanding 8K workloads with a new level of performance.
Mixed Workload Benchmarks - Web Server
Web Server
We precondition the drive for 16,000 seconds, or 4.44 hours, receiving performance data every second. We plot this data to observe the test subject's descent into steady state. We plot both IOPS and Latency. We plot IOPS (represented by blue scatter) in thousands and Latency (represented by orange scatter) in milliseconds. We precondition for the 100% random read Web Server testing with an inverse 100% random write workload. In 4,000 seconds, steady state is achieved.
The Web Server workload is a pure random read test with a wide range of file sizes. Our test consists of the following file sizes and corresponding percentage of the overall 100 percent workload file size: 512B = 22 percent, 1KB = 15 percent, 2KB = 8 percent, 4KB = 23 percent, 8KB = 15 percent, 16KB = 2 percent, 32KB = 6 percent, 64KB = 7 percent, 128KB = 1 percent, and 512KB = 1 percent.
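The transfer-size mix can be written down as a small table, which makes it easy to confirm that the percentages add up and to see what the average request looks like. This sketch is our own illustration and assumes binary units (1KB = 1,024 bytes).

```python
# Web Server access mix: transfer size in bytes -> percent of requests.
# Assumption: binary units, so 1KB = 1024 bytes.

WEB_SERVER_MIX = {
    512: 22,
    1 * 1024: 15,
    2 * 1024: 8,
    4 * 1024: 23,
    8 * 1024: 15,
    16 * 1024: 2,
    32 * 1024: 6,
    64 * 1024: 7,
    128 * 1024: 1,
    512 * 1024: 1,
}

total_percent = sum(WEB_SERVER_MIX.values())
# Weighted sum of sizes: bytes moved per 100 requests at this mix.
bytes_per_100_requests = sum(size * pct for size, pct in WEB_SERVER_MIX.items())
average_bytes = bytes_per_100_requests / total_percent

print(f"Total: {total_percent} percent")
print(f"Average transfer size: {average_bytes / 1024:.2f} KB")
for size, pct in WEB_SERVER_MIX.items():
    share = 100 * size * pct / bytes_per_100_requests
    print(f"{size:>7} B: {pct:>2}% of requests, {share:4.1f}% of bytes moved")
```

The mix sums to 100 percent with an average request of a little under 16KB. Although 512KB transfers are only 1 percent of requests, they account for roughly a third of the bytes moved, so this test leans on bandwidth more than the small-block percentages suggest.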
The XS1715 outperforms the DC P3608 1.6TB by a slim margin at QD8-16. The DC P3608 1.6TB takes command at QD64, and by QD256 it outperforms the venerable Samsung XS1715 by 71,000 IOPS or 39%. At over 250,000 IOPS, the DC P3608 is again delivering 10x the performance of a typical SATA-based SSD in a Web Server environment. As a RAID 0 volume, the DC P3608 1.6TB is delivering a little over twice the performance of an 800GB non-RAID volume.
Conclusion: Stripe sizes matter and our chosen stripe size suits this workload very well as evidenced by a 100% increase (over a single volume) in workload performance at QD256.
Final Thoughts
The Intel SSD DC P3608 Series NVMe SSD on eight lanes of PCIe 3.0 is Intel's most powerful datacenter SSD. The progression to an eight-lane Gen 3 NVMe PCIe card is a natural one because current high-end enterprise SSDs are capable of exceeding the available bandwidth of four-lane PCIe Gen 3. The DC P3608 has indeed brought with it a new level of performance.
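The bandwidth argument is easy to check with a little arithmetic. PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding. The sketch below, with names of our own choosing, computes the resulting raw link bandwidth; it ignores packet and protocol overhead, so usable throughput will be somewhat lower.

```python
# Raw PCIe 3.0 link bandwidth: 8 GT/s per lane with 128b/130b line encoding.
# Protocol (packet header, flow control) overhead is ignored, so real-world
# throughput is somewhat lower than these figures.

TRANSFERS_PER_SECOND = 8e9          # 8 GT/s per lane, one bit per transfer
ENCODING_EFFICIENCY = 128 / 130     # 128b/130b encoding


def raw_link_mb_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth in MB/s (1 MB = 1,000,000 bytes)."""
    bits_per_second = lanes * TRANSFERS_PER_SECOND * ENCODING_EFFICIENCY
    return bits_per_second / 8 / 1e6


SPEC_SEQ_READ_MB_S = 5000  # Intel's sequential read spec for the RAID 0 volume

for lanes in (4, 8):
    limit = raw_link_mb_per_s(lanes)
    verdict = "fits" if SPEC_SEQ_READ_MB_S <= limit else "does not fit"
    print(f"x{lanes}: {limit:.0f} MB/s raw, 5000 MB/s spec {verdict}")
```

A four-lane link tops out at roughly 3,940 MB/s raw, below the 5,000 MB/s sequential read specification, while an eight-lane link offers roughly 7,880 MB/s.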
We like the versatility that a dual controller architecture brings to the table, and what we like the most about it is the speed available when both internal SSDs become aggregated into an RSTe NVMe RAID 0 volume. As fast as storage has become, it is still invariably the system bottleneck. During a pre-release press briefing, Intel representatives stated that the P3608 is already being utilized in high-performance computing applications. Intel stated that clients have reported that the DC P3608 is the fastest flash-based storage device they have ever used. After testing the drive for ourselves, we tend to agree.
As we ran the DC P3608 through our enterprise test suite, we were able to match or exceed Intel's performance specifications. It was not without a little trial and error, though. We quickly found out that if you plan on hitting 850,000 4K random read IOPS in a steady state and want to do it with less than 90% CPU utilization, you are going to need more than eight CPU cores. We ended up utilizing a 16-core Xeon E5-2698 V3 to get the job done. With 32 CPU threads available, the DC P3608 was totally unleashed, as you can tell from our nearly one million IOPS test run with 43% CPU utilization.
Overall, we are blown away by the performance the P3608 can generate. The drive is quite literally a game changer. The DC P3608 is more than just a performance powerhouse; it's an Intel SSD which means it is built with the very best quality and engineered for reliability that is second to none. The DC P3608 1.6 TB is set to retail for $3509, which is an outstanding value for an enterprise SSD with this level of performance, reliability, and endurance.
Pros:
- Highest in class performance
- Versatility
- Data Protection Scheme
Cons:
- High CPU utilization with eight-core CPUs