Micron's release of the P420m introduces a product that cuts through the cost barriers of PCIe SSD adoption. The advantages of flash, and primarily PCIe SSDs, are finally becoming crystal clear for the majority of users. The primary barrier remains the high acquisition cost of PCIe solutions, even when considering the massive long-term TCO advantage. The original PCIe SSD from Micron, the P320h, redefined the performance envelope for PCIe flash solutions. However, the use of high-cost SLC, though tremendously resilient to heavy workloads, relegates the P320h to upper-tier applications.
While the high-performance segment addressed by the P320h is large, the mainstream market is exponentially larger. Many might even argue that the P320h was the right solution for the market at the time, when many mainstream users were still wary of flash solutions. With its extreme endurance and high tolerance for heavy workloads, the P320h proved that bulletproof solutions for flash acceleration could gain significant penetration into the datacenter.
The flash market has matured considerably since then. The rate of adoption is staggering and at times the fabs cannot even produce NAND fast enough to satiate demand. If ever there were a time for a low cost solution that can deliver tremendous performance in a wide variety of use cases, the time is now.
The P420m looks to provide this low-cost MLC solution with higher capacities, yet also provide tremendous performance for the mainstream market. The common use-cases for the P420m run the gamut of mainstream applications from caching and OLTP to media streaming and web acceleration.
Micron leverages their vertical integration by utilizing their own 25nm NAND, DRAM, and Micron-developed application-specific integrated circuit (ASIC). The card comes in two form factors, the typical PCIe Gen2 x8 half-height half-length (HHHL), and a 2.5" form factor.
The slim profile of the HHHL P420m is designed for use in rackmount servers and is available in capacities of 700GB and 1.4TB. Both capacity points provide up to 3 GB/s of sequential read and up to 750,000 random read IOPS. The trade-off in the move to MLC is in write performance; the 700GB P420m offers 50,000 random write IOPS and 600 MB/s in sequential write. The 1.4TB unit provides 95,000 random write IOPS and 630 MB/s of sequential write speed.
The 2.5" P420m is the world's first PCIe Gen2 x8 MLC NAND PCIe SSD available in a small form factor (SFF). These SSDs are hot-pluggable and designed for use in the front bay of a server, thus reducing downtime and providing easy accessibility. The 2.5" comes in capacities of 350GB and 700GB, with sequential read performance up to 1.8 GB/s and random read performance topping out at 430,000 IOPS.
Endurance is always a concern in the datacenter, and is especially relevant with MLC solutions. To provide increased endurance, Micron implemented their XPERT (eXtended Performance and Enhanced Reliability Technologies) suite in a PCIe card for the first time with the P420m. This mix of technologies increases SSD performance and reliability, most notably through a RAID 5-like redundancy scheme (RAIN). We will cover this in detail in the following pages.
The XPERT suite provides an endurance of 5PB for the 700GB and 10PB for the 1.4TB, in stark contrast to the 50PB with the P320h. The lower endurance is enough for four drive fills per day for five years. The P420m also brings the inclusion of power hold-up capacitors as an additional layer of data protection in the event of power fail, a feature that was noticeably absent on the P320h.
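The drive-fills figure above follows from simple arithmetic on the rated endurance. The sketch below reproduces it, assuming decimal units (1 PB = 1,000,000 GB) and a 365-day year; the function name is ours and purely illustrative.

```python
def drive_fills_per_day(endurance_pb, capacity_gb, years=5.0):
    """Full-capacity writes per day that a rated endurance allows."""
    endurance_gb = endurance_pb * 1_000_000  # assumption: decimal units
    total_fills = endurance_gb / capacity_gb
    return total_fills / (years * 365)

# Rated endurance and capacity points quoted in the text above
p420m_700 = drive_fills_per_day(5, 700)
p420m_1400 = drive_fills_per_day(10, 1400)
print(f"700GB: {p420m_700:.1f} fills/day, 1.4TB: {p420m_1400:.1f} fills/day")
```

Both capacity points work out to just under four fills per day, since endurance and capacity double together.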
Micron P420m Specifications and RealSSD Manager
Micron P420m Specifications
The HHHL cards do not require external power connections to power the controller, easing cabling requirements, and keeping heat output low with an economical power draw of 22W RMS for the 700GB and 30W RMS for the 1.4TB. The 1.4TB capacity point can be limited to 25W via a power-throttling feature. The 2.5" 350GB pulls 14W and idles at 7W, while the 700GB consumes 22W active and 8W idle.
Airflow is spec'd at 1.5 m/s at an operating temperature of 0 to 50°C. In the event of temperatures exceeding this threshold, thermal throttling kicks in until temperatures fall into an acceptable range. The cards feature an MTTF of 2.0 million hours, and an Uncorrectable Bit Error Rate (UBER) of 1 per 10^17 bits read.
The power hold-up capacitors supply enough energy to flush data to the lower pages of the NAND in the event of a host power loss.
An important facet of any product is ease of use. One of the greatest advantages of the P420m is the ability to simply slip in a card that provides huge increases in performance. Managing the card begins after installation, and over the service life of the unit an easy-to-use management interface is key.
The P420m is managed through the RealSSD Manager (RSSDM) utility. This easy-to-use GUI simplifies many drive management tasks such as secure erasing, temperature monitoring, firmware updates, and interrupt coalescing settings. The P420m supports three interrupt coalescing settings to customize the drive for its environment. There are two "High IOPS" options with medium and aggressive timeouts, and a setting for latency-sensitive environments.
The utility also provides device monitoring and allows users to graph the read and write throughput and temperature. There is also access to the SMART data for the drive, which allows users to predict the expected lifetime of the device. Micron also provides a CLI utility, which is handy for those that are using servers without graphical interfaces.
The addition of remote management capability fixes a key weakness of the earlier versions of the RSSDM. Another new feature is the ATA8 ACS2-compliant Sanitize command. While the typical secure erase feature deletes all data, determined individuals can still recover some data. The Sanitize command ensures total data destruction.
Many of the application accelerators on the market do not include such refined management tools, and Micron's continued refinement of the user interface provides an advantage over competing solutions.
P420m Architecture and XPERT Suite
The P420m does not employ bridging chips or hardware, providing an efficient contention-free architecture. The architecture combines all 64 placements of NAND into 32 ONFI 2.1-compliant channels, with no intermediary SSD controllers or RAID controllers. Several PCIe solutions feature multiple SSD controllers onboard the PCB, and these in turn feed a RAID controller. Micron simplifies the design, minimizing the amount of hardware, thus maximizing throughput and minimizing latency to the controller.
The Micron-developed ASIC controller provides an embedded ATA host bus adapter, a host/flash translation layer, flash maintenance, channel control, and a NAND RAID (RAIN) protection scheme. This streamlined architecture also dispenses with clunky non-native interfaces, which tend to become the bottleneck in many solutions. Translating protocols from SAS or SATA to PCIe tends to incur latency penalties that are not a factor with the P420m, which enjoys a native PCIe interface.
The Micron/IDT 89HF3208 controller is a 1517-pin FCBGA (Flip Chip Ball Grid Array) that handles 32 channels, supporting four-way interleaving for up to 128 NAND die. The functions of the SSD are all handled on-die to minimize host overhead. The P320h features the same controller, making this a time-proven ASIC in long-term deployments.
The P420m is the first PCIe SSD to utilize Micron's XPERT suite to enhance the lifespan and data integrity of the drive. XPERT (eXtended Performance and Enhanced Reliability Technology) provides enhanced defect and error management technology.
This approach utilizes a combination of hardware-based error correction algorithms, along with firmware-based static and dynamic wear-leveling algorithms.
Micron has taken error correction and avoidance to the next level with RAIN (Redundant Array of Independent NAND), which calculates and stores parity. This is in essence a RAID 5 implementation at the device level, storing one page of parity per seven pages of data, providing the ability to recover data in the event of an error or failure. RAIN provides data security beyond the standard ECC approach and can recover data lost to page-, block-, and die-level failures.
This transparent process takes place without any degradation of the SSD's performance, but does come at the expense of capacity. This implementation relies upon extra NAND to store the parity, but Micron has compensated for this with 28% overprovisioning on the P420m.
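To make the seven-plus-one parity idea concrete, here is a minimal sketch that assumes plain XOR parity in the style of RAID 5. Micron has not published the internals of RAIN, so the page size, function names, and stripe layout below are illustrative assumptions rather than the actual implementation.

```python
from functools import reduce

PAGE_SIZE = 16  # bytes; tiny illustrative page, real NAND pages are far larger

def xor_pages(pages):
    """Byte-wise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))

def make_stripe(data_pages):
    """Return the seven data pages plus one XOR parity page."""
    assert len(data_pages) == 7
    return list(data_pages) + [xor_pages(data_pages)]

def rebuild(stripe, lost_index):
    """Recover one lost page by XOR-ing the seven surviving pages."""
    survivors = [p for i, p in enumerate(stripe) if i != lost_index]
    return xor_pages(survivors)

data = [bytes([i] * PAGE_SIZE) for i in range(1, 8)]
stripe = make_stripe(data)
recovered = rebuild(stripe, lost_index=3)   # pretend page 3 became unreadable
parity_overhead = 1 / len(stripe)           # one page in eight holds parity
```

In this simplified model any single lost page in a stripe can be rebuilt from the other seven, at a cost of one eighth of the raw capacity.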
Data Path Protection
Data Path Protection ensures data integrity during transfer through the drive interface, DRAM, error checkers, data concatenations, and the retirement of NAND. This is accomplished through CRC and ECC algorithms before and after each element in the data path.
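The general idea of checking data before and after each element can be sketched with a checksum that travels alongside the payload. This is only an illustration using CRC32 from Python's standard library; the stage functions and names are hypothetical, and the actual CRC and ECC schemes inside the P420m are not described in this review.

```python
import zlib

def tag(payload: bytes):
    """Attach a CRC32 checksum to a payload entering the data path."""
    return payload, zlib.crc32(payload)

def verify(payload: bytes, checksum: int) -> bytes:
    """Check the payload against its checksum; raise if it changed."""
    if zlib.crc32(payload) != checksum:
        raise ValueError("data path corruption detected")
    return payload

def pass_through(stage, payload, checksum):
    """Verify before the stage, run it, then verify again afterwards."""
    verify(payload, checksum)
    return verify(stage(payload), checksum)

def healthy_stage(buf):   # hypothetical element that returns data unchanged
    return bytes(buf)

def faulty_stage(buf):    # hypothetical element that flips one bit
    return bytes([buf[0] ^ 0x01]) + buf[1:]

payload, checksum = tag(b"4K block of host data")
ok = pass_through(healthy_stage, payload, checksum)
try:
    pass_through(faulty_stage, payload, checksum)
    corruption_caught = False
except ValueError:
    corruption_caught = True
```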
The cover plate with the Micron logo and branding is not user-removable. The cover is applied with adhesive backing that will likely strip the 48 Tantalum capacitors from the PCB upon removal. The center of the faceplate, which covers the heatsink, also has a thermal pad for conducting heat to the cover of the P420m, in essence creating a large metal heatsink with the cover.
The Tantalum capacitors provide enough capacitance to flush data in-flight down to the NAND in the event of a power loss. We note the spare pads for the possible inclusion of more capacitors, perhaps signaling a higher capacity version of the card in the future.
Looking at the P420m from the edge reveals a triple-stacked PCB architecture. The heat sink isn't exposed to an abundance of air down among the PCBs, perhaps explaining the thermal pad scheme on the cover. We closely monitored the temperature under heavy loads and the P420m remained within expectations.
Ten very thin copper strands connect the top PCB with its capacitors to the bottom PCB, which holds the controller.
Upon removal, we flip over the top PCB, which holds the capacitors, and note the traces that route the capacitors to the 10-pin connector.
Both pictures above detail the middle PCB, which holds the lion's share of the NAND. The 25nm Micron MLC NAND packages total 36 for this component. Each package contains 32GB of NAND, with eight die per BGA package. The 700GB P420m has the same number of overall NAND packages, but contains only four die per package.
The bottom PCB contains the Micron/IDT ASIC, a 32-channel beast that provides native PCIe to the 2.0 x8 connection. The remainder of the NAND packages reside on this PCB, along with several DRAM packages hidden under the secondary heat sink.
Finally, the bottom of the main PCB holds the remainder of the ECC DRAM, for a total of 2.25GB. The use of ECC RAM allows for the detection and correction of any errors in the cache. We also note the remaining NAND packages crammed into the scant space available on the PCB.
Test System and Methodology
We utilize a new approach to HDD and SSD storage testing for our Enterprise Test Bench, designed specifically to target long-term performance with a high level of granularity.
Many testing methods record peak and average measurements during the test period. These average values give a basic understanding of performance, but fall short in providing the clearest view possible of I/O QoS (Quality of Service).
'Average' results do little to indicate the performance variability experienced during actual deployment. The degree of variability is especially pertinent, as many applications can hang or lag as they wait for I/O requests to complete. This testing methodology illustrates performance variability, and includes average measurements, during the measurement window.
Under load, all storage solutions deliver variable levels of performance. While this fluctuation is normal, the degree of variability is what separates enterprise storage solutions from typical client-side hardware. Providing ongoing measurements from our workloads with one-second reporting intervals illustrates product differentiation in relation to I/O QoS. Scatter charts give readers a basic understanding of I/O latency distribution without directly observing numerous graphs.
Consistent latency is the goal of every storage solution, and measurements such as Maximum Latency only illuminate the single longest I/O received during testing. This can be misleading, as a single 'outlying I/O' can skew the view of an otherwise superb solution. Standard Deviation measurements consider latency distribution, but do not always effectively illustrate I/O distribution with enough granularity to provide a clear picture of system performance. We also use latency plots to illustrate latency scaling under various workloads.
Our testing regimen follows SNIA principles to ensure consistent, repeatable testing. We attain steady state through a process that brings the device to a performance level that does not vary by more than 20% during the measurement window. Forcing the device to perform a read-modify-write procedure for new I/O triggers all garbage collection and housekeeping algorithms, highlighting the real performance of the solution.
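The 20% criterion described above can be expressed as a simple check over a window of one-second samples. The sketch below is an assumption about how such a check might look, with hypothetical function names and a synthetic trace; the formal SNIA specification may apply further criteria that are not modeled here.

```python
def in_steady_state(samples, max_range=0.20):
    """True if (max - min) stays within max_range of the window average."""
    average = sum(samples) / len(samples)
    return (max(samples) - min(samples)) <= max_range * average

def first_steady_window(iops, window=300, max_range=0.20):
    """Index of the first steady window of one-second samples, or None."""
    for start in range(0, len(iops) - window + 1):
        if in_steady_state(iops[start:start + window], max_range):
            return start
    return None

# Synthetic trace: performance decays from a fresh-out-of-box peak, then settles.
trace = [max(100_000, 300_000 - 500 * t) for t in range(1_000)]
start = first_steady_window(trace)
print("steady state begins at second", start)
```

A check like this rejects the early, steeply declining portion of a preconditioning run and accepts the window only once the spread between the best and worst second has narrowed.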
Our test pool features SSDs of varying capacity and it is important to bear this in mind when viewing results. The first page of results will provide the 'key' to understanding and interpreting our new test methodology.
4K Random Read/Write
We precondition the P420m for 9,000 seconds, receiving reports on workload performance every second. We plot this data to illustrate the drive's descent into steady state.
The dots signify IOPS performance every second during the test. The line through the data scatter represents the average performance during the test. This type of testing presents standard deviation and maximum/minimum I/O in a visual manner. High-granularity testing can give our readers a good feel for the latency distribution by viewing IOPS at one-second intervals. Keep this in mind when viewing our test results below, where we also provide latency charts for further granularity.
This downward slope of performance happens very few times in the lifetime of the device (by some estimates only 0.04% of the SSD's life). This is typically during the first few hours of use, and we present the precondition results only to confirm steady state convergence.
Each QD for every parameter tested includes 300 data points (five minutes of one second reports) to illustrate the degree of performance variability. The line for each QD represents the average speed reported during the five-minute interval.
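For readers who want to reduce a scatter of one-second reports to numbers, the sketch below summarizes one queue depth's 300 samples. The data is synthetic and the function name is ours; it only illustrates the kind of average, spread, and extremes that the charts convey visually.

```python
import random
import statistics

def summarize(samples):
    """Average, spread, and extremes for one queue depth's one-second samples."""
    return {
        "average": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),
        "minimum": min(samples),
        "maximum": max(samples),
    }

random.seed(0)  # fixed seed so the synthetic data is repeatable
# 300 synthetic one-second IOPS reports (five minutes) for a single queue depth
samples = [random.gauss(100_000, 2_000) for _ in range(300)]
summary = summarize(samples)
variability = summary["stdev"] / summary["average"]  # coefficient of variation
```

The coefficient of variation gives a single figure for how tightly the dots cluster around the average line.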
4K random speed measurements are an important metric when comparing drive performance, as the hardest type of file access for any storage solution to master is small-file random. One of the most sought-after performance specifications, 4K random performance is a heavily marketed figure.
The P420m comes out of the corner swinging, beating the SLC-powered P320h from QD32 to 256. The P420m averages a massive 734,219 IOPS at QD256. The P320h does not disappoint either, with an average of 711,962 IOPS.
Our read latency chart illustrates the minimal increase in latency as we reach the higher queue depths with both drives. The P420m provides lower latency in read access, while the P320h experiences some turbulence at QD16. The excellent performance at higher QD comes from the parallel architecture of both drives.
Garbage collection routines are more pronounced in heavy write workloads. This leads to more variability in performance.
The P320h leverages its SLC to provide a much higher score than the P420m, though it does experience some minor variability at either end of the spectrum. The P320h averages 206,543 IOPS at QD256, while the P420m lags behind with an average of 113,749 IOPS. Even so, the P420m's result is well above its rated 95,000 IOPS in steady state.
Both SSDs provide remarkably consistent performance in heavy write workloads.
The write latency shows the P320h enjoying a significant latency advantage in this random write workload, primarily due to its SLC NAND.
8K Random Read/Write
8K random read and write speed is a metric that is not tested for consumer use, but for enterprise environments this is an important aspect of performance. With several different workloads relying heavily upon 8K performance, we include this as a standard with each evaluation. Many of our Server Emulations below will also test 8K performance with various mixed read/write workloads.
The average 8K random read speed of the P420m weighs in at 404,519 IOPS at QD256, besting the P320h with its average of 374,638 IOPS.
The 8K read latency results again reflect the same hiccup we noticed with QD8 for the P320h, but the remainder of the results are within expectations, with the P420m providing a better latency range under random read workloads.
The P320h averages 103,204 IOPS, with the P420m trailing behind at an average of 56,070 IOPS.
128K Sequential Read/Write
The 128K sequential read speed reflects the maximum sequential throughput of the SSD using a realistic file size encountered in an enterprise scenario.
The P420m wears the read speed crown once again, pushing out a tremendous 3,336 MB/s at QD256. The P320h is no slouch, falling only slightly behind with an average of 3,064 MB/s.
In terms of read latency, the P420m averages 10ms at QD256, with the P320h trailing slightly with an average of 10.9ms.
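These latency figures are consistent with the throughput results via Little's Law, which relates outstanding I/Os, completion rate, and average latency. The sketch below is a back-of-the-envelope check that assumes decimal megabytes and a 131,072-byte transfer size; the function name is illustrative.

```python
def mean_latency_ms(throughput_mb_s, block_bytes, queue_depth):
    """Little's Law: latency = outstanding I/Os / I/O completion rate."""
    iops = throughput_mb_s * 1_000_000 / block_bytes  # assumes decimal megabytes
    return queue_depth / iops * 1_000

BLOCK = 128 * 1024  # 128K transfer size in bytes
p420m_ms = mean_latency_ms(3_336, BLOCK, 256)
p320h_ms = mean_latency_ms(3_064, BLOCK, 256)
print(f"P420m: {p420m_ms:.2f} ms, P320h: {p320h_ms:.2f} ms")
```

Under those assumptions the estimate lands within a few tenths of a millisecond of the measured averages, which suggests the latency at QD256 is governed by throughput saturation rather than by the NAND itself.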
The P320h provides a much higher average write speed of 1,891 MB/s at QD256, while the P420m delivers 614 MB/s. Sequential write environments are obviously not the target market for the P420m.
Database/OLTP and Webserver
This test emulates Database and On-Line Transaction Processing (OLTP) workloads. OLTP is in essence the processing of transactions such as credit cards and high frequency trading in the financial sector. Enterprise SSDs are uniquely well suited for the financial sector with their low latency and high random workload performance. Databases are the bread and butter of many enterprise deployments. These are demanding 8K random workloads with a 66% read and 33% write distribution that can bring even the highest performing solutions down to earth.
The P420m averages a respectable 112,930 IOPS at QD256, relying upon its superb read performance to outweigh the slower write performance. In mixed read/write scenarios, the P420m can provide excellent performance due to its fast read speed. The P320h leverages its SLC-powered random write speed to provide an amazing average of 228,626 IOPS at QD256.
The P320h utilizes its ultra-low latency SLC to best the P420m easily in the latency test.
The Webserver profile is a read-only test with a wide range of file sizes. Web servers are responsible for generating content for users to view over the internet, much like the very page you are reading. The speed of the underlying storage system has a massive impact on the speed and responsiveness of the server that is hosting the website, and thus the end-user experience.
The P420m shows its strengths in heavy read workloads, beating the P320h handily with an average of 166,847 IOPS at QD256.
The P420m squeaks ahead of the P320h at higher queue depths, while the P320h ekes out a win at the lower queue depths.
The Micron P420m will be available for roughly half the price of the P320h, but packs quite the punch for the lower price point. The addition of several new features, such as remote management, tantalum capacitors, and the Sanitize command, is focused on improving data integrity. Micron also incorporates device-level redundancy with the RAIN technology, and several key aspects of the XPERT suite combine to offer higher endurance, increased error correction capability, and full data path protection.
In read-centric applications, the P420m can provide performance that actually rises above the powerful SLC-based P320h. In both random and sequential read workloads, the MLC-based P420m provided the ultimate in throughput. One of the use cases for the P420m is in webserver applications. In our webserver tests, a read-only test with a mix of different file sizes, the P420m delivered an impressive 2,530 MB/s of random read performance.
The P420m is also well suited for caching applications, where the caching of hot data can provide explosive acceleration to existing infrastructure. Applications with static data sets, or a low write frequency, will benefit from the read speed of the P420m. Media streaming, especially video-on-demand, will also be well served by the P420m. Another important facet of the P420m is the enhanced density provided with its whopping 1.4TB of available space. For dense applications, such as blade servers, the small form factor and resilience to heat provide an attractive solution to performance challenges.
The P420m sacrifices some of the stellar write performance of the P320h with the move to MLC NAND. This is part of the trade-off for a palatable price point, but does not preclude the use of the P420m in the majority of applications. The P420m performed well in our OLTP/Database testing, which features a mixed read/write workload that falls within the endurance specifications of the drive.
The RealSSD Manager software is a bonus for users, simplifying management with an easy-to-use GUI. The majority of PCIe SSDs are still stuck with rudimentary command line interfaces for management, and while the slick GUI is a nice addition, Micron also provides command line functionality to cover all bases.
The P420m is not as expensive as other PCIe solutions, but still provides class-leading enterprise features. One admirable trait of Micron SSDs is the continued focus on data integrity. The continued inclusion of RAIN, along with the addition of the XPERT suite and tantalum capacitors, carries on the Micron tradition of ensuring data integrity above all else.
Sometimes the most important aspect of a storage device is what it is not. The P420m is not as expensive as other PCIe alternatives and is tremendously more economical than HDDs. The explosive performance of the P420m and the lower price point combine to provide an easily accessible product for the mainstream market.