The Micron RealSSD P320h HHHL Enterprise SSD is revolutionary in many respects, both for Micron and for the broader world of enterprise SSD storage. The IMFT (Intel-Micron Flash Technologies) team has recently topped 20% of the NAND market as it continues a period of fast growth. Micron has leveraged this in-house NAND and DRAM production capability to take the 2.5" form-factor enterprise and consumer SSD markets by storm. Micron has enjoyed tremendous success in both of these areas, and the development of the P320h is a natural progression as the company delves further into the enterprise realm.
The Micron P320h provides a whopping 50 Petabytes of write endurance from its 34nm server grade NAND on the 700GB model (up to 28TB written daily), and 25 Petabytes of endurance on the 350GB version. The high endurance SLC NAND paired with Micron DDR3 DRAM and a custom designed Micron/IDT ASIC combine to provide sustained performance that is unmatched from any single device on a PCIe slot.
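To put those endurance figures in perspective, here is a minimal back-of-the-envelope sketch. It uses only the numbers quoted above and assumes decimal units (1PB = 1,000TB) and a constant daily write rate; the function names are ours, for illustration only.

```python
# Figures quoted in the review for the 700GB model.
ENDURANCE_PB = 50        # rated write endurance, petabytes
DAILY_WRITES_TB = 28     # quoted maximum daily writes, terabytes
CAPACITY_GB = 700        # user capacity, gigabytes


def lifetime_years(endurance_pb: float, daily_tb: float) -> float:
    """Years until the rated endurance is consumed at a constant daily write rate."""
    days = (endurance_pb * 1000) / daily_tb  # assumes 1 PB = 1,000 TB
    return days / 365


def full_drive_writes(endurance_pb: float, capacity_gb: float) -> float:
    """Number of times the full user capacity could be rewritten."""
    return (endurance_pb * 1_000_000) / capacity_gb  # assumes 1 PB = 1,000,000 GB


years = lifetime_years(ENDURANCE_PB, DAILY_WRITES_TB)
writes = full_drive_writes(ENDURANCE_PB, CAPACITY_GB)
print(f"{years:.1f} years at {DAILY_WRITES_TB} TB/day")  # 4.9 years at 28 TB/day
print(f"{writes:,.0f} full-drive writes")                # 71,429 full-drive writes
```

Under the same assumptions the 350GB model (25PB) works out to the same number of full-drive writes, which suggests the endurance rating scales directly with capacity.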
The offloaded architecture provided by the custom designed Micron/IDT ASIC minimizes the intrusion upon precious host system resources, while providing unmatched performance. Many vendors in this space currently handle all flash management responsibilities on the host system, robbing applications of valuable CPU cycles and RAM capacity.
The P320h offers a blazing 785,000 IOPS of random read speed, but what is truly amazing about this metric is the simple fact that it is saturating the PCIe 2.0 slot with random read activity. When converted from IOPS to gigabytes per second at a 4K transfer size this is 3.2GB/s, the maximum sustainable by the PCIe 2.0 interface. The maximum sequential read speed of the P320h is also 3.2GB/s.
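The IOPS-to-throughput conversion is simple multiplication, and a short sketch makes the units explicit. We assume a 4K transfer means 4,096 bytes; the helper name is ours. Note that the result reads as 3.2 only in decimal gigabytes.

```python
KIB = 1024


def iops_to_bytes_per_sec(iops: int, block_bytes: int) -> int:
    """Throughput in bytes per second for a given IOPS rate and transfer size."""
    return iops * block_bytes


# 785,000 random read IOPS at an assumed 4,096-byte transfer size.
random_read = iops_to_bytes_per_sec(785_000, 4 * KIB)

print(f"{random_read / 1e9:.2f} GB/s (decimal)")    # 3.22 GB/s (decimal)
print(f"{random_read / 2**30:.2f} GiB/s (binary)")  # 2.99 GiB/s (binary)
```

The same helper applies to any of the block sizes tested later in the review, as long as the unit convention (decimal or binary) is kept consistent when comparing figures.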
Though designed for heavy read environments, the P320h still sports an impressive 205,000 IOPS in random write, and 1.9GB/s in sequential write performance. The disproportionate read to write speed ratio is actually well suited for typical enterprise workloads, which consist of 60-80% read activity. Striking this balance between the read and write speeds allows the P320h to be tuned and tailored for its intended environments. PCIe SSDs excel in many types of workloads, and the P320h is geared specifically toward read-intensive environments such as cloud computing, high-performance computing, analytics, media streaming, web acceleration, video on demand and data warehousing.
There is also the very tantalizing possibility of utilizing the unique blend of read and write performance with third-party caching software solutions. Utilizing the massive speed and impressive endurance of the P320h as a front-end for caching of underlying DAS and SAN arrays can provide an exponential return on investment for customers by increasing the performance of existing infrastructure. Maximizing the storage potential brings the added bonus of increasing the CPU utilization, especially in virtualized environments. These exponential increases in ROI gained by removing the storage solution as the bottleneck can create a huge return on the initial capital expenditure.
The P320h features a very low overall power threshold, only consuming 25 watts. This allows the SSD to function from the power supplied via the PCIe slot without auxiliary power as demanded by some competitors. Keeping the power requirements in check results in lower costs in the long term for both power consumption and heat generation. The P320h supplies an astounding 30,000 random read IOPS per watt, a measurement that no other PCIe SSD that we have tested can match.
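The efficiency figure follows directly from the two numbers above. This small sketch uses the quoted 785,000 IOPS and 25 watts, and assumes the drive runs at that power level around the clock for the energy estimate; no electricity price is assumed.

```python
def iops_per_watt(iops: float, watts: float) -> float:
    """Random read IOPS delivered per watt of power drawn."""
    return iops / watts


def annual_energy_kwh(watts: float, hours_per_year: int = 24 * 365) -> float:
    """Energy used in a year, assuming constant draw at the given wattage."""
    return watts * hours_per_year / 1000


print(iops_per_watt(785_000, 25))  # 31400.0, in line with the ~30,000 quoted
print(annual_energy_kwh(25))       # 219.0 kWh per year at a constant 25 W
```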
Finally, one of the greatest aspects of the Micron architecture is vertical integration. The NAND and DRAM are fabricated in-house and the custom ASIC is purpose built for Micron's use. This provides a steady platform that Micron controls in its entirety allowing for much more refined component integration and streamlined custom support in the event of an issue. All of these features combine to create the best-in-class price vs. performance ratio that the P320h provides to its customers. Today we will take a much deeper look at the architecture and performance of the Micron P320h.
Product Positioning and Data Protection
The PCIe SSD category is growing rapidly as NAND continues its rapid explosion into the datacenter. The P320h represents a huge step forward for Micron as they continue their push into the datacenter with their enterprise storage flash products. Micron entering into the PCIe SSD market is certainly turning heads of several of the 'old guards' in this space.
Micron's entrance into this market is big news simply because Micron has their own NAND fabrication capabilities, giving them a tremendous advantage over the smaller players in this niche. Owning their own fab allows them to maintain higher profit margins while undercutting competitors on price. The P320h serves a market previously dominated by players such as Fusion-IO, OCZ, SanDisk, Texas Memory Systems, LSI and STEC.
Other companies with fabrication abilities have recently made inroads into this market as well, with the Intel 910 and SanDisk Lightning both hailing from companies with NAND foundries. The Micron P320h features a different architecture and feature set that differentiates them from these other foundry competitors, but Micron is looking to further means of product differentiation.
The P320h provides a springboard for Micron to expand even further into the NAND acceleration market by implementing a shared flash system. Micron's acquisition of Virtensys' assets in January supplies the IP to pursue development of a PCIe virtualization technology platform. This virtualization platform will enable the sharing of a pool of PCIe SSDs between multiple servers. This enables the performance and latency of a PCIe SSD with the flexibility of a shared storage device. Sharing storage across multiple servers enables high-performance computing clusters to leverage PCIe bandwidth and latency.
There are versions of this type of technology available or in development from several of Micron's key competitors. EMC has the Thunder initiative underway, Violin Memory touts their networked flash arrays and Fusion-IO delivers their ION data acceleration technology. Even OCZ has made inroads into this promising market with their acquisition of SANRAD. Expect Micron to invest heavily in this area as they look to provide their customers with a PCIe sharing technology that boasts much higher speed and lower latency than competitors can offer.
RAIN (Redundant Array of Independent NAND)
Micron leverages several advanced approaches for verifying user data and correcting errors. Typical background routines such as hardware based ECC algorithms and firmware-based static and dynamic wear-leveling algorithms run on the Micron ASIC. Micron has taken error correction and avoidance to the next level with their proprietary RAIN Technology (Redundant Array of Independent NAND) implementation. At its most basic level, this is very similar to RAID 5 functionality.
RAIN Technology groups and logically stripes page and/or block data across NAND channels, then generates and stores parity data along with the user data. RAIN creates one page of parity for every seven pages of user data. This data-plus-parity structure allows data to be recovered after a storage element failure, whether the failure affects a channel, a page or a block. The process is transparent and takes place without any degradation of the SSD's performance.
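Since RAIN is described as very similar to RAID 5, the idea can be illustrated with plain XOR parity. This is a minimal sketch under that assumption and not Micron's actual implementation, which is proprietary; the 16-byte page size and all function names are hypothetical, chosen only to keep the example small.

```python
import os
from functools import reduce

PAGE_SIZE = 16   # hypothetical tiny page size, for illustration only
DATA_PAGES = 7   # the review states seven user-data pages per parity page


def xor_pages(pages):
    """Byte-wise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def make_stripe(data_pages):
    """Return the seven data pages followed by one XOR parity page."""
    assert len(data_pages) == DATA_PAGES
    return list(data_pages) + [xor_pages(data_pages)]


def recover(stripe, lost_index):
    """Rebuild one lost page by XOR-ing every surviving page in the stripe."""
    survivors = [page for i, page in enumerate(stripe) if i != lost_index]
    return xor_pages(survivors)


data = [os.urandom(PAGE_SIZE) for _ in range(DATA_PAGES)]
stripe = make_stripe(data)
rebuilt = recover(stripe, lost_index=3)
print(rebuilt == data[3])          # True

overhead = 1 / (DATA_PAGES + 1)    # share of raw capacity spent on parity
print(f"{overhead:.1%}")           # 12.5%
```

With a 7+1 layout, any single lost page in a stripe can be rebuilt, and one eighth of the striped capacity goes to parity.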
There is a capacity trade-off for the enhanced data protection scheme (in graphic above). This is a worthwhile investment to protect user data, especially in mission-critical applications.
Architecture and Management
One of the keys to Micron's success with the P320h is the lack of bridging chips or hardware. The architecture combines all 64 placements of NAND into 32 channels, with no intermediary SSD controllers or RAID controllers. Several types of solutions on the market today have SSD controllers onboard the PCB and these in turn feed the RAID controller. Micron simplifies the design, minimizing the amount of hardware and latency while maximizing throughput to the controller.
The Micron-developed ASIC (Application Specific Integrated Circuit) controller provides an embedded ATA host bus adapter, a host/flash translation layer, flash maintenance, channel control and a NAND RAID (RAIN) protection scheme. This streamlined architecture also dispenses with clunky non-native interfaces, which tend to become the slow point in many solutions. Translating protocols from SAS or SATA to PCIe tends to incur latency penalties that are not a factor with the P320h, which enjoys a native PCIe interface.
The IDT 89HF3208 controller is a 1517 pin FCBGA (Flip Chip Ball Grid Array) which handles 32 channels, supporting four-way interleaving up to 128 NAND dice. The functions of the SSD are all handled on-die to minimize host overhead.
The P320h features three interrupt coalescing settings to allow users to tailor the SSD to their workloads. This allows issuance of more than one I/O during each interrupt to boost performance in heavy workloads. This feature leads to some interesting results later in the review when the drive is under very heavy load. An additional coalescence setting provides more optimized performance for a low QD environment.
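To show why coalescing helps under heavy load, here is a generic toy model of interrupt coalescing. It is not a description of Micron's driver or its three settings: the batch limit, the timeout and the function name are all hypothetical. An interrupt fires when a batch fills up or when its timer expires, whichever comes first.

```python
def count_interrupts(completion_times_us, max_batch, timeout_us):
    """Count interrupts raised for a sorted list of I/O completion times (microseconds)."""
    interrupts = 0
    pending = 0
    batch_start = None
    for t in completion_times_us:
        # Flush a batch whose timer expired before this completion arrived.
        if pending and t - batch_start >= timeout_us:
            interrupts += 1
            pending = 0
        if pending == 0:
            batch_start = t
        pending += 1
        # Flush as soon as the batch is full.
        if pending >= max_batch:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # final partial batch
    return interrupts


heavy = list(range(1000))            # one completion every microsecond
light = [i * 100 for i in range(10)] # one completion every 100 microseconds

print(count_interrupts(heavy, max_batch=1, timeout_us=50))  # 1000
print(count_interrupts(heavy, max_batch=8, timeout_us=50))  # 125
print(count_interrupts(light, max_batch=8, timeout_us=50))  # 10
```

In this model the heavy stream needs one eighth as many interrupts once batching is enabled, while the light stream gains nothing and each I/O waits for the timer. That is consistent with the review's note that a separate setting is better suited to low QD environments.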
The RealSSD P320h is managed through the RealSSD Manager (RSSDM) utility. This easy-to-use GUI simplifies many processes such as secure erasing, temperature monitoring, firmware updates and coalescing settings. The utility also provides device monitoring and allows users to graph the read and write throughput and temperature. There is also access to the SMART data for the drive, which allows users to predict the expected lifetime of the device. Micron also provides a CLI utility, which is handy for those that are using servers without graphical interfaces.
Many of the competing application accelerators on the market do not include such refined management tools. Though this one lacks remote management capability, it still takes a big lead over many of Micron's competitors.
The P320h comes in a single slot HHHL form factor (Half Height and Half Length). This allows the SSD to fit into slim servers in dense configurations.
Packing all of the NAND and DRAM packages along with the controller into such a slim solution is a daunting task. This is accomplished by utilizing two daughter cards that are connected to the main PCB by two connectors. The large Micron/IDT ASIC controller in the middle of the card is hidden under the large heatsink. This heatsink provides enough cooling to keep the PCIe card cool under the most demanding high-heat scenarios, and requires a standard 300LFM of airflow.
One of the advantages of using SLC flash is its high tolerance for heat. MLC simply cannot withstand the high temperature thresholds that the SLC onboard the P320h can handle easily. This also provides the opportunity for a unique method of packaging the NAND flash packages on the daughter card. The SLC packages are sandwiched very closely to the packages on the main PCB. There is a very small amount of space separating the two PCBs, and MLC would surely overheat in such a densely packaged environment.
SLC can handle the heat, with an easy tolerance of up to 60°C, and the card features a throttling mechanism in the event of heat climbing above the maximum threshold. This dynamic throttling can either limit the workload to keep the device cool enough for safe operation, or shut the card down entirely.
The daughter boards are fastened to the main PCB by connectors that allow the cards to be easily detached. We can see the banks of Micron 34nm SLC ONFI 2.1-compliant NAND. There are a total of 64 packages on the drive, for a total of 1TB of raw NAND. There are four dice per package on the 350GB drive, and double the density for the 700GB drive, with eight dice per package. Five of the Micron DRAM packages are to the left of the large Micron/IDT controller, and four more DRAM packages are contained on the rear of the drive. This leaves a total of 2.25GB of DRAM available to the device.
The two daughter cards hold four packages per side, for a total of eight per card.
Here we can observe the connector that mates the cards to the main PCB. These daughter cards are secured to the main PCB with thick fasteners that also have Loctite on the threads to function well in high vibration environments. The rear of the card holds the remaining NAND packages and the remainder of the DRAM resides under the left edge of the sticker.
Test System and Methodology
We utilize a new approach to HDD and SSD storage testing here at TweakTown for our Enterprise Test Bench. Designed specifically to target the long-term performance of solid state with a high level of granularity, our new SSD testing regimen is applicable to a wide variety of flash devices. From typical form-factor SSDs to the hottest PCIe application accelerators available, we are utilizing this new test regimen to provide accurate performance measurements over a variety of parameters.
Many forms of testing involve utilizing peak and average measurements over a given time period. While these average values can give a basic understanding of the performance of the storage solution, they fall short in providing the clearest view possible of the QOS (Quality Of Service) of the I/O.
The problem with average results is that they do little to indicate the variability experienced during the actual deployment of the device. The degree of variability is especially pertinent, as many applications can hang or lag as they wait for one I/O to complete. This type of testing illustrates the performance variability expected in these types of scenarios while also including a whole host of other relevant data, including the average measurements during the measurement window.
While under load, all storage solutions deliver variable levels of performance that are subject to constant change. While this fluctuation is normal, the degree of fluctuation is what separates enterprise storage solutions from typical client-side hardware. By providing ongoing measurements from our workloads with one-second reporting intervals, we can illustrate the difference between different products in relation to the purity of the QOS while the device is under load. By utilizing scatter charts readers can get a basic understanding of the latency distribution of the I/O stream without directly observing numerous graphs.
Consistent latency is the goal of every storage solution, and measurements such as Maximum Latency only illuminate the single longest I/O received during testing. This can be misleading, as a single 'outlying I/O' can skew the view of an otherwise superb solution. Standard Deviation measurements take the average distribution of the I/O into consideration, but do not always effectively illustrate the entire I/O distribution with enough granularities to provide a clear picture of system performance.
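A small sketch with synthetic numbers shows why no single summary statistic tells the whole story. The latency values below are invented purely for illustration and are not measurements from the P320h; the nearest-rank percentile helper is our own choice of method.

```python
import statistics


def summarize(latencies_ms):
    """Return mean, population standard deviation, maximum, and nearest-rank percentiles."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank method with integer arithmetic to avoid float rounding.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.pstdev(ordered),
        "max": ordered[-1],
        "p50": percentile(50),
        "p99": percentile(99),
    }


# Synthetic stream: 999 I/Os at 0.1 ms plus a single 50 ms outlier.
latencies = [0.1] * 999 + [50.0]
stats = summarize(latencies)
for name, value in stats.items():
    print(f"{name}: {value:.4f} ms")
```

Here the maximum reports 50ms even though 99% of the I/O completes in 0.1ms, and the single outlier also inflates the mean and standard deviation. Plotting every sample, as our scatter charts do, avoids having to choose one of these numbers.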
Our testing regimen follows SNIA principles to ensure consistent, repeatable testing. Due to the very nature of NAND devices, it is important that we test under steady state conditions. We attain steady state convergence through a process that brings the device within a performance level that does not range more than 20% from the average speed measured during the measurement window.
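The 20% rule described above can be expressed as a simple check. This is a simplified sketch of that one criterion only; the SNIA specification defines its own measurement windows and further criteria that are not modelled here. The sample values are synthetic and the function name is ours.

```python
def is_steady_state(window, tolerance=0.20):
    """True if every sample lies within +/- tolerance of the window average."""
    average = sum(window) / len(window)
    return all(abs(sample - average) <= tolerance * average for sample in window)


# Synthetic per-round IOPS averages, for illustration only.
settled = [200_000, 205_000, 198_000, 210_000, 202_000]
still_falling = [400_000, 300_000, 220_000, 205_000, 200_000]

print(is_steady_state(settled))        # True
print(is_steady_state(still_falling))  # False
```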
We test below QD32 only to illustrate the scaling of the device. Low QD testing with enterprise-class storage solutions is a frivolous activity if not presented with higher QD results as well. Administrators that have optimized their infrastructure correctly sustain high QD levels, capitalizing on the performance of the premium tier of storage that SSDs provide. With the explosion of virtualization into the datacenter, the high QD performance of the storage solution is the most important metric.
The first page of results will provide the 'key' to understanding and interpreting our new test methodology.
4K Random Read/Write
We precondition the Micron P320h with a heavy 4K random write workload for 18,000 seconds, or five hours. Every second we receive reports on several parameters of the workload performance. We then plot this data to illustrate the drive's descent into steady state. Each round is ten minutes in duration.
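The relationship between the one-second reports and the ten-minute rounds can be sketched as follows. The per-second IOPS values here are synthetic, generated by a made-up linear decay purely to have something to average; only the 18,000-second duration and 600-second round length come from our methodology.

```python
PRECONDITION_SECONDS = 18_000  # five hours of one-second reports
ROUND_SECONDS = 600            # each round is ten minutes


def round_averages(samples, round_len=ROUND_SECONDS):
    """Average per-second samples into consecutive fixed-length rounds."""
    averages = []
    for start in range(0, len(samples), round_len):
        chunk = samples[start:start + round_len]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Synthetic decay from 500,000 IOPS to a 200,000 IOPS floor (not measured data).
samples = [max(200_000, 500_000 - 100 * second) for second in range(PRECONDITION_SECONDS)]
rounds = round_averages(samples)

print(len(rounds))   # 30
print(rounds[0])     # 470050.0
print(rounds[-1])    # 200000.0
```

Five hours of preconditioning therefore yields 30 rounds, and the later rounds are the ones that matter for confirming steady state.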
This preconditioning slope of performance happens very few times in the lifetime of the device, and we present these test results only to confirm the attainment of steady state convergence.
Each QD for each parameter tested includes many data points to illustrate the degree of performance variability. Even though there is a limit of 256QD per volume imposed by the Windows Server operating system we tested up to a QD of 512 to measure any differences gained by the coalescing settings.
4K random speed measurements are an important metric when comparing drive performance, as the hardest type of file access for any storage solution to master is small-file random. One of the most sought-after performance specifications, 4K random performance is a heavily marketed figure.
The P320h tops out at 785,000 IOPS in our testing. There is a clear distribution of the I/O in the higher QD between 600,000 and 785,000 IOPS. This may indicate some type of internal QOS functionality inside the Micron/IDT controller, or result from the command coalescence that allows more I/O to be forced through per interrupt.
The 4K random write speeds top out at 225,000 IOPS. There is very little variability in the write speeds offered by the P320h, and the write performance overall is very consistent.
8K Random Read/Write
8K random read and write speed is a metric that is not tested for consumer use, but for enterprise environments this is an important aspect of performance. With several different workloads relying heavily upon 8K performance, we include this as a standard with each evaluation. Many of our following Server Emulations will also test 8K performance with various mixed read/write workloads.
The P320h serves up 375,000 IOPS in pure 8K random read speed, with very little variability with the slightly larger file size. We also do not note the spread in read speed as we did with the 4K random read testing.
The 8K random write speed tops out at 44,000 IOPS at QD256, but rises above that to an average of 47,000 IOPS, and even higher, at QD512. This is probably due to the more efficient coalescence settings for the P320h.
128K Sequential Read/Write
The 128K sequential read speeds reflect the maximum sequential throughput of the SSD using a realistic file size actually encountered in an enterprise scenario.
The P320h levels off at high QD with a speed of 23,000 IOPS, or 2.8GB/s. This is a tremendous showing in the sequential testing, providing more than enough sequential throughput for many demanding tasks, such as video on demand applications.
The 128K Sequential write speed tops out at 16,000 IOPS, or roughly 2GB/s. The sequential write performance is very consistent through the entire QD range.
OLTP and Webserver
This test emulates Database and On-Line Transaction Processing (OLTP) workloads. OLTP is in essence the processing of transactions such as credit cards used heavily in the financial sector. Enterprise SSDs are uniquely well suited for the financial sector with low latency and high random workload performance. Databases are the bread and butter of many enterprise deployments. These are demanding workloads with 8K random of 66% read and 33% write distribution that can bring even the highest performing solutions down to earth.
The P320h performs very well in this heavy workload scenario, with an output of 117,000 IOPS at QD256, and rising up to as high as 121,000 IOPS at a QD512. This excellent high QD scaling further highlights the excellent performance advantage of the P320h in heavy workloads.
The Webserver profile is a read-only test with a wide range of file sizes. Web servers are responsible for generating content for users to view over the internet, much like the very page you are reading. The speed of the underlying storage system has a massive impact on the speed and responsiveness of the server that is hosting the websites and thus the end user experience.
The read-only nature of our Webserver testing highlights the tremendous read speed of the P320h, with an average of 200,000 IOPS delivered from QD64 to QD256. We also note the slight bump in speed at the much higher QD512.
Fileserver and Emailserver
The File Server profile represents typical workloads encountered in file servers. This profile tests across a wide variety of different file sizes simultaneously, with an 80% read and 20% write distribution.
The P320h provides 130,000 IOPS at QD256, and up to 136,000 average IOPS at QD512, though we can see it peaking at 145,000 IOPS overall. The P320h positively flies at this heavy read and light write distribution workload.
The Emailserver profile is a very demanding 8K test with a 50% read and 50% write distribution. This application is indicative of the performance of the solution in heavy write workloads.
Though many would not think that the P320h would thrive in such a heavy write environment, they would be surprised by the incredible speeds offered up by the P320h in this scenario. At QD256 it provides 82,000 average IOPS, but again the interrupt coalescence provides a huge boost at QD512, up to an average of 88,000 IOPS and a peak of 90,000 IOPS.
The P320h signifies Micron's entry into the PCIe application accelerator market, and they certainly know how to make a big entrance. The Micron P320h is a dominating high-end SLC product for the most demanding customers at a time when many of Micron's competitors have decided to forgo SLC in favor of cheaper MLC. The SLC NAND on the P320h is uniquely well suited for the enterprise space: its 50PB endurance, resistance to high heat and improved write latency are all sought-after attributes there.
The custom designed and integrated Micron/IDT controller is a great building block for Micron, allowing them to design a number of devices with the same controller in the future. There is an MLC variant of the P320h already in the works.
The P320h's affinity for heavy workloads is also a compelling reason for users to adopt this platform. With the interrupt coalescence settings, we observed superb performance at higher-than-normal queue depths. This is a very valuable characteristic of this SSD, since the price premium of any SSD storage device demands its full utilization. The P320h can handle heavy loads, and come back for more.
For many administrators one of the key inhibitors to the adoption of flash media is simply that it will wear out quickly. The mind bending 50PB of endurance provided by the Micron P320h allows for an excellent return on the upfront investment required. By delivering extreme endurance, and a means of measuring that endurance and providing a predictable life span through SMART data monitoring, Micron provides customers with a reliable and predictable solution.
The management utilities, and in particular the GUI, are refreshingly easy to use. Many of the management utilities with competing devices pale in comparison to the RealSSD Manager offered by Micron. There is also the command line utility, which provides additional functionality for those using bare bones server installs. The only area where Micron can improve the management utility is by providing a means of remote management.
The low power draw is, in particular, a great selling point for the P320h. Offering up the industry's highest read IOPS per watt is no small feat. This miserly power consumption of 25 watts will keep the long-term costs of ownership low. Many forget that the price of powering a storage device over its lifetime can add up to more than the initial investment. Micron keeps this in mind by providing great power consumption figures in conjunction with the high performance. The lack of a requirement for external power is also a great feature that eliminates the need for additional cabling.
The P320h does not use host resources to perform any of its drive management routines, alleviating the burden on the host system. This will lend itself very well to multi-card installations and keeps the entire system as efficient as possible, with CPU cycles dedicated to the applications instead of storage infrastructure.
The scaling when under extreme loading is simply a tremendous value-added feature. This resilience and the increased performance under heavy loads will allow users to optimize their systems to truly take advantage of this application accelerator.
The Micron P320h is overall a well-rounded and well thought out device. Its performance is simply unmatched by any of its competitors at this point, and the key sales wins that Micron has enjoyed lately illustrate this point very well. With future versions of the Micron P320h already in the works, expect this robust platform to continue to develop and evolve. With its superb performance and unbelievable endurance, Micron continues their assault on the enterprise SSD space, and we give the P320h the TweakTown Editor's Choice Award.