The long-awaited PCIe 3.0 bus has finally arrived, fashionably late for the enterprise arena. The rush is on for companies such as LSI to deliver a host of storage components that will allow customers to reap the generational benefits that PCIe 3.0 brings with it.
PCIe 3.0 brings much-needed speed enhancements to a whole host of technologies in the enterprise space. PCIe 1.0a was originally introduced in 2003, with a newer version of 1.0 released in 2005, followed closely by version 2.0 in 2007. Work on the 3.0 specification began in 2010, and by late 2011 PCIe 3.0 finally appeared ready. Unfortunately, it was only released into the wild with Intel's platform launch in March of this year, which has delayed performance increases for a number of enterprise technologies. 40 and 100Gb Ethernet, 16Gb Fibre Channel, FDR InfiniBand and 12Gb SAS will all rely heavily upon the increased bandwidth that PCIe 3.0 brings with it.
In fact, 6Gb/s SAS is already hamstrung by PCIe 2.0: at full speed, a PCIe 2.0 x8 link can only feed about five SAS lanes. This has led to the under-utilization of some of the current crop of storage processors due to the restrictions placed upon them by PCIe 2.0. Even 10Gb/s Ethernet is experiencing limitations, with four ports unable to operate at full speed on a PCIe 2.0 bus.
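As a quick sanity check of that claim, here is a back-of-the-envelope sketch using nominal figures (PCIe 2.0's 8b/10b encoding and the roughly 20% protocol overhead discussed later; real-world numbers will vary):

```python
# Back-of-the-envelope check: how many 6Gb/s SAS lanes can a PCIe 2.0 x8
# link actually feed at full speed? (Nominal figures only.)

PCIE2_GT_S = 5.0                  # PCIe 2.0 transfer rate, GT/s per lane
ENCODING_8B10B = 8 / 10           # PCIe 2.0 uses 8b/10b line encoding
PCIE2_LANE_MB_S = PCIE2_GT_S * ENCODING_8B10B * 1000 / 8   # 500 MB/s

pcie2_x8_raw = 8 * PCIE2_LANE_MB_S       # 4000 MB/s per direction
pcie2_x8_usable = pcie2_x8_raw * 0.80    # minus ~20% protocol overhead

SAS6_LANE_MB_S = 600                     # 6Gb/s SAS, nominal per lane

full_speed_sas_lanes = pcie2_x8_usable / SAS6_LANE_MB_S
print(full_speed_sas_lanes)              # ~5.3 -> only about five lanes
```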
With the massive improvements in both IOPS and bandwidth, the penetration of solid state storage (SSS) into the datacenter challenges companies to bring out technologies that can deliver. Just a few short years ago there were very few Host Bus Adapters (HBAs) and RAID controllers that could shoulder the load of solid state storage; typically, these devices hampered the full performance of the attached SSS. HBAs and RAID controllers that can scale up and capitalize on these improvements are in desperate demand.
LSI has led the charge with the wildly successful 92xx series of HBAs and controllers that have unlocked the performance of 6Gb/s SAS, revolutionizing the ability of the server to cope with the ever-increasing demand being placed upon the data storage subsystem.
In the last 25 years CPU performance has increased 2,000,000 times, yet the speed of the humble disk drive has only increased 11 times over. It isn't hard to see the large disparity that has led to the storage subsystem becoming the bottleneck. PCIe 3.0 provides a means of increasing the aggregated performance of many HDDs and SSDs in an efficient manner, helping to alleviate the bottleneck.
LSI brings forth its new HBA models as the introduction to its newest line of PCIe 3.0 capable storage controllers. The 9207-8i Mustang features the LSISAS2308 6Gb/s SAS I/O controller with a dual-core 800MHz PowerPC processor. With eight lanes of PCIe 3.0 at full speed (8GB/s), this new series of HBAs can deliver on the performance promises of PCIe 3.0.
The 9207-8i that we will be testing today boasts some impressive specifications, with up to 700,000 IOPS delivered through LSI's Fusion-MPT (Message Passing Technology) 2.0 architecture and the highest sequential bandwidth that we have observed in our lab with a single HBA.
PCIe 3.0 Makes Its Entrance
It isn't surprising that LSI is the first to market with the new PCIe 3.0 HBAs, as they did lead the charge for 6Gb/s implementation into the server space. This has served to cement LSI as the current enterprise RAID technology leader with a 75% market share in the channel and four out of the five largest OEMs as their customers. LSI has fired the starting gun for PCIe 3.0 connectivity with this new line of HBA products, which will expand to the PCIe 3.0 MegaRAID controllers in Q3 of this year.
The previous generation of LSI hardware was effectively limited to around 2.9GB/s due to the restrictions of PCIe 2.0. The chart above illustrates the impressive bandwidth increases that will be unlocked with the inclusion of PCIe 3.0 technology. A 47% increase in bandwidth is certainly a large jump as we move to the newer interface, but bear in mind this is just with 6Gb/s SAS that is already heavily implemented.
There will be a further jump in speed as LSI transitions to 12Gb/s SAS, leaping above 6GB/s in bandwidth later in the year. 12Gb/s SAS would be rather pointless on a PCIe 2.0 interface; even the current 6Gb/s specification hasn't been used to its full potential until now.
Just as important as the higher bandwidth is the ability to utilize that throughput effectively. Only a portion of the overall bandwidth translates into usable performance for the end-user, since the protocol itself requires overhead to operate. Keeping this overhead low is important, and LSI has managed to hold it steady at 20%. This efficiency allows the user to extract the most from the specification's available bandwidth.
The PCIe 3.0 interface delivers a transfer rate of eight gigatransfers per second (GT/s), close to twice the effective bandwidth of PCIe 2.0. There are also technological advances in the specification itself, with transmitter and receiver equalization, PLL improvements and clock data recovery. These new products are aimed at corporate datacenters, cloud storage providers and high performance computing. Providing more performance in a smaller package will allow end users to scale the storage subsystem more efficiently and effectively as their needs grow.
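The "close to twice" figure is worth spelling out, since it comes from the encoding change as much as from the raw transfer rate. A sketch with nominal numbers (the move from 8b/10b to 128b/130b encoding is part of the PCIe 3.0 specification):

```python
# Why 8 GT/s is "close to twice" PCIe 2.0's 5 GT/s: the switch from
# 8b/10b to 128b/130b encoding cuts line-code overhead from 20% to ~1.5%.

gen2_lane_mb_s = 5.0 * (8 / 10) * 1000 / 8      # 500 MB/s per lane
gen3_lane_mb_s = 8.0 * (128 / 130) * 1000 / 8   # ~984.6 MB/s per lane

x8_gen2 = 8 * gen2_lane_mb_s    # 4000 MB/s per direction
x8_gen3 = 8 * gen3_lane_mb_s    # ~7877 MB/s per direction (the "8GB/s")

print(x8_gen3 / x8_gen2)        # ~1.97 -> effectively doubled
```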
Combining the PCIe 3.0 interface with the new server architecture, which brings the PCIe lanes directly into the Intel Xeon E3 and E5 series chips, will provide even better performance for future applications, with no Northbridge to incur additional latency.
Specifications, Pricing and Availability
The 9207 is a low profile (2.6" x 6.6") single-PCB HBA. It fits easily into 1U and 2U servers and utilizes a single binary OS driver that will operate any Fusion-MPT controller or adapter. Keeping the footprint in the host system low is always a crucial requirement when adding hardware to a server. The LSI OEM drivers integrated into virtually every operating system reduce the time required for system configuration.
Performance is delivered by the LSISAS2308 6Gb/s SAS I/O / Fusion-MPT 2.0 controller, which sits under the black heatsink. To the right of the PCB we can observe the two x4 internal SFF-8087 mini-SAS ports that allow for connection of devices. These ports can be configured as eight x1 ports (for individual drives) or two x4 ports for dual-port enabled devices.
The 9207-8i can manage up to 256 non-RAID SAS or SATA storage devices at 1.5, 3 and 6Gb/s. This includes HDDs, SSDs and tape devices.
The bottom edge of the PCB has the PCIe connector that enables eight lanes of PCI Express 3.0 connectivity. The SAS Bandwidth is half duplex, with 600MB/s per lane. The HBA comes with both full height and low profile brackets. The 9207-8i consumes 9.8W typical and requires a standard 200 LFM (Linear Feet per Minute) airflow. The previous generation 9211-8i by comparison consumes 7W typical, so there is a slightly higher power requirement for the 9207 HBA.
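Those per-lane figures imply that, for the first time, the host link rather than the drive side has the headroom. A rough sketch of the two ceilings (nominal values, per direction):

```python
# With eight 6Gb/s SAS lanes at a nominal 600 MB/s each, the drive side
# of the 9207-8i tops out below the PCIe 3.0 x8 host link -- the host
# interface is no longer the bottleneck.

sas_side_mb_s = 8 * 600                            # 4800 MB/s aggregate
pcie3_x8_mb_s = 8 * 8.0 * (128 / 130) * 1000 / 8   # ~7877 MB/s

print(sas_side_mb_s < pcie3_x8_mb_s)               # True
```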
The 92xx series is covered by the standard three year warranty.
There are a few different versions of the 9207, with the 'i' denoting the adapters with internal ports and the 'e' versions sporting the external ports. There are also versions that have both internal and external ports.
All 9207 models are shipped with IT firmware, with the option of flashing IR firmware to the adapter if so desired. There will also be versions shipped with IR firmware, which will be denoted with a 1 instead of a 0 in the product SKU. A 9207 will have the IT firmware at shipping, while the 9217 will have the IR firmware. Both models are physically identical, but the 9217 versions are commonly supplied to OEM customers.
IT is simply a pass-through mode for the controller, which minimizes firmware overhead. Many users, especially with SSDs, will simply use the device as a pass-through for a number of devices in a non-RAID configuration. As the capacity of attached storage devices explodes, long rebuild times are becoming problematic, leading some customers to forgo RAID in certain implementations.
IR stands for 'Integrated RAID'. Integrated RAID is a low-cost hardware RAID solution that is enabled by the Fusion-MPT architecture. RAID 0 (striping), 1 (mirroring), 1E (enhanced mirroring with an odd number of devices) and 10 (mirroring and striping) are available.
The ability to use the adapter in either IT or IR mode allows users flexibility to decide which version best suits their needs.
Test System and Methodology
We will be testing the 9211-8i head to head with the 9207-8i to quantify the increase in performance with the next generation of LSI HBAs. There is the possibility of using the IR firmware, Windows RAID or a third-party program to aggregate the performance of all of the drives into one large RAID 0 volume. There is also the option of RAID 1, 1E and 10 with the IR firmware.
The configuration that we have tested with provides the best latency results and does a good job of showing the base performance of the 9207-8i Mustang. This is simply configuring the SSDs as separate volumes and accessing each individually. This provides low overall latency in conjunction with much lower maximum latencies than RAID 0. This portion of testing also adheres to our SNIA Specification based testing regimen. More information on our enterprise testing regimen can be found here.
The fun of testing HBAs and RAID controllers is that we can use the fastest SSDs at our disposal in Fresh out of Box (F.O.B.) condition. One of the more tedious aspects of standard enterprise testing is the long, drawn-out conditioning runs needed to push the SSDs into steady state. This can take up to 12 hours for some SSDs and consists of a lot of waiting on our part. With HBA and RAID controller testing we get to secure erase the drives repeatedly to keep them at absolute top speed, since we are trying to measure the speed of the controller and not the attached storage. For a guy who spends 50+ hours a week waiting on steady state conditioning runs to complete, this is liberating!
One of the challenges that we ran into when testing the 9207-8i Mustang was simply being able to saturate the performance of the adapter, so we tested with SSDs to get as close as possible. There currently aren't many SSDs on the market that will be able to saturate the 9207-8i with only eight devices in both throughput and IOPS, so we are forced to take a two-pronged approach.
For throughput testing we turn to our trusty Crucial RealSSD C400 drives. Physically identical to the consumer M4, this 8 x 256GB array of SSDs has served us well in our testing by offering solid and consistent performance. With per-drive read speeds of 500MB/s and write speeds of 260MB/s, we can saturate the read bandwidth of the 9207-8i, but not the write bandwidth.
The second prong of our attack consists of the SanDisk ESS Lightning LS 300S EFDs (Enterprise Flash Devices). The LS 300S are 300GB SLC enterprise class SSDs that come in both a 3.5" and 2.5" form factor. These drives are made for taking a beating and are warrantied for five years of unlimited writing with no throttling. They are simply solid as a rock, offering consistent performance regardless of the workload.
Rated at a blistering 160,000 IOPS per device, they are definitely the heavyweights of enterprise flash storage, even though we are using the 3Gb/s versions. There simply are not many devices that can lay claim to this type of performance, but even with the staggering performance possible, we still cannot saturate the 9207-8i. The 9207-8i Mustang can handle up to 700,000 IOPS, so even with perfect scaling we will be just shy of being able to saturate the adapter. This speaks volumes to the capabilities of the 9207-8i.
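The scaling arithmetic behind that claim, using the rated figures:

```python
# Even with perfect scaling, four LS 300S EFDs fall just shy of the
# 9207-8i's 700,000 IOPS rating.

per_drive_iops = 160_000
drives = 4
best_case_iops = per_drive_iops * drives   # 640,000

hba_rated_iops = 700_000
print(best_case_iops < hba_rated_iops)     # True: the adapter has headroom
```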
Latency and 4K Random
The fact that we cannot entirely saturate the 9207-8i leaves us with few parameters that we can test to compare the generational differences between using the 9207-8i with PCIe 3.0 and the 9211-8i with the PCIe 2.0 interface. Typical tests such as our Iometer server emulations will fall short, only showing the limitations of the SSDs themselves.
This leaves us with testing the latency of the HBAs and the sequential read speed with the Crucial M4s. Then we will move on to testing the maximum IOPS with the SanDisk ESS Lightnings.
The 4K latency test reveals that the latency between the two HBAs is very close, with the values listed above the line being the 9207-8i's latency. There is no appreciable difference between the two.
The 4K random speed, tested with the Crucial M4s, scales nicely with both controllers, yet falls far short of reaching the controllers' limits. The 9207-8i has slightly better latency, which leads to a small difference at the upper end of the chart.
The write results are closely aligned as well, with the 9207-8i pulling slightly ahead. This may be due to the variability of performance that can be associated with SSDs, since both are not near the IOPS ceiling for each respective controller. There simply are not SSDs available on the market currently that can provide the amount of write IOPS sufficient to saturate the 9207-8i with only eight devices.
Sequential Read Speed
The 128K Sequential read speed chart illustrates the difference in throughput that the 9207-8i brings with it. These results are reported in binary values, with the device topping out at 4,040 MiB/s. For those more accustomed to decimal values we have included a few screenshots of the maximum speeds with both controllers.
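For readers converting between the two unit systems, the relationship is straightforward:

```python
# Converting the chart's binary figure to the decimal units used in the
# screenshots: 1 MiB = 1,048,576 bytes, 1 MB = 1,000,000 bytes.

mib_s = 4040
mb_s = mib_s * 1_048_576 / 1_000_000
print(round(mb_s))   # 4236 MB/s, in line with the ~4.27 GB/s screenshot
```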
The 9211-8i tops out at 3GB/s with all eight M4s reading at a QD of 32 each.
The 9207-8i reaches 4.268 GB/s with all eight SSDs, which is the highest throughput that we have witnessed from any single controller. This is a throughput increase of roughly 42% over the previous generation controller. This is an impressive amount of advancement for one device, and it will be multiplied even further when several of these HBAs are deployed into one server.
512B Random Read Testing
The final area that we can compare the tremendous speed of the 9207-8i Mustang against the 9211-8i requires us to use the SanDisk ESS Lightning LS 300S Enterprise Flash Drives in an attempt to saturate the controller with read IOPS.
We configured the four SanDisk SSDs as separate volumes and dialed up the Queue Depth with 512B random 100% read access.
We can observe the 9207-8i topping out much higher than the 9211-8i. The 9207 reaches 633,000 IOPS at top speed compared to the 9211 with 464,000 IOPS, an increase of 36%. The 9207-8i is actually rated at 700,000 IOPS, which would equate to a 51% increase in IOPS performance.
The SanDisk ESS Lightnings provide 158,250 IOPS each, which is nearly perfect scaling given their ability to provide up to 160,000 IOPS each.
Here we can see the Lightnings pushing out 632,000 IOPS with an excellent 0.2021 ms average latency. This low latency is far out of reach for typical SSDs.
The low average latency with very high Queue Depth IOPS requests is what separates the Lightnings from most SSDs. Here we can observe an average latency comparison between the two HBAs, with the 9207-8i enjoying a much lower overall latency.
One aspect of performance that the 9207-8i handles very well is maximum latency. While conducting our 512B random testing we noticed that the maximum latency was much higher with the 9211 than with the 9207.
We conducted the tests several times to ensure that the results are accurate and reproducible. While this may be specific to our configuration, we feel it is an expected result with the much more efficient PCIe 3.0 specification. The possibility of much better latency with existing devices alone will be enough for many users to make the jump.
Thermal Monitoring and Watts to IOPS
One of the most overlooked areas in many enterprise evaluations of storage solutions is power consumption and the amount of heat the unit generates. Heat has to be mitigated via a range of active cooling methods, and this constant need to dissipate heat away from the datacenter results in one of the highest ongoing expenses in these environments. Active cooling requires power, and lots of it.
For every watt of power consumed in a datacenter there also has to be a redundancy for that power as well. This will provide the datacenter the ability to continue operating during power 'events'. This usually consists of large banks of batteries and generators that can be a very expensive proposition. By limiting the amount of heat introduced into the datacenter the power used for climate control and the redundancy costs of that power are lowered as well.
The cost of power, both for the device itself and for dealing with the heat it generates, can sometimes exceed the purchase price of the unit over its lifespan. Power and heat generation are therefore significant measurements to take into consideration when making purchasing decisions.
The workload testing for heat generation was conducted at a QD of 128. We tested 128K sequential read and write, along with 512B with the SanDisk ESS Lightning LS 300S. The results displayed are T-Delta to Ambient. This allows for a higher level of accuracy as it accounts for any small variations in room temperature. We did test without any airflow over the HBAs as we are using an open-air bench. This is not wise for normal users, who should adhere to the 200 LFM requirements.
The 9207-8i does generate a few more degrees in temperature across the tested workloads. This is directly in line with the fact that it consumes 9.8W compared to the 7W that the 9211-8i consumes. The few degrees of extra temperature are within an acceptable range considering the massive IOPS and throughput performance improvements.
Watts to IOPS
Keeping power consumption low is the holy grail of high performance enterprise storage. IOPS per watt is a calculation used to determine the number of IOPS delivered per watt of power consumed. Typically, for every watt of power consumed there is an accompanying increase in heat generated by the device, creating a vicious cycle: the additional heat must in turn be cooled, consuming yet more power.
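A concrete example of the metric, using the 9207-8i's rated 700,000 IOPS and 9.8W typical draw (truncated to whole IOPS):

```python
def iops_per_watt(iops, watts):
    """Efficiency metric: IOPS delivered per watt of power consumed."""
    return iops / watts

# 9207-8i: rated 700,000 IOPS at a typical 9.8W draw
print(int(iops_per_watt(700_000, 9.8)))   # 71428
```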
Measuring the IOPS per watt for random performance between the two HBAs keeps things in perspective. Even though the 9207-8i brings massive improvements in IOPS performance, many would consider a regression in the IOPS-per-watt ratio to be unacceptable.
The 9207-8i actually improves in this measurement, with 71,428 IOPS per watt, taking into consideration that the HBA can reach its rated 700K IOPS. This is more efficient than the 66,286 IOPS per watt that the 9211-8i posted in our testing.
The 9207-8i also posts moderate gains in the throughput IOPS to Watts measurement. The 9207-8i gives 3323 IOPS per Watt in our testing, compared to the 9211-8i with 3278 IOPS per Watt.
Overall, the 9207-8i puts forth excellent results with the design of the newer version being more efficient in our measurements.
The pace of the data explosion is quickening, with more and more users adding to the problem every day. Cloud computing, social networks and the ever-expanding number of mobile devices are creating the need for an exponential increase in not only computing power, but also storage capacity and speed.
Getting more 'horsepower' from each level of the storage subsystem is becoming necessary to keep up with the increasing loads. The advent of PCIe 3.0 thrusts the datacenter into the next generation and the new line of HBAs from LSI are poised to deliver. PCIe 3.0 not only brings throughput and IOPS gains, but also technological advances in the specification itself, with transmitter and receiver equalization, PLL improvements and clock data recovery. Gaining higher IOPS and more throughput from existing equipment is certainly an exciting prospect for users facing performance challenges.
The disruptive flash segment is growing daily, and advances such as PCIe 3.0 are needed to keep up with the pace of SSD development. We can safely declare that the 9207-8i delivers IOPS potential that will be hard to beat. Even with some of the highest performing SSDs on the market, we could not fully press the 9207-8i to its limits. There is plenty of headroom that will keep the 9207 ahead of the game for at least a short while.
The key to any new implementation is simplicity and a wide range of compatibility. LSI delivers this easily with integrated drivers inside every major operating system and a single binary OS driver that operates any Fusion-MPT controller or adapter.
Delivering the performance efficiently is a requirement as datacenters are becoming more focused on lower power consumption. The 9207-8i Mustang offers the highest throughput of any HBA that is on the market and does so at an I/O per watt ratio that is also among the best available.
Another important consideration is pricing. The 9207-8i that we tested today has an MSRP of $305, which is a great deal considering the type of high-powered devices it can handle with ease. Typically, just one of the connected storage devices will cost more than the HBA itself, which makes this a fair price point in our opinion. This pricing creates an I/O-per-dollar ratio that is simply unmatched in this category.
SSDs have stressed storage subsystems like never before. With the need for increasing speed and flexibility in the enterprise space, everyone seems ready to throw flash at every performance problem. I for one am all for this approach; it does not take much to get me excited about flash storage. With LSI leading the way and the latest generation of their storage devices fully leveraging PCIe 3.0 performance, getting excited has just become much easier.