Exploring Maximum Cache Performance
As this is the first enterprise-class SSHD in the wild, we step through several tests to explore its caching characteristics.
Our first task is to determine the maximum speed of the AMT caching engine under the best of circumstances. We test with 4K random read and write data on only 1% of the drive. Roughly 5% of the SSHD's capacity is mirrored to NAND cache to provide acceleration, and accessing only a small percentage of that NAND allows us to tease out the maximum performance of the Marvell controller and the Samsung NAND.
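For scale, the region sizes at play can be worked out directly. This is a quick sketch of our own; the 556 GB user-addressable capacity and the ~5% cache ratio come from the figures in this review, while the arithmetic is illustrative:

```python
# Sizing of the test regions described above. The 556 GB capacity and
# ~5% cache ratio are from the review; the arithmetic is ours.
USER_CAPACITY_GB = 556   # user-addressable space
CACHE_RATIO = 0.05       # ~5% of capacity is mirrored to NAND
TEST_RATIO = 0.01        # this test touches only 1% of the drive

cache_gb = USER_CAPACITY_GB * CACHE_RATIO  # ~27.8 GB of NAND cache
test_gb = USER_CAPACITY_GB * TEST_RATIO    # ~5.56 GB test region

# The test region fits in the cache five times over, so once promoted,
# virtually every 4K read should be served from NAND.
print(f"cache {cache_gb:.2f} GB, test region {test_gb:.2f} GB")
```

With the test footprint only a fifth of the cache capacity, the workload should become fully cache-resident once promotion completes.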
The massive read-speed acceleration nearly jumps off the chart in this test, rising from 900 IOPS to a staggering 8,800 IOPS. Using a small amount of data assures that these results come almost entirely from the cache. This shows that, under favorable conditions, the controller effectively caches requested data and picks out hot data rather quickly: it took only three minutes for the algorithms to identify the hot data, and another 16 seconds to transfer it into the NAND cache.
Operating under the assumption that all of the data is transferred to the NAND, we can deduce the write speed of the NAND buffer. We are reading from 1% of the 556 GB of user-addressable space (5.56 GB). With 5.56 GB of data requiring 16 seconds to transfer into cache, we arrive at roughly 355 MB/s of write throughput to the single NAND package. This is an impressive amount of throughput from one package of NAND: it nearly saturates the maximum speed of Toggle 2.0 (400 MB/s) and allows for fast adjustments as hot data changes. It also confirms that cached data is transferred to the NAND sequentially; random write speeds in that range simply aren't possible with a single NAND package.
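That deduction is simple division and can be sanity-checked. Using decimal gigabytes the result lands slightly under the quoted ~355 MB/s (which assumes binary gigabytes), but either way it sits just below the Toggle 2.0 ceiling:

```python
# Deduced NAND write throughput: test region size over cache-fill time.
capacity_mb = 556 * 1000              # 556 GB user-addressable, decimal MB
test_region_mb = capacity_mb * 0.01   # 1% of the drive = 5,560 MB
fill_seconds = 16                     # observed time to populate the cache

throughput_mbps = test_region_mb / fill_seconds  # ~347 MB/s to one NAND package
print(f"{throughput_mbps:.0f} MB/s")
```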
We also observe a lack of acceleration for 4K random write data. Even with extended periods of writing, we were unable to trigger acceleration as the data was passed to the platters. These results do not indicate a complete absence of write caching; the data may be partially cached and sequentialized in the DRAM buffer, but we are certain the drive is not committing the writes to the NAND buffer. While NAND write caching would be beneficial, it would also lead to excess wear and longevity concerns with the eMLC NAND. It is important to bear in mind that random writes incur more wear on NAND than sequential writes.
The same tests with sequential data yield no acceleration for read or write data. Both tests are evenly matched and the results overlap. These results are expected: HDDs perform very well with sequential data, and acceleration would merely add undue wear on the NAND. Sequential writes are better served going to the platters, which have no endurance concerns.
Our exploration revealed that only random read data is actively cached to the NAND buffer. This allows us to focus on the performance of the SSHD in various scenarios. Here we read from 5% of the user-addressable space, which would fill the entire cache buffer. In this test, the algorithms begin to promote data to the cache immediately. The resulting acceleration averages 2,700 IOPS, though we record interspersed periods of much higher performance. This lower result could stem from the data not being entirely cached, or from slower speeds once the NAND is completely full.
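To put these IOPS figures in throughput terms, 4K random IOPS convert as IOPS × 4 KiB. A small helper of our own (not part of any test tool used here) makes the comparison concrete:

```python
def iops_to_mbps(iops, block=4096):
    """Convert 4K random IOPS to MB/s of payload throughput."""
    return iops * block / 1e6

# The fully cache-resident 1% test vs. the cache-filling 5% test above,
# with the unaccelerated baseline for reference.
for label, iops in [("1% region", 8800), ("5% region", 2700), ("baseline", 900)]:
    print(f"{label}: {iops} IOPS = {iops_to_mbps(iops):.1f} MB/s")
```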
We experience a drastic increase in speed when we test with small LBA ranges and only one stream of data, so we next attempt to trick the AMT algorithms by simulating a more distributed workload with several data streams.
We created a complex multi-segmented test pattern with multiple data streams to test the efficacy of the AMT algorithms. We test 4K read data with three data streams. The first addresses the same 5% of the drive we tested above, but only receives 70% of the workload. The second data stream reads from a larger 30% chunk of the LBA range with 15% of the workload, and finally the third data stream reads from the entire capacity of the drive with the remaining 15% of the workload.
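A workload of that shape can be sketched as a weighted random choice over the three streams. The LBA coverage fractions and workload shares come from the description above; the offset generator itself is our illustrative stand-in for the actual test tool:

```python
import random

BLOCK = 4096              # 4K transfer size
CAPACITY = 556 * 10**9    # 556 GB user-addressable space (from this review)

# (fraction of LBA range covered, share of the workload) per stream
STREAMS = [(0.05, 0.70),  # hot 5% of the drive, 70% of accesses
           (0.30, 0.15),  # warmer 30% region, 15% of accesses
           (1.00, 0.15)]  # entire drive, remaining 15% of accesses

def next_read_offset(rng=random):
    """Pick a stream by its workload share, then a 4K-aligned byte
    offset inside that stream's LBA range."""
    spans, weights = zip(*STREAMS)
    span = rng.choices(spans, weights=weights)[0]
    max_block = int(CAPACITY * span) // BLOCK
    return rng.randrange(max_block) * BLOCK
```

Fed to a raw-device reader, a bit over 70% of the generated offsets land inside the hot 5% of the LBA space (the two wider streams also occasionally hit it), which is exactly the locality the AMT algorithms must detect.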
This effectively forces the drive to ignore the 'less desirable' read accesses and cache only the most relevant data. The results show that the SSHD still performs admirably, with an increase from 600 IOPS to 1,700 IOPS. While this lower level of performance may seem disappointing to some, consider that drawing anything even remotely near 1,700 IOPS from a standard HDD simply is not possible. We also note a sprinkling of much higher results, up to 1,900 IOPS, simply unheard-of performance from an HDD.
We conduct our testing outside of the file system for numerous reasons. File systems are inefficient and bring forces beyond our control into the equation, such as metadata, buffers, and caches. For the purposes of an SSHD review, however, the file system also introduces locality of its own through file system metadata. In many cases, metadata is a primary bottleneck during typical use.
With the same 5% test above, we topped out at 2,700 IOPS; with a file system, we top out at 6,800 IOPS. This is an impressive result, but we can also observe some of the negative effects of testing with a file system at this granularity. The erratic results at the beginning of the test remind us of the system caching and buffers that NTFS brings into play. These results should be taken with a grain of salt, but accelerating metadata clearly delivers a big boost in performance.