Synthetic Performance Testing
We used an ASRock Fatal1ty X299 Professional Gaming i9 XE motherboard paired with an Intel Core i7-9800X processor for this review. This is a different system than the one we use for our consumer SSD reviews: the X299 platform supports VROC, and our Z270 systems do not.
We also used an ASRock Ultra Quad M.2 add-in card ($199.99, Amazon) to get all four drives into a single PCIe 3.0 x16 slot. Several companies build carrier cards for VROC, and prices vary between manufacturers. Recently we found the ASUS Hyper M.2 x16 NVMe card for less than $50 at Amazon; the same card currently sells for $109.99 at Newegg. The manufacturers didn't add proprietary hardware to lock out other manufacturers' carrier cards, so you can save a little by shopping around.
The add-in cards are non-blocking, meaning they pass the full bandwidth of each NVMe SSD through to the motherboard. With four drives that is 128Gbps (32Gbps x 4), but it uses all 16 PCIe lanes, so you have to install the card in a full-length slot and configure the BIOS for bifurcation (setting the data lanes to x4, x4, x4, x4 instead of a single x16). Your motherboard manual will walk you through the settings since each board is different.
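The bandwidth arithmetic above can be sketched quickly. This is a minimal check assuming PCIe 3.0's raw link rate of 8 GT/s per lane; real-world throughput lands slightly lower after 128b/130b encoding overhead:

```python
# Rough PCIe 3.0 bandwidth arithmetic for a four-drive x16 carrier card.
# Figures are raw link rates; 128b/130b encoding trims about 1.5% in practice.

GT_PER_LANE = 8          # PCIe 3.0: 8 GT/s per lane
LANES_PER_DRIVE = 4      # each NVMe SSD gets an x4 link after bifurcation
DRIVES = 4               # x4 + x4 + x4 + x4 fills the x16 slot

per_drive_gbps = GT_PER_LANE * LANES_PER_DRIVE   # 32 Gbps per SSD
total_gbps = per_drive_gbps * DRIVES             # 128 Gbps across the slot

print(per_drive_gbps, total_gbps)  # 32 128
```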
We tested both the 1TB and 256GB SM961 SSDs in two configurations. The first is VROC with the array set to RAID 0; this is the traditional configuration most users will run with this hardware. We also tested an "optimized" configuration with each drive set up as JBOD (each SSD presented as a single drive with its own letter assigned in Windows). The optimized configuration runs a separate workload on each SSD. Some users may configure the system this way when reading from one or two drives and writing to others, and it would be ideal for video editors reading several clips from different drives while writing the output back to another drive.
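As a rough illustration of that optimized arrangement, the sketch below pins one independent workload to each JBOD drive letter. The drive letters and the run_workload placeholder are hypothetical, standing in for real fio or Iometer jobs:

```python
# Sketch of the "optimized" JBOD arrangement: one independent workload per
# SSD, each addressed by its own Windows drive letter (letters are examples).
from concurrent.futures import ThreadPoolExecutor

DRIVES = ["D:\\", "E:\\", "F:\\", "G:\\"]  # one letter per SM961 in JBOD

def run_workload(mount: str) -> str:
    # A real test would issue an fio/Iometer job against this path;
    # here we just tag which drive the workload is pinned to.
    return f"workload pinned to {mount}"

# Run the four workloads concurrently, one per drive.
with ThreadPoolExecutor(max_workers=len(DRIVES)) as pool:
    results = list(pool.map(run_workload, DRIVES))

print(results)
```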
For comparison, we included a single Intel DC P4510 2TB SSD in the charts.
Sequential Read Performance
Intel's VROC does a good job of accelerating some workloads but not all. Flash is fast because it reads and writes across several chips at one time, like a RAID array inside each SSD. Low queue depths read and write to fewer chips at once, but performance increases at the drive level as we raise the queue depth and push a heavier workload.
Adding drives to a RAID array decreases the reads and writes hitting each individual SSD because the data spreads across the drives. An array can therefore take longer than a single drive to ramp up performance, and the VROC arrangement adds data path latency on top. You do get higher peak numbers, often found at higher queue depths, using RAID. These general assumptions are workload specific, though.
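A minimal striping sketch makes that effect concrete. The stripe size and drive count below are illustrative choices, not the array's actual settings:

```python
# Minimal RAID 0 striping sketch: map a run of logical blocks onto member
# drives and count how many blocks each drive actually services.
# Stripe size and drive count are illustrative, not VROC's real defaults.

STRIPE_BLOCKS = 32   # blocks per stripe unit (e.g. 128KB stripes of 4KB blocks)
DRIVES = 4

def drive_for_block(lba: int) -> int:
    """Return which member drive holds a given logical block."""
    return (lba // STRIPE_BLOCKS) % DRIVES

def commands_per_drive(start_lba: int, length: int) -> list[int]:
    """Count blocks landing on each drive for one sequential request."""
    counts = [0] * DRIVES
    for lba in range(start_lba, start_lba + length):
        counts[drive_for_block(lba)] += 1
    return counts

# A single 32-block request fits in one stripe unit, so at QD1 three of
# the four SSDs sit idle:
print(commands_per_drive(0, 32))    # [32, 0, 0, 0]

# A 128-block request spans all four stripe units, engaging the whole array:
print(commands_per_drive(0, 128))   # [32, 32, 32, 32]
```

This is why the array needs deeper queues or larger transfers before all four drives contribute, while a single SSD keeps all of its internal flash channels to itself.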
The VROC arrays do a very good job accelerating large block size sequential workloads. The optimized configurations can provide even more performance but require a very specialized workload.
Sequential Write Performance
The Samsung SM961 SSDs use predictive TRIM, and that causes some issues when looking for consistent performance. It doesn't change the low end of performance, but it does change peak write performance and makes it more difficult to get the drives into a steady-state condition.
Sequential Mixed Workload Performance
You can see the variability in the first mixed workload chart in the 40% and 30% read tests. Performance starts very high but slowly drops off toward the end of each workload. The drives fight aggressively to stay out of steady state, and that can increase performance, especially when running burst workloads or shorter write periods.
Random Read Performance
VROC adds some measurable latency, and that's clear in the random read tests at low queue depths. The Intel DC P4510, designed for enterprise workloads, doesn't deliver low-queue-depth random read performance comparable to the SM961's. In VROC, the SM961 still manages to outperform the single DC P4510, but the VROC latency at QD1 means a much larger performance decrease compared to a single SM961.
Random Write Performance
The 1TB SM961 is faster than the 256GB model due to the parallelism of flash: it reads and writes to more flash at the same time. The differences magnify as you add more drives to an array. The one thing that stood out in our 4KB random write test is how much more consistent the 256GB VROC results are compared to the same configuration with the 1TB models at eight OIO and above.
Random Mixed Workload Performance
The mixed random test shows beautiful performance from both SM961 capacities in their VROC configurations. This really shows that VROC is capable of delivering better performance than JBOD, even with an optimized workload.