
The Bottom Line
Pros
- Low QD performance
- Efficiency
- Consistency
Cons
- None
Introduction and Drive Details
ScaleFlux is looking to shake up the enterprise storage industry by taking a new approach to a bygone solution for increasing both storage performance and data density per gigabyte. Years ago, controller maker SandForce pioneered the use of compression as a means to an end, that end being more performance and greater data density per gigabyte than its contemporaries could offer.
For a time, SandForce reigned supreme as the purveyor of the best-performing, highest-endurance, and highest data density per gigabyte SSDs on the planet. Well, that was back in the days when SATA 600 was the fastest interface going. Back then, employing SandForce's compression technology at scale had a side effect that ultimately spelled its doom: over-taxing host resources.
Traditionally, just as in the SandForce era, data compression relies on the host system for computational resources, which ultimately makes compression as a means to an end less efficient overall than not compressing at all. This is why data compression, as it relates to storage at scale, has mostly been relegated to the dustbin of history.
Enter ScaleFlux and computational storage. Computational storage is just what the name implies: operations that traditionally rely on host resources are moved onto the storage device itself. In this case, we are talking about on-device data compression and all the inherent advantages that come with it.
The SSD we have in the lab for testing today is ScaleFlux's second-generation computational storage SSD, the CSD-3310 7.68TB. The CSD-3310 is a PCIe Gen4 x4 SSD with integrated data compression and decompression engines, which the company claims can quadruple capacity and double performance. At the heart of the device is ScaleFlux's custom SFX 3000 storage processor built with ARM processors and dedicated hardware acceleration engines.
ScaleFlux describes its 3000 series SSDs as follows: "ScaleFlux 3000-series SSDs deliver the highest performance available while offloading storage processing from the CPU by embedding dedicated computational storage engines to the drive". "With two flavors available, the NSD 3000 is an easy-to-deploy, high-performance SSD accelerating data-hungry workloads by using built-in transparent compression and encryption. The CSD 3000 adds the Capacity Multiplier feature, enabling users to store more data per gigabyte of NAND media".
Okay, now that we have a basic understanding of what makes our CSD-3310 tick and how it differs from other SSDs, we need to quickly go over how we can put its compression/decompression engines to the test. Naturally, for data to be compressed, it must be compressible to begin with. We, like everyone else in the industry, test with non-compressible data, so this presents a bit of a problem, but we have a solution.
Mixed workloads consisting of compressible data are what the CSD-3310 is made for. ScaleFlux claims that its 7.68TB CSD-3310 can sustain a 4K 70/30 workload with 2:1 compressible data at 1 million IOPS. If we can get anywhere close to this, it will, in our opinion, validate the CSD-3310's claim to fame. With this in mind, and because 4K 70/30 is a commonly quoted performance metric, we have decided to integrate this workload into our testing regimen going forward. In this case, we will additionally make the workload roughly 2:1 compressible for our test subject.
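The practical catch is generating data that compresses by roughly 2:1. We don't describe how our buffers are built here, so the following is only a minimal Python sketch under an assumption of its own: each 4K block is half random bytes (incompressible) and half zero fill. Host-side zlib stands in for the drive's compression engine, whose algorithm isn't disclosed, so on-drive ratios may differ. Both helper names are hypothetical. Load generators such as fio expose a comparable knob (`buffer_compress_percentage`).

```python
import os
import zlib

BLOCK_SIZE = 4096  # 4K transfers, as in the 4K 70/30 workload


def make_compressible_block(target_ratio: float = 2.0, block_size: int = BLOCK_SIZE) -> bytes:
    """Hypothetical helper: build one block that should compress by roughly target_ratio.

    Assumption: the first block_size / target_ratio bytes are random (incompressible)
    and the rest is zero fill, which compresses to almost nothing.
    """
    random_len = int(block_size / target_ratio)
    return os.urandom(random_len) + bytes(block_size - random_len)


def zlib_ratio(block: bytes) -> float:
    """Compression ratio measured with host-side zlib, a stand-in for the drive's engine."""
    return len(block) / len(zlib.compress(block))


if __name__ == "__main__":
    ratios = [zlib_ratio(make_compressible_block()) for _ in range(256)]
    print(f"mean zlib ratio: {sum(ratios) / len(ratios):.2f}:1")
```

Because zlib adds its own framing and Huffman overhead, a block built this way measures slightly under 2:1, which fits the "roughly 2:1" target.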
Specs/Comparison Products

ScaleFlux CSD-3310 7.68TB NVMe PCIe Gen4 x4 U.3 SSD


Enterprise Testing Methodology
TweakTown strictly adheres to industry-accepted Enterprise Solid State Storage testing procedures. Each test we perform repeats the same sequence of the following four steps:
- Secure Erase SSD
- Write the entire capacity of the SSD a minimum of 2x with 128KB sequential write data, then seamlessly transition to the next step
- Precondition the SSD at the maximum QD measured (QD32 for SATA, QD256 for PCIe) with the test-specific workload for long enough to reach a constant steady-state, then seamlessly transition to the next step
- Run the test-specific workload for 5 minutes at each measured Queue Depth, and record results
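The four-step sequence above can be captured as a simple test plan. This is a minimal Python sketch, not TweakTown's tooling: the `Step` type and `build_test_plan` function are hypothetical, and the queue-depth sweep (powers of two from QD1 to QD256) is an assumption, since the methodology names only the maximum QDs.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumption: QDs are swept in powers of two up to the PCIe maximum of QD256.
PCIE_QUEUE_DEPTHS = (1, 2, 4, 8, 16, 32, 64, 128, 256)


@dataclass
class Step:
    phase: str
    description: str
    minutes: float = 0.0  # only the fixed-length measurement runs carry a duration


def build_test_plan(capacity_tb: float, workload: str,
                    queue_depths: Sequence[int] = PCIE_QUEUE_DEPTHS,
                    run_minutes: float = 5.0) -> List[Step]:
    """Hypothetical helper encoding the four-step sequence for one test."""
    plan = [
        Step("erase", "Secure Erase SSD"),
        Step("fill", f"128KB sequential writes, {2 * capacity_tb:.2f} TB (2x capacity)"),
        Step("precondition", f"{workload} at QD{max(queue_depths)} until steady state"),
    ]
    # Step four: one fixed-length run per measured queue depth.
    plan += [Step("measure", f"{workload} at QD{qd}", run_minutes) for qd in queue_depths]
    return plan


if __name__ == "__main__":
    for step in build_test_plan(7.68, "4K random write"):
        print(f"{step.phase:>12}: {step.description}")
```

For the 7.68TB CSD-3310, the fill step works out to 15.36TB of sequential writes. Under the assumed sweep, nine 5-minute measurement runs add 45 minutes per test on top of preconditioning.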

Benchmarks - Random and Sequential Performance
4K Random Write/Read

We precondition the drive for 16,000 seconds, receiving performance data every second. We plot this data to observe the test subject's descent into steady-state.
Steady-state is achieved at 11,000 seconds of preconditioning. The average steady-state write performance at QD256 is approximately 172K IOPS. The tight pattern with virtually no outliers indicates high QoS.
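Our methodology calls for a "constant steady-state" without spelling out numeric criteria. One widely used definition comes from the SNIA Solid State Storage Performance Test Specification: within a measurement window of five rounds, the data range must stay within 20% of the window average, and the best-fit line's excursion within 10%. The Python sketch below applies that rule to a per-second trace. The 1,000-second round length and the function names are assumptions for illustration, and this is not necessarily how the 11,000-second figure above was determined.

```python
from statistics import mean
from typing import Optional, Sequence


def is_steady(window: Sequence[float], range_limit: float = 0.20, slope_limit: float = 0.10) -> bool:
    """SNIA PTS-style check on one measurement window of per-round averages."""
    avg = mean(window)
    # Data excursion: max - min must stay within range_limit of the average.
    if max(window) - min(window) > range_limit * avg:
        return False
    # Slope excursion: the least-squares line's rise across the window
    # must stay within slope_limit of the average.
    n = len(window)
    x_bar = (n - 1) / 2
    num = sum((x - x_bar) * (y - avg) for x, y in enumerate(window))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return abs(num / den * (n - 1)) <= slope_limit * avg


def steady_state_start(samples: Sequence[float], round_len: int = 1000,
                       rounds: int = 5) -> Optional[int]:
    """Return the sample index (seconds, for 1 Hz data) where the first steady window starts."""
    round_avgs = [mean(samples[i:i + round_len])
                  for i in range(0, len(samples) - round_len + 1, round_len)]
    for r in range(len(round_avgs) - rounds + 1):
        if is_steady(round_avgs[r:r + rounds]):
            return r * round_len
    return None
```

Fed a synthetic 16,000-second trace that drops from 400K to 172K IOPS at the 3,000-second mark, `steady_state_start` returns 3000.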


The CSD-3310 impresses mightily at QD1-2, returning the best results we've recorded to date for any 1-DWPD class flash-based SSD, and doing so at arguably the most important queue depths. We hit a max steady-state result of 174,659 IOPS at QD16, against a factory spec of 168K IOPS.


Very strong random read performance across the board is what we've come to expect from SSDs built on Micron B47R NAND. We achieved exactly 1.24 million IOPS at QD256, which ScaleFlux indicated should be the expected high for a non-compressible 4K full random read workload.
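Little's Law offers a quick sanity check on the QD256 read result: with the queue kept full, average latency equals outstanding I/Os divided by IOPS. This is a derived average rather than a measured figure or a QoS percentile, and `mean_latency_us` is a hypothetical helper.

```python
def mean_latency_us(iops: float, queue_depth: int) -> float:
    """Little's Law (L = lambda * W) rearranged: W = outstanding I/Os / throughput."""
    return queue_depth / iops * 1e6


if __name__ == "__main__":
    # 1.24 million IOPS at QD256, from the 4K random read result above
    print(f"{mean_latency_us(1_240_000, 256):.1f} us average latency")
```

For 1.24 million IOPS at QD256, this works out to roughly 206.5 microseconds per read on average.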
8K Random Write/Read

We precondition the drive for 16,000 seconds, receiving performance data every second. We plot this data to observe the test subject's descent into steady-state.
Steady-state is achieved at 7,000 seconds of preconditioning. The average steady-state write performance at QD256 is approximately 88K IOPS. The extremely tight pattern with virtually no outliers indicates high QoS.


We expect 8K random performance to follow the same pattern as 4K random, just at a lower IOPS rate because each operation moves twice as much data. Again, the CSD-3310 excels at QD1 and again delivers a lab best for a 1-DWPD flash-based SSD.
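Converting IOPS to bandwidth shows why the 8K numbers land at roughly half the 4K numbers. The sketch assumes 4K = 4,096 bytes and 8K = 8,192 bytes and reports decimal MB/s; the helper name is hypothetical. The inputs are the QD256 steady-state write averages from the preconditioning plots (about 172K IOPS at 4K and 88K IOPS at 8K).

```python
def iops_to_mb_per_s(iops: float, block_bytes: int) -> float:
    """Bandwidth in decimal MB/s for a fixed transfer size."""
    return iops * block_bytes / 1e6


# QD256 steady-state write averages from the 4K and 8K preconditioning plots
four_k = iops_to_mb_per_s(172_000, 4096)
eight_k = iops_to_mb_per_s(88_000, 8192)
print(f"4K: {four_k:.1f} MB/s, 8K: {eight_k:.1f} MB/s")
```

This prints about 704.5 MB/s for 4K and 720.9 MB/s for 8K, so the drive moves similar amounts of data in both tests even though the IOPS figure halves.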


The drive continues to exhibit extraordinary random read performance at low queue depths, essentially leading the field at queue depths up to 4.
128K Sequential Write/Read

We precondition the drive for 6,500 seconds, receiving performance data every second. Steady-state for this test kicks in immediately, at 0 seconds. The average steady-state sequential write performance at QD256 is approximately 4,075 MB/s.


Full speed at QD1 is exactly what we want to see from any SSD. In terms of max throughput, we fall 75MB/s short of the quoted "up to" spec.


Here we get better-than-advertised sequential throughput and, more impressively, the highest QD1 throughput we've recorded for this test.
Benchmarks - Workloads
4K 70/30
4K 70/30 (70 percent reads, 30 percent writes) is a commonly quoted workload performance metric for enterprise SSDs.
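To make the 70/30 definition concrete, here is a minimal Python sketch of an operation stream for such a workload: each I/O is a read with 70% probability, otherwise a write, at a uniformly random 4K-aligned offset. The generator name, the uniform offset distribution, and treating 7.68TB as 7.68 x 10^12 bytes are assumptions for illustration; real benchmark tools add queueing, timing, and data-buffer handling on top of this.

```python
import random
from itertools import islice
from typing import Iterator, Tuple


def mixed_workload(capacity_bytes: int, read_fraction: float = 0.70,
                   block: int = 4096, seed: int = 0) -> Iterator[Tuple[str, int]]:
    """Hypothetical generator: endless (op, offset) pairs for a 4K 70/30 random mix."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    n_blocks = capacity_bytes // block
    while True:
        op = "read" if rng.random() < read_fraction else "write"
        # Uniform random, block-aligned offset anywhere on the device
        yield op, rng.randrange(n_blocks) * block


if __name__ == "__main__":
    ops = list(islice(mixed_workload(7_680_000_000_000), 10_000))
    reads = sum(op == "read" for op, _ in ops)
    print(f"read share: {reads / len(ops):.3f}")
```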

We precondition the drive for 16,000 seconds, receiving performance data every second. We plot this data to observe the test subject's descent into steady-state.
Steady-state is achieved at approximately 8,000 seconds of preconditioning. The average steady-state workload performance at QD256 is approximately 469K IOPS. Our data pattern indicates extremely good QoS, among the best we've seen to date for a flash-based SSD.


As previously stated, this is the one place where we diverge from our normal non-compressible data workload to additionally show what the CSD-3310 can do when it's fed data that is compressible at roughly a 2:1 ratio. Now this is impressive! When the data is compressible at a high enough ratio, the CSD-3310 delivers DOUBLE the performance it manages with non-compressible data, and it does so without stealing host resources.
At 935K max IOPS, we fall a bit short of the quoted 1 million IOPS, but we are no less impressed with what this computational storage device can deliver. As we see it, that performance curve is astonishing, and it validates ScaleFlux's sales pitch for its computational storage advantage.
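The "double" claim can be cross-checked against the numbers quoted in this section. Note that this divides the compressible-data peak by the non-compressible QD256 steady-state average, so it's a rough check rather than a point-by-point comparison of the two curves.

```python
non_compressible_iops = 469_000   # QD256 steady-state average, non-compressible 4K 70/30
compressible_peak_iops = 935_000  # max IOPS with ~2:1 compressible data
claimed_iops = 1_000_000          # ScaleFlux's quoted figure

speedup = compressible_peak_iops / non_compressible_iops
share_of_claim = compressible_peak_iops / claimed_iops
print(f"speedup: {speedup:.2f}x, {share_of_claim:.1%} of the quoted 1M IOPS")
```

This prints a speedup of about 1.99x and 93.5% of the quoted figure.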
Email Server
Our Email Server workload is a demanding 8K test with a 50 percent R/W distribution. This application gives a good indication of how well a drive will perform in a write-heavy workload environment.

We precondition the drive for 16,000 seconds, receiving performance data every second. We plot this data to observe the test subject's descent into steady-state.
Steady-state is achieved at approximately 7,000 seconds of preconditioning. The average steady-state workload performance at QD256 is approximately 155K IOPS. Our data pattern indicates extremely good QoS, among the best we've seen to date for a flash-based SSD.


The CSD-3310 continues to impress at low queue depths, which is exactly where we like to see impressive performance most of all.
OLTP/Database Server
Our On-Line Transaction Processing (OLTP) / Database workload is a demanding 8K test with a 66/33 percent R/W distribution. OLTP covers workloads such as the online processing of financial transactions and high-frequency trading.

We precondition the drive for 16,000 seconds, receiving performance data every second. We plot this data to observe the test subject's descent into steady-state.
Steady-state is achieved at 7,000 seconds of preconditioning. The average steady-state workload performance at QD256 is roughly 206K IOPS. Again, the data pattern indicates exceptional QoS.


Once again, the CSD-3310 impresses at low queue depths, which is exactly where we like to see impressive performance most of all.
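Splitting the QD256 steady-state averages for Email Server (155K IOPS) and OLTP (206K IOPS) into reads, writes, and bandwidth puts the two 8K mixed workloads side by side. This is a small Python sketch; the `Profile` type and `split` helper are hypothetical, the 66/33 mix is treated as two-thirds reads, and 8K is taken as 8,192 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    block_bytes: int
    read_fraction: float


# Assumption: 66/33 is treated as exactly two-thirds reads.
EMAIL = Profile("Email Server", 8192, 0.50)
OLTP = Profile("OLTP/Database", 8192, 2 / 3)


def split(profile: Profile, total_iops: float):
    """Return (read IOPS, write IOPS, decimal MB/s) for a measured mixed-workload result."""
    reads = total_iops * profile.read_fraction
    writes = total_iops - reads
    mb_per_s = total_iops * profile.block_bytes / 1e6
    return reads, writes, mb_per_s


for profile, iops in ((EMAIL, 155_000), (OLTP, 206_000)):
    r, w, mbps = split(profile, iops)
    print(f"{profile.name}: {r:,.0f} read + {w:,.0f} write IOPS, {mbps:,.0f} MB/s")
```

Email Server comes out at 77,500 reads and 77,500 writes per second (about 1,270 MB/s), and OLTP at about 137,333 reads and 68,667 writes (about 1,688 MB/s).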
Web Server
Our Web Server workload is a pure random read test with a wide range of file sizes, from 512B to 512KB, each issued at a different percentage rate.
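A mixed-size read workload like this is typically driven by sampling a transfer size for each I/O from a weighted table. The actual per-size percentages aren't published here, so the weights below are hypothetical placeholders chosen only to show the mechanics. The size ladder (powers of two from 512B to 512KB) is also an assumption about how the range is stepped.

```python
import random
from typing import List

# Assumed size ladder: powers of two from 512B to 512KB (11 sizes).
SIZES = [512 * 2 ** i for i in range(11)]
# HYPOTHETICAL weights (percent) for illustration only; not the benchmark's real mix.
WEIGHTS = [20, 15, 10, 20, 15, 5, 5, 5, 2, 2, 1]


def sample_sizes(n: int, seed: int = 0) -> List[int]:
    """Draw n read transfer sizes according to WEIGHTS (every I/O is a read)."""
    return random.Random(seed).choices(SIZES, weights=WEIGHTS, k=n)


def mean_transfer_bytes() -> float:
    """Expected bytes per read under WEIGHTS, which links IOPS to bandwidth."""
    return sum(s * w for s, w in zip(SIZES, WEIGHTS)) / sum(WEIGHTS)


if __name__ == "__main__":
    print(sample_sizes(8))
    print(f"mean transfer: {mean_transfer_bytes():,.1f} bytes")
```

Even a few percent of 256KB and 512KB reads pulls the average transfer well above 4K, which is one reason a mixed-size read test like this is so demanding.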

We precondition the drive for 16,000 seconds, receiving performance data every second. We plot this data to observe the test subject's descent into steady-state.
We precondition for this test with an inverted (all-write) workload so no relevant information can be gleaned from this preconditioning other than verification of steady-state.


Arguably, this test is the most demanding we run. Look at that performance curve! Wow! At queue depths up to 32, the ScaleFlux CSD-3310 is the second-best performing flash-based SSD we've tested to date against this pure random read workload. Outstanding.
Final Thoughts
Even under our normal full-entropy, non-compressible, steady-state testing methodology, the ScaleFlux CSD-3310 delivered some outstanding performances. The common denominator across our entire test suite is the CSD-3310's exceptional low queue depth performance, which is performance where it matters most. On those grounds alone, we can justify calling the drive one of the better choices in its class.

When we get into the wheelhouse of what the CSD-3310 is designed for, as we did in a single instance with a 4K 70/30 random workload at roughly 2:1 compression, the CSD-3310 came into its own. With this workload, the CSD-3310 doubled its performance and set a new lab record for the best performance curve and highest throughput, at a massive 935K IOPS. Editor's Choice.