
Defining NVMe - Hands-on testing with the 1.6TB Intel P3700 SSD

By: Paul Alcorn | PCIe SSDs in IT/Datacenter | Posted: Jun 24, 2014 3:54 pm

NVMe Operation




The goal of NVMe is to simplify and shrink the driver stack as much as possible. In the past, each small step forward focused on reducing the link transfer and platform+adapter latency, marked in green and yellow. The move to PCIe places the storage device closer to the CPU, which by itself drops platform+adapter latency from 10 µs to 3 µs. The blue categories represent the latency of the storage hardware itself. Future NVM (Non-Volatile Memory) products, such as MRAM and PCM, are expected to be up to 1,000 times faster than NAND-based devices. NVMe lays the groundwork for handling that massive reduction in hardware latency, which in effect makes the software stack the dominant source of latency.




One of the key building blocks of NVMe is its simplified command set. NVMe requires only 10 administrative commands and 3 I/O commands. Administrative commands are issued far less often than NVM I/O commands. The mandatory admin commands handle management functions, such as creating and deleting queues. There are also optional administrative commands for formatting, firmware updates, and security features.


The three mandatory NVM I/O commands are read, write, and flush, the last of which commits volatile write caches to the storage medium. There are also optional commands such as dataset management, which provides TRIM functionality. Command arbitration assigns different priority levels to queues to manage service levels.
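As a rough illustration of how small the mandatory command set is, it can be written out as two short enumerations. The opcode values below are recalled from the NVMe 1.x specification and should be checked against the specification before being relied on; the class and member names are made up for this sketch.

```python
from enum import IntEnum


class AdminOpcode(IntEnum):
    """Mandatory admin commands (opcode values assumed from the NVMe 1.x spec)."""
    DELETE_IO_SQ = 0x00         # delete an I/O submission queue
    CREATE_IO_SQ = 0x01         # create an I/O submission queue
    GET_LOG_PAGE = 0x02
    DELETE_IO_CQ = 0x04         # delete an I/O completion queue
    CREATE_IO_CQ = 0x05         # create an I/O completion queue
    IDENTIFY = 0x06
    ABORT = 0x08
    SET_FEATURES = 0x09
    GET_FEATURES = 0x0A
    ASYNC_EVENT_REQUEST = 0x0C


class IoOpcode(IntEnum):
    """Mandatory NVM I/O commands."""
    FLUSH = 0x00
    WRITE = 0x01
    READ = 0x02


class OptionalIoOpcode(IntEnum):
    """One optional I/O command mentioned in the text."""
    DATASET_MANAGEMENT = 0x09   # carries TRIM/deallocate hints


print(len(AdminOpcode), "admin commands,", len(IoOpcode), "I/O commands")
```

The I/O opcode space is separate from the admin opcode space, which is why 0x00 through 0x02 appear in both enumerations.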


By comparison, after decades of development, SCSI has accumulated 170 total commands, and SATA has 8 different read commands alone. This leads to increasing complexity and is a perfect example of legacy baggage, while NVMe's spartan command set provides streamlined operation. In addition, an optional SCSI translation layer provides NVMe compatibility for those leveraging existing SCSI infrastructure.




NVMe supports parallel operations by establishing multiple queue pairs (Submission/Completion). The management (admin) queue, to the left, creates or deletes additional queues. Each new queue is assigned to its own CPU core, and these simple I/O queues only process read, write, or flush commands.


Each queue supports up to 64K outstanding commands (queue depth, or QD), and the controller management queue can create a mind-boggling 64K queues. Submission and completion queues are allocated in host memory, and multiple submission queues can share the same completion queue.
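The queue-pair mechanism can be pictured as a ring buffer per core. The following is a toy model only, not driver code: the `QueuePair` class, its method names, and the tiny depth of 8 are all hypothetical, and a single Python list stands in for the region of host memory a real driver would allocate.

```python
from collections import deque


class QueuePair:
    """Toy model of one submission/completion queue pair in host memory."""

    def __init__(self, depth):
        self.depth = depth
        self.sq = [None] * depth   # submission ring
        self.sq_head = 0           # next slot the controller will consume
        self.sq_tail = 0           # next slot the host will fill
        self.cq = deque()          # completions posted by the controller

    def is_full(self):
        # One slot is left empty so that full and empty can be told apart.
        return (self.sq_tail + 1) % self.depth == self.sq_head

    def submit(self, command_id, opcode):
        if self.is_full():
            raise BufferError("submission queue full")
        self.sq[self.sq_tail] = (command_id, opcode)
        # Advancing the tail stands in for the doorbell write.
        self.sq_tail = (self.sq_tail + 1) % self.depth

    def controller_process(self):
        # Simulated device: consume each pending entry and post a completion.
        while self.sq_head != self.sq_tail:
            command_id, _opcode = self.sq[self.sq_head]
            self.sq_head = (self.sq_head + 1) % self.depth
            self.cq.append((command_id, 0))  # status 0 means success here

    def reap(self):
        # Host side: collect and clear all posted completions.
        done = list(self.cq)
        self.cq.clear()
        return done


# One queue pair per core, so cores never contend for a shared queue.
queues = {core: QueuePair(depth=8) for core in range(4)}
queues[0].submit(1, "read")
queues[0].submit(2, "write")
queues[0].controller_process()
completed = queues[0].reap()
```

Because each core owns its own pair, submission needs no cross-core locking in this model, which is the property the per-core assignment is meant to provide.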


AHCI is woefully inadequate by comparison; it only supports one queue with 32 commands. The location of the single queue on one core also severely hampers performance; operations routinely traverse multiple cores during the path to completion. NVMe supports multiple cores for maximum performance, and MSI-X, interrupt steering, and interrupt aggregation help extend scalability well beyond the capability of AHCI.
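To put the gap in numbers, the theoretical ceiling on outstanding commands can be worked out directly. This is back-of-the-envelope arithmetic that treats the roughly 64,000 figure as 64K (65,536), which is an assumption; real devices and drivers expose far fewer queues and shallower depths, and the variable names are illustrative.

```python
# AHCI: a single queue holding 32 commands.
ahci_queues = 1
ahci_depth = 32
ahci_outstanding = ahci_queues * ahci_depth

# NVMe: roughly 64K queues, each roughly 64K deep (assumed as 64 * 1024).
nvme_queues = 64 * 1024
nvme_depth = 64 * 1024
nvme_outstanding = nvme_queues * nvme_depth

# How many times more commands NVMe can theoretically keep in flight.
ratio = nvme_outstanding // ahci_outstanding

print(f"AHCI: {ahci_outstanding} outstanding commands")
print(f"NVMe: {nvme_outstanding} outstanding commands ({ratio}x AHCI)")
```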






These two slides outline the steps required for command submission and processing. This streamlined process requires only one 64B fetch per 4K command; in contrast, AHCI requires two serialized host DRAM fetches. The controller fetches commands from the submission queue with ordinary memory reads, and NVMe eliminates performance-killing uncacheable MMIO register reads from the issuance and completion paths.
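The single 64B fetch is possible because an entire command fits in one 64-byte submission queue entry. The sketch below packs a read command into that size. The field layout follows the NVMe 1.x specification as best recalled and should be verified against it; `build_read_command`, its parameters, the 512-byte block size, and the buffer address are assumptions made for illustration.

```python
import struct

# Little-endian, no padding: opcode, flags, command ID, namespace ID,
# reserved (8 bytes), metadata pointer, PRP1, PRP2, command dwords 10-15.
SQE_FORMAT = "<BBHIQQQQ6I"
assert struct.calcsize(SQE_FORMAT) == 64  # one entry is exactly 64 bytes

READ_OPCODE = 0x02


def build_read_command(command_id, namespace_id, start_lba, block_count, prp1, prp2=0):
    """Pack a read command into a 64-byte submission queue entry."""
    cdw10 = start_lba & 0xFFFFFFFF          # starting LBA, low 32 bits
    cdw11 = (start_lba >> 32) & 0xFFFFFFFF  # starting LBA, high 32 bits
    cdw12 = (block_count - 1) & 0xFFFF      # block count is zero-based
    return struct.pack(
        SQE_FORMAT,
        READ_OPCODE, 0, command_id, namespace_id,
        0,            # reserved
        0,            # metadata pointer (unused here)
        prp1, prp2,   # physical addresses of the data buffer
        cdw10, cdw11, cdw12, 0, 0, 0,
    )


# A 4K read expressed as eight 512-byte blocks (block size is an assumption).
sqe = build_read_command(
    command_id=7, namespace_id=1, start_lba=2048, block_count=8, prp1=0x1000_0000
)
print(len(sqe), "bytes")
```

Everything the controller needs for the transfer, including the buffer address, is in the entry itself, so no second lookup in host memory is required to start the I/O.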


