We discussed some benefits of multi-host NVMe SR-IOV, or multi-root SR-IOV (MR-IOV), last time. The solution aims to improve SSD performance in virtual environments while ensuring high utilization and flexibility for the storage resources. In this blog, let's take a closer look at the performance aspect and see how exactly our proposed NVMe MR-IOV delivers high performance in virtual environments.
The most common interface for disks today is probably SATA, but NVMe SSDs that use the PCIe interface are emerging and are also quite popular. The most obvious difference between the two interfaces is throughput. PCIe 4.0 reaches about 2 GB/s per lane in each direction, and many enterprise SSDs have a PCIe x8 interface, which gives about 16 GB/s of bandwidth, whereas SATA 3.0 tops out at 6 Gb/s (roughly 600 MB/s of usable throughput). Another key difference is latency. NVMe drives that transfer data directly over PCIe have a latency of just a few microseconds, whereas SATA SSDs have a latency in the 30 to 100 microsecond range. Clearly, for those who are looking for high performance, PCIe NVMe SSDs are the better choice.
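To make the bandwidth comparison concrete, here is a minimal Python sketch of the arithmetic. It assumes the published line rates and encoding schemes (16 GT/s with 128b/130b encoding for PCIe 4.0, 6 Gb/s with 8b/10b encoding for SATA 3.0) and ignores protocol overhead above the link layer, so real drives will land somewhat below these figures. The function names are ours, chosen for illustration.

```python
# Link parameters: line rate and encoding efficiency (payload bits / line bits).
PCIE4_GT_PER_S = 16.0        # PCIe 4.0: 16 gigatransfers/s per lane
PCIE4_ENCODING = 128 / 130   # 128b/130b encoding
SATA3_GBIT_PER_S = 6.0       # SATA 3.0: 6 Gb/s line rate
SATA3_ENCODING = 8 / 10      # 8b/10b encoding


def pcie4_bandwidth_gb_s(lanes):
    """Approximate one-direction PCIe 4.0 bandwidth in GB/s for a lane count."""
    return PCIE4_GT_PER_S * PCIE4_ENCODING / 8 * lanes


def sata3_bandwidth_gb_s():
    """Approximate usable SATA 3.0 bandwidth in GB/s."""
    return SATA3_GBIT_PER_S * SATA3_ENCODING / 8


if __name__ == "__main__":
    x8 = pcie4_bandwidth_gb_s(8)
    sata = sata3_bandwidth_gb_s()
    print(f"PCIe 4.0 x8: {x8:.2f} GB/s")
    print(f"SATA 3.0:    {sata:.2f} GB/s")
    print(f"Ratio:       {x8 / sata:.1f}x")
```

With these assumptions, the x8 link works out to a little under 16 GB/s, which is where the rounded "2 GB/s per lane" figure comes from.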
NVM Express, often referred to as NVMe, is a protocol standard that lets a host communicate with a non-volatile memory device directly over PCIe. It supports many deep command queues in parallel, so it is inherently parallel and high performing. As PCIe advances, the performance that NVMe drives can deliver will improve further. The other common protocol is AHCI, which was originally designed for HDDs and was carried over to SATA SSDs. Table 1 compares the two standards.
Retrieved from Phison Blog. https://phisonblog.com/ahci-vs-nvme-the-future-of-ssds/
The MR-IOV solution that we propose adopts PCIe switch technology, so even though the SSDs are disaggregated from the hosts, data travels over PCIe all the way from the host CPU to the SSDs. This approach avoids the overhead of extra data encoding and decoding, and it should meet performance needs at the bare-metal level. Next, we discuss how our solution secures performance on virtual machines.
There are different virtualization approaches for SSDs. Software approaches emulate storage disks for virtual machines to access SSDs, but they often create appreciable software overhead that keeps the drives from reaching their best performance. In contrast, SR-IOV is more of a hardware approach: it is implemented by the NVMe controller built into the drive, which eliminates much of that software overhead. The following diagram illustrates how virtual machines can interact with NVMe storage devices.
In a classical virtual environment, when a virtual machine (guest OS) communicates with a disk, say to retrieve data from it, the request must go through the hypervisor. The hypervisor handles all the interrupts going back and forth, and it maps between the guest OS and the physical device so that the data travels the right route and lands in the right place. As a result, it consumes a lot of CPU resources and performance drops.
When SR-IOV is enabled, users can create multiple virtual functions (VFs) that are associated with one physical function (PF). You can think of it as creating multiple clones of one SSD, so that the physical SSD can be accessed by different machines at the same time, sharing the bandwidth of the SSD. The SR-IOV specification is developed and maintained by PCI-SIG; you can visit their site for more details.
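As an illustration of what "creating VFs" looks like on a Linux host, the sketch below uses the kernel's standard PCI sysfs attributes `sriov_totalvfs` and `sriov_numvfs`. This is a minimal sketch under stated assumptions, not the procedure used by our solution: the helper names are hypothetical, the real PF directory would be something like `/sys/bus/pci/devices/<domain:bus:slot.function>` (writing there requires root), and NVMe drives typically also need queue and interrupt resources assigned to each VF and a namespace attached before the VF is usable, which is not shown here.

```python
import tempfile
from pathlib import Path


def enable_vfs(pf_dir, num_vfs):
    """Ask the kernel to create num_vfs virtual functions under a PF.

    pf_dir is the PF's sysfs directory (assumed layout: it contains the
    sriov_totalvfs and sriov_numvfs attribute files).
    """
    pf = Path(pf_dir)
    total = int((pf / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"requested {num_vfs} VFs, device supports 0..{total}")

    numvfs = pf / "sriov_numvfs"
    # The VF count cannot be changed from one non-zero value to another
    # directly, so reset to 0 first if VFs already exist.
    if int(numvfs.read_text().strip()) != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))
    return num_vfs


def make_fake_pf(total_vfs, current_vfs=0):
    """Hypothetical helper: build a fake PF directory for a dry run."""
    pf = Path(tempfile.mkdtemp())
    (pf / "sriov_totalvfs").write_text(str(total_vfs))
    (pf / "sriov_numvfs").write_text(str(current_vfs))
    return pf


if __name__ == "__main__":
    fake_pf = make_fake_pf(total_vfs=8)
    enable_vfs(fake_pf, 4)
    print((fake_pf / "sriov_numvfs").read_text())
```

The usage example runs against a fake directory so it can be tried without root access or SR-IOV hardware. On a real system, each created VF then shows up as its own PCIe function with its own address.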
These SSD virtual functions are the key to improved performance in VMs. Unlike the physical function, which contains the full set of PCIe functions, a virtual function is a lightweight PCIe device whose only purpose is to allow data movement, so users cannot do much configuration from a VF. A VF only contains the resources given to it by the PF.
When a VF is attached to a virtual machine, the virtual machine recognizes it as an actual, directly attached PCIe device. Because the PF has already mapped certain resources to the VF, the virtual machine can access these resources through the direct I/O path that the VF provides. In addition, the interrupts that were previously handled by the hypervisor are now passed to the individual virtual machines. In other words, the data goes directly to the virtual machine without having to go through the hypervisor or any other layer that would otherwise increase wait time.
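For readers who want to see what attaching a VF to a VM can look like in practice, here is a small sketch that builds a libvirt `<hostdev>` element for PCI passthrough. It assumes a KVM/libvirt host, which this post does not specify, and the PCI address in the usage example is a made-up placeholder. The function name is ours.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(bdf):
    """Build a libvirt <hostdev> element for a PCI function.

    bdf is a PCI address in 'domain:bus:slot.function' form,
    for example '0000:3b:00.1' (placeholder value).
    """
    domain, bus, rest = bdf.split(":")
    slot, function = rest.split(".")

    # managed='yes' asks libvirt to detach the device from its host driver
    # before the guest starts and to reattach it afterwards.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0000:3b:00.1"))
```

The resulting snippet would go inside the `<devices>` section of the guest's domain XML. Once the guest boots, the VF appears to it as an ordinary NVMe controller.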
Falcon NVMe MR-IOV:
Our proposed MR-IOV solution extends the application of SR-IOV. It allows not only VMs on one host, but also VMs on different host machines, to share a physical SSD by enabling SR-IOV independently of the host machines. The following diagram illustrates the architecture of our solution.
We also ran a quick fio test on a virtual machine using the Falcon NVMe chassis and Samsung PM1735 NVMe SSDs. In the test, we assigned a VF with 3 TB of capacity to the host, then passed it through to the virtual machine (Ubuntu 20.04). Table 2 shows the environment used and the fio bandwidth results.
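If you would like to run a similar bandwidth check inside your own VM, the sketch below builds an fio command line and extracts the bandwidth from fio's JSON report. The job parameters (block size, queue depth, runtime) and the device path are placeholders that we picked for illustration; they are not the settings behind the results in Table 2. The helper names are ours.

```python
import json


def build_fio_cmd(device, rw="read", bs="128k", iodepth=32, numjobs=1, runtime_s=60):
    """Return an fio command line (as an argument list) for a bandwidth test.

    All default values here are illustrative placeholders.
    """
    return [
        "fio",
        "--name=vf-bandwidth",
        f"--filename={device}",
        f"--rw={rw}",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--ioengine=libaio",
        "--direct=1",            # bypass the guest page cache
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",
        "--output-format=json",
    ]


def bandwidth_gb_s(fio_json_text, direction="read"):
    """Sum the per-job bandwidth from an fio JSON report, in GB/s.

    fio reports the 'bw' field in KiB/s.
    """
    report = json.loads(fio_json_text)
    kib_per_s = sum(job[direction]["bw"] for job in report["jobs"])
    return kib_per_s * 1024 / 1e9


if __name__ == "__main__":
    print(" ".join(build_fio_cmd("/dev/nvme0n1")))
    # Synthetic report used only to exercise the parser, not a measurement.
    sample = '{"jobs": [{"read": {"bw": 1000000}}]}'
    print(bandwidth_gb_s(sample))
```

To run it for real, pass the argument list to `subprocess.run(cmd, capture_output=True, text=True, check=True)` inside the VM and feed the captured `stdout` to `bandwidth_gb_s`. Be careful with write tests: pointing fio at a raw device overwrites whatever is on it.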