Standardizing NVMe SSD Virtualization: Introducing PCIe Exported NVM Subsystem Migration Feature

Written by Nicole Ross, SSD Architect
Klaus Jensen, Principal Engineer

The demands of high-availability data centers and AI-driven infrastructure are pushing the boundaries for virtualization architecture. Driven by the need for greater flexibility and scalability, the storage device industry is evolving beyond traditional vendor-specific approaches to managing storage and compute resources.

To support virtualization at the storage layer, NVM Express® (NVMe®) technology introduces new standardized virtual (exported) objects and redefines how virtual storage is created, managed, and moved with TP4193: PCIe Exported NVM Subsystem Migration.

Samsung Semiconductor is proud to have contributed to the ratification of TP4193 in collaboration with key industry stakeholders such as Google, ensuring the standard effectively addresses the limitations of software-based virtualization.

Continue reading to learn more about SSD virtualization before and after TP4193 and why continuous collaborative innovation is critical, especially as the requirements for AI infrastructure flexibility and reliability become increasingly important.

 

Storage virtualization model

Today, most storage virtualization is implemented above the SSD storage device, typically in the hypervisor or host software. While this gives flexibility, it comes at a cost:

  • Hypervisor complexity has grown, as it must manage namespace mapping, isolation, migration, and admin command trapping;
  • Migration is complex and fragile, requiring coordination across multiple layers and often relying on proprietary tooling; and
  • Latency increases, as I/O paths grow longer and hardware control has a separate path for trapping commands.

As workloads become more dynamic, especially AI workloads tied to GPUs, these inefficiencies become more visible. Storage abstraction schemes must make tenant-oriented data easier to relocate, better isolated, and easier to orchestrate without constant software intervention.

 

SSD virtualization before TP4193

In the conventional model, SSD storage areas are allocated to virtual machines (VMs) using a direct assignment mechanism, typically implemented as logical partitioning at the PCI Express® (PCIe®) level. This can be realized either by the SSD exposing several PCIe physical functions (PFs) or, more recently, through the Single-Root I/O Virtualization (SR-IOV) Extended Capability, in which the SSD is logically partitioned into a single physical function and several virtual functions (VFs). To a host, the NVMe controller interface presented by a VF is indistinguishable from one presented by a PF.
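As an illustration of the SR-IOV partitioning described above, the following minimal sketch shows how a host might ask a physical function to expose virtual functions. It assumes a Linux host, where the kernel publishes the `sriov_totalvfs` and `sriov_numvfs` attributes for SR-IOV-capable PCI devices in sysfs; the helper name `enable_vfs`, its `root` parameter, and the PCI address in the usage note are hypothetical and not taken from any Samsung or NVM Express tooling.

```python
from pathlib import Path

# Assumption: Linux host; the kernel exposes SR-IOV controls for each PCI device here.
SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")


def enable_vfs(bdf: str, num_vfs: int, root: Path = SYSFS_PCI_DEVICES) -> int:
    """Request num_vfs virtual functions from the PF at PCI address bdf.

    Returns the VF count that was written. `root` is overridable so the
    helper can be exercised against a fake directory tree.
    """
    dev = root / bdf
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{bdf} supports at most {total} VFs, requested {num_vfs}")

    numvfs = dev / "sriov_numvfs"
    # The kernel refuses to change one non-zero VF count to another,
    # so disable any existing VFs before requesting the new count.
    if int(numvfs.read_text().strip()) != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))
    return num_vfs
```

A call such as `enable_vfs("0000:01:00.0", 4)` would need root privileges and an SSD whose driver supports SR-IOV. Each resulting VF then appears as its own PCIe function that can be direct-assigned to a VM.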

Direct assignment allows the virtual machine manager (VMM) to map the PCI configuration space of the VF into the address space of the VM, which allows the VM to access the PCIe function as if it were locally attached. Host software running within the VM is oblivious to the fact that the PCIe function is really provided by an underlying host.

However, this direct assignment scheme has a number of limitations and drawbacks. Because each NVMe register is passed into the VM as-is, the VM will also observe the SSD as-is. For example, if the physical SSD is a Samsung device, it will be identified as such by the VM. Similarly, all other properties of the SSD are exposed directly to the VM, including unique identifiers. Worse, because of how NVM subsystems work, the direct-assigned controller will leak information about the NVM subsystem such as other controllers and namespaces, even if those are not directly accessible by the VM.

This is obviously not optimal, so VMMs generally take steps to modify the information of the NVMe controller and present a different face (effectively lie) to the VM. There are several strategies for achieving this.

One strategy relies on the VMM trapping access to the PCI configuration space and NVMe controller registers. In this context, trapping means the VMM is configured to intercept VM accesses to those regions, such as memory-mapped I/O. In the industry, this is also known as “trap-and-emulate.” This technique allows the VMM to mediate access to the direct-assigned function and modify the information as needed. The VMM can intercept any command the VM issues to a submission queue and then handle it in one of two ways:

  • Emulate the command entirely within the VMM by preparing the data buffer and writing the completion queue entry (CQE) directly into the VM memory; or
  • Modify the submission queue entry (SQE) and pass it on to the assigned function.
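The two handling paths above can be sketched as a small dispatcher. This is a simplified illustration, not how any particular VMM is implemented: the function names, the `emulated_opcodes` policy set, the `nsid_map` table, and the `forward` callback are all hypothetical. The byte offsets follow the standard NVMe layout (opcode in byte 0, command identifier in bytes 2-3, and namespace identifier in bytes 4-7 of the 64-byte SQE; command identifier, phase tag, and status in the last dword of the 16-byte CQE).

```python
import struct
from typing import Callable, Dict, Optional, Set

SQE_SIZE = 64                # NVMe submission queue entry size in bytes
SC_INVALID_NAMESPACE = 0x0B  # generic status code: Invalid Namespace or Format


def build_cqe(cid: int, sq_id: int, sq_head: int, status: int = 0, phase: int = 1) -> bytes:
    """Pack a 16-byte completion queue entry; DW0 and DW1 are left as zero.

    Simplification: a real VMM tracks the phase tag per completion queue.
    """
    dw2 = (sq_id << 16) | sq_head
    dw3 = (status << 17) | (phase << 16) | cid
    return struct.pack("<IIII", 0, 0, dw2, dw3)


def handle_trapped_sqe(
    sqe: bytes,
    sq_id: int,
    sq_head: int,
    emulated_opcodes: Set[int],          # hypothetical per-queue policy
    nsid_map: Dict[int, int],            # hypothetical VM NSID -> device NSID table
    forward: Callable[[bytes], None],    # hypothetical hook that submits to the function
) -> Optional[bytes]:
    """Return a CQE if the VMM completed the command itself, else None."""
    if len(sqe) != SQE_SIZE:
        raise ValueError("an NVMe SQE is 64 bytes")
    opcode = sqe[0]
    cid, nsid = struct.unpack_from("<HI", sqe, 2)

    if opcode in emulated_opcodes:
        # Path 1: emulate entirely in the VMM. Filling the VM's data buffer is
        # omitted here; only the completion entry is produced.
        return build_cqe(cid, sq_id, sq_head)

    if nsid not in nsid_map:
        # Never pass an unmapped namespace identifier through to the device.
        return build_cqe(cid, sq_id, sq_head, status=SC_INVALID_NAMESPACE)

    # Path 2: patch the SQE (here, the namespace identifier) and pass it on.
    patched = bytearray(sqe)
    struct.pack_into("<I", patched, 4, nsid_map[nsid])
    forward(bytes(patched))
    return None
```

Every trapped command goes through this function on a host CPU, which is where the per-command mediation cost comes from.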

VMM interception of administrative commands such as Identify allows it to hide the identity, features, and capabilities of the SSD and present them in a different way. Moreover, because the VMM can modify SQEs, it can remap identifiers within a command, such as the namespace identifier.
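To make the identity-hiding step concrete, the sketch below rewrites a few fields of an Identify Controller data structure before it is handed to the VM. The function name and the replacement values are hypothetical. The offsets are those of the standard 4096-byte structure: PCI Vendor ID in bytes 0-1, PCI Subsystem Vendor ID in bytes 2-3, Serial Number in bytes 4-23, and Model Number in bytes 24-63. A real VMM would likely need to mask more than this (for example the firmware revision and the subsystem NQN); only these four fields are shown.

```python
import struct

IDENTIFY_SIZE = 4096  # size of the Identify Controller data structure


def _ascii_field(text: str, width: int) -> bytes:
    """Encode text as a fixed-width, space-padded ASCII field."""
    raw = text.encode("ascii")
    if len(raw) > width:
        raise ValueError(f"{text!r} does not fit in {width} bytes")
    return raw.ljust(width, b" ")


def mask_identify_controller(data: bytes, vid: int, serial: str, model: str) -> bytes:
    """Return a copy of Identify Controller data with the device identity replaced.

    Hypothetical helper: only VID, SSVID, SN, and MN are rewritten; every
    other byte is passed through unchanged.
    """
    if len(data) != IDENTIFY_SIZE:
        raise ValueError("Identify Controller data must be 4096 bytes")
    out = bytearray(data)
    struct.pack_into("<HH", out, 0, vid, vid)  # VID (bytes 0-1), SSVID (bytes 2-3)
    out[4:24] = _ascii_field(serial, 20)       # SN (bytes 4-23)
    out[24:64] = _ascii_field(model, 40)       # MN (bytes 24-63)
    return bytes(out)
```

The VMM would apply this to the buffer returned by the device, or build the buffer from scratch, before completing the VM's Identify command.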

While this method works, it can cost a significant number of CPU cycles on the VMM host. Trapping admin commands alone generally does not affect performance, but intercepting and mediating all commands can have a massive impact on I/O performance and defeats the purpose of direct assignment, which is meant to maintain near-native performance.

 

SSD virtualization using NVMe PCIe exported NVM subsystem migration feature

With the ratification of a recent NVMe technical proposal, NVM subsystems may now provide the VMM with the necessary capabilities to rely on virtualization directly within the SSD. Building on previous contributions to the NVM Express community and further enabling the vision of scalable virtualization, NVM Express addresses the drawbacks of software-based virtualization with TP4193: PCIe Exported NVM Subsystem Migration.

Virtualization of storage through the mechanisms of the PCIe Exported NVM Subsystem Migration feature changes the model entirely. Instead of the host adding abstractions on top of a raw SSD, the SSD itself exposes logical, virtualized storage constructs with definable behavior. The host becomes an “orchestrator” rather than an implementer.

This standardization effort makes it possible to offload the aforementioned complexity from the hypervisor, reduce reliance on vendor-specific implementations, and introduce an interoperable mechanism for SSD virtualization.

Used alongside other PCIe and NVMe functionalities, the new features complete a mechanism that:

  • Defines scalable and migratable virtual SSDs;
  • Normalizes virtualized SSD topology and attributes, allowing SSD migration to go unnoticed by the VM (i.e., the migrating VM perceives no change to the underlying hardware);
  • Slims the hypervisor, letting the SSD manage its own resources and attributes and enabling direct VM access to admin queues to reduce latency;
  • Provides secure isolation for multiple GPU attach points in AI/DAS environments; and
  • Proposes a vendor-neutral, standardized definition for virtual SSD configuration and migratable state.

Simply stated, the PCIe Exported NVM Subsystem Migration feature introduces two distinct self-contained functionalities:

  • Standardized virtual (exported) object creation: Establishes a common framework for creating and managing exported objects; and
  • Reporting and capability masking: Defines how a virtual SSD presents, reports, and masks its capabilities and attributes to appear as an independent physical device to the host.

 

NVMe technology evolution

With TP4193 and other recent technical proposals, NVMe technology is no longer just a storage protocol; it is evolving into a foundation for composable, migratable, and secure infrastructure.

Evolution of virtualization within NVMe technology substantially enhances its capabilities, such that:

  • Storage becomes portable, not static;
  • Hypervisors become simpler, not heavier;
  • Migrations become routine, not risky; and
  • Policy is enforced by hardware, not by software alone.

This evolution is critical for the next generation of AI platforms, GPU-accelerated systems, and multi-tenant data centers, where flexibility and reliability are non-negotiable. Virtualization at the NVMe layer is not just an optimization; it is a structural shift in how storage is designed, deployed, and trusted.

 

Learn more

Learn more about the PCIe Exported NVM Subsystem Migration feature and SSD virtualization by reading NVM Express’s blog on Standardizing SSD Virtualization: NVMe Technology Building Blocks[1].

 


 
References
 
[1] Standardizing SSD Virtualization: NVMe® Technology Building Blocks
 

* The contents of this blog are provided for informational purposes only. No representation or warranty (whether express or implied) is made by Samsung or any of its affiliates and their respective officers, advisers, agents, or employees (collectively, "Samsung") as to the accuracy, reasonableness or completeness of the information, statements, opinions, or matters contained in this blog, and they are provided on an "AS-IS" basis.
* Samsung will not be responsible for any damages arising out of the use of, or otherwise relating to, the contents of this blog. Nothing in this blog grants you any license or rights in or to information, materials, or contents provided in this blog, or any other intellectual property.
* The contents of this blog may also include forward-looking statements. Forward-looking statements are not guarantees of future performance, and the actual developments of Samsung, the market, or the industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this blog.
* NVM Express® design mark and NVMe® word mark are trademarks of NVM Express, Inc.
* PCI Express® and PCIe® are registered trademarks of PCI SIG.