
[Advanced Memory Systems Ⅲ] Realizing ATS and PRI for Efficient Data Access in NVMe SSD

Written by Karthik Balan, Solution PE Architecture / SSIR
Sathyavathi M, Host Software / SSIR


Part 3: ATS Validation and ecosystem integration

 

Kernel and host-side changes

To validate ATS, map and unmap commands must be issued to the IOMMU. Since direct access to the IOMMU from user applications is not supported, a user-space application was implemented to interact with the iommufd module, which exposes control interfaces through the /dev/iommu device node, as documented in the Linux kernel IOMMUFD subsystem.

The application passes the bus–device–function (BDF) information of the NVMe® device to the iommufd driver via ioctl calls. Within the driver, an extended ioctl handler retrieves the corresponding struct pci_dev using pci_get_domain_bus_and_slot(domain, bus, devfn). The device is then bound to the IOMMU context using the iommufd_device_bind() API and subsequently attached to a page table through iommufd_device_attach().
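As a rough illustration of the BDF hand-off described above, the sketch below converts a textual PCI address into the (domain, bus, devfn) triple that pci_get_domain_bus_and_slot() expects. The article does not show how the application encodes the BDF, so the `DDDD:BB:DD.F` string format and the helper name `parse_bdf` are assumptions made for this example.

```python
import re

# Hypothetical helper: the "DDDD:BB:DD.F" input format (as printed by
# `lspci -D`) is an assumption; the article does not specify the encoding.
_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text):
    """Return (domain, bus, devfn) for a string such as '0000:3b:00.0'."""
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain = int(match.group(1) or "0", 16)  # domain defaults to 0 if omitted
    bus = int(match.group(2), 16)
    slot = int(match.group(3), 16)
    func = int(match.group(4), 16)
    if slot > 0x1F:
        raise ValueError("slot number must fit in 5 bits")
    # Same packing as the kernel's PCI_DEVFN(slot, func) macro:
    # slot in bits 7:3, function in bits 2:0.
    devfn = (slot << 3) | func
    return domain, bus, devfn
```

For example, `parse_bdf("0000:3b:00.0")` yields domain 0, bus 0x3b, and devfn 0, which are the three arguments the extended ioctl handler would pass to pci_get_domain_bus_and_slot().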

Once binding and attachment are completed, the application can issue map and unmap requests to invalidate or update I/O virtual address (IOVA) mappings.
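The map and unmap requests can be pictured as fixed-layout structures handed to ioctl calls on the /dev/iommu file descriptor. The sketch below only builds those request buffers. The command numbers, flag values, and field order reflect the upstream `include/uapi/linux/iommufd.h` header as understood here and should be treated as assumptions to verify against the headers of the kernel in use. The helper names are hypothetical, and the article's custom bind/attach ioctl is not shown.

```python
import struct

# Assumed values from include/uapi/linux/iommufd.h; verify before use.
IOMMUFD_TYPE = ord(";")
IOMMUFD_CMD_IOAS_MAP = 0x85
IOMMUFD_CMD_IOAS_UNMAP = 0x86
IOMMU_IOAS_MAP_FIXED_IOVA = 1 << 0
IOMMU_IOAS_MAP_WRITEABLE = 1 << 1
IOMMU_IOAS_MAP_READABLE = 1 << 2


def _io(ioc_type, nr):
    """Equivalent of the C _IO() macro on x86 and arm64 (no direction, no size)."""
    return (ioc_type << 8) | nr


IOMMU_IOAS_MAP = _io(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
IOMMU_IOAS_UNMAP = _io(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_UNMAP)

# struct iommu_ioas_map: size, flags, ioas_id, __reserved, user_va, length, iova
_MAP_FMT = "=IIIIQQQ"
# struct iommu_ioas_unmap: size, ioas_id, iova, length
_UNMAP_FMT = "=IIQQ"


def build_map_request(ioas_id, user_va, length, iova):
    """Pack a request that maps [user_va, user_va + length) at a fixed IOVA."""
    flags = (IOMMU_IOAS_MAP_FIXED_IOVA
             | IOMMU_IOAS_MAP_READABLE
             | IOMMU_IOAS_MAP_WRITEABLE)
    return bytearray(struct.pack(_MAP_FMT, struct.calcsize(_MAP_FMT),
                                 flags, ioas_id, 0, user_va, length, iova))


def build_unmap_request(ioas_id, iova, length):
    """Pack a request that removes the mappings inside [iova, iova + length)."""
    return bytearray(struct.pack(_UNMAP_FMT, struct.calcsize(_UNMAP_FMT),
                                 ioas_id, iova, length))
```

On a system that exposes /dev/iommu, a buffer built this way could be submitted with `fcntl.ioctl(fd, IOMMU_IOAS_MAP, request)`. The buffers are mutable bytearrays so that the kernel can write result fields back. The `ioas_id` would come from an earlier IOAS allocation, which is omitted here.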

IOMMUFD provides a generic user-space API for managing I/O address spaces and I/O page tables using file descriptors. It is designed to be consumable by any driver that exposes DMA to user space and offers universal support for IOMMU-backed address space management, with extensibility for hardware-specific features.

*  In this context, IOMMUFD (uppercase) refers to the kernel subsystem, while iommufd (lowercase) refers to the file descriptors created through /dev/iommu for user-space interaction.
Figure 10. User-space initiated IOMMU map and unmap workflow via iommufd

In addition, the iommu_ioas_copy operation can replicate an existing mapping from a source I/O address space (src_ioas_id) to a destination I/O address space (dst_ioas_id) directly from user space.

It should be noted that performing device bind and attach operations requires temporarily unbinding the NVMe device from its active IOMMU domain. During this phase, the device node (e.g., /dev/nvme0) becomes unavailable for standard I/O operations.

 

ATS validation scope and objectives

Validation of PCIe ATS focuses on verifying the correct implementation and functional behavior of ATS in an NVMe device. ATS enables a PCIe device to request virtual-to-physical address translations directly from the system IOMMU, thereby reducing address translation overhead and improving overall system performance. Because ATS operation tightly couples device, IOMMU, and operating system behavior, ecosystem-level validation is essential.

The primary objectives of ATS validation include the following:

  • Verification of ATS capability detection and correctness
  • Validation of ATR and response sequences
  • Confirmation of correct translation handling by the device under test (DUT)
  • Verification of cache invalidation and translation consistency
  • Assessment of robustness and endurance under concurrent transaction stress

A minimal validation environment is required to ensure accurate functional and performance characterization. Table 3 outlines the minimum hardware and software configuration required to enable and validate PCIe ATS functionality.

 
Table 3. Minimum system requirements for ATS validation

 

Key validation cases for ATS

Capability discovery

ATS validation begins with capability discovery. Any function that supports ATS (that is, any function capable of generating ATRs) must expose the ATS extended capability structure within its PCIe extended configuration space¹.

The following items are verified:

  • Presence of the ATS extended capability in PCIe configuration space¹
  • Proper advertisement of ATS support via:
      • PCI Express extended capability header
      • ATS capability register
      • ATS control register
Figure 11. ATS extended capability structure¹

 

  • The PCI Express extended capability header must report:
      • Extended capability ID = 0x0F
      • Capability version = 0x01
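The discovery check can be sketched as a walk over the extended capability list, which starts at offset 0x100 of configuration space. The function below assumes the caller already holds a raw dump of the function's configuration space as a bytes object; the function name and the constant names are illustrative, not taken from the article.

```python
import struct

PCI_CFG_SPACE_SIZE = 0x100   # extended capabilities begin at offset 0x100
PCI_EXT_CAP_ID_ATS = 0x000F  # ATS extended capability ID


def find_ext_capability(config, cap_id):
    """Walk the extended capability list; return (offset, version) or None."""
    offset = PCI_CFG_SPACE_SIZE
    visited = set()
    while offset >= PCI_CFG_SPACE_SIZE and offset + 4 <= len(config):
        if offset in visited:  # guard against a malformed, looping list
            break
        visited.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):  # empty list or failed read
            break
        found_id = header & 0xFFFF             # bits 15:0, capability ID
        version = (header >> 16) & 0xF         # bits 19:16, capability version
        next_offset = (header >> 20) & 0xFFC   # bits 31:20, dword aligned
        if found_id == cap_id:
            return offset, version
        offset = next_offset  # 0 terminates the list
    return None
```

On Linux, a suitable dump can usually be read from `/sys/bus/pci/devices/<BDF>/config`; reading beyond the first 64 bytes generally requires root privileges. A passing device returns an offset together with version 1 for `PCI_EXT_CAP_ID_ATS`.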

 
Figure 12. ATS extended capability header¹

 

Enabling ATS and STU configuration

Enabling ATS requires programming the ATS control register, including configuration of the STU.

The STU specifies the minimum translation granularity that an endpoint may request or invalidate and has the following properties:

  • Defined in units of 4 KB blocks
  • Encoded as a 5-bit power-of-two exponent in the ATS control register
      • A value of 0 represents 1 block (4 KB)
      • A value of n represents 2ⁿ × 4 KB
  • The minimum STU is always 4 KB, aligned with the system page size
  • Larger translation or invalidation requests must be multiples of this base unit
  • Maximum supported granularity depends on the implementation limits
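The STU arithmetic above can be made concrete with a small sketch. It assumes the standard ATS control register layout, with the STU exponent in bits 4:0 and the Enable (E) bit in bit 15; the helper names are illustrative only.

```python
ATS_CTRL_STU_MASK = 0x001F  # bits 4:0, smallest translation unit exponent
ATS_CTRL_ENABLE = 1 << 15   # bit 15, Enable (E)
BASE_BLOCK = 4096           # STU is expressed in 4 KB blocks


def stu_to_bytes(stu):
    """Translate the 5-bit STU exponent into a size in bytes: 2**stu * 4 KB."""
    if not 0 <= stu <= ATS_CTRL_STU_MASK:
        raise ValueError("STU must fit in 5 bits")
    return BASE_BLOCK << stu


def encode_ats_control(enable, stu):
    """Build the 16-bit control register value from the E bit and the STU."""
    if not 0 <= stu <= ATS_CTRL_STU_MASK:
        raise ValueError("STU must fit in 5 bits")
    return (ATS_CTRL_ENABLE if enable else 0) | stu


def is_stu_aligned(address, length, stu):
    """Check that a request starts on an STU boundary and spans whole units."""
    unit = stu_to_bytes(stu)
    return length > 0 and address % unit == 0 and length % unit == 0
```

With an STU of 0 and the Enable bit set, `encode_ats_control(True, 0)` gives 0x8000, and every translation or invalidation range is then checked against a 4 KB unit.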

Additional control attributes include:

  • ATS memory attributes default (AMAD): When set, applies default memory attributes as a performance optimization
  • ATS memory attributes enable (AMAE): When set, allows the requester to supply non-default memory attributes using the TLP processing hint (TPH)* prefix in the transaction layer packet (TLP)
  • Enable (E): When set, permits the function to cache address translations

*  TPH is a PCIe mechanism that allows a device to provide hints about where a transaction should be processed to improve cache locality. TLP is the fundamental packet format used for data transfer in PCIe.
Figure 13. ATS control register¹

 

Functional validation of ATS operation

Functional validation of an ATS-capable NVMe endpoint includes the following categories:

  • Translation request and response validation
      • Initiate ATRs from the device
      • Trigger DMA access to a virtual address
      • Confirm that translation requests are forwarded to the host IOMMU
      • Verify correctness of returned physical addresses
  • Translation caching validation
      • Confirm that translated addresses are cached in the device ATC
      • Verify reuse of cached translations without repeated ATRs
      • Trigger cache invalidation and observe replacement behavior
  • Invalidation handling validation
      • Generate system-initiated invalidation requests
      • Simulate host page table updates
      • Confirm that the device receives invalidations and purges stale entries
  • Negative and error-handling validation
      • Inject malformed or unauthorized translation requests
      • Verify host-side rejection or fault responses
      • Confirm graceful error handling by the device
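The expected behavior behind these categories can be illustrated with a toy software model. This is not the validation setup used in the article: the classes `ToyIommu` and `ToyDevice`, the dictionary-based page table, and the ATR counter are all hypothetical stand-ins, chosen only to show what a test would observe for a cache miss, a cache hit, an invalidation, and an unmapped address.

```python
PAGE = 4096


class ToyIommu:
    """Stand-in for the host IOMMU: a page table plus an ATR counter."""

    def __init__(self):
        self.page_table = {}  # IOVA page number -> physical page number
        self.atr_count = 0

    def translate(self, iova_page):
        self.atr_count += 1                    # every call models one ATR
        return self.page_table.get(iova_page)  # None models a fault response


class ToyDevice:
    """Stand-in for the endpoint with an address translation cache (ATC)."""

    def __init__(self, iommu):
        self.iommu = iommu
        self.atc = {}

    def dma_address(self, iova):
        page, offset = divmod(iova, PAGE)
        if page not in self.atc:               # ATC miss: issue an ATR
            phys_page = self.iommu.translate(page)
            if phys_page is None:
                raise LookupError(f"no translation for IOVA {iova:#x}")
            self.atc[page] = phys_page
        return self.atc[page] * PAGE + offset  # hit or freshly filled entry

    def invalidate(self, iova, length):
        """Purge every cached page overlapping [iova, iova + length)."""
        first = iova // PAGE
        last = (iova + length - 1) // PAGE
        for page in range(first, last + 1):
            self.atc.pop(page, None)
```

In this model, two accesses to the same page raise `atr_count` only once, a page table update followed by `invalidate()` forces a second ATR that returns the new physical address, and an access to an unmapped IOVA surfaces as an error instead of a stale or fabricated address.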

 

 

Conclusion

The implementation of ATS and PRI in NVMe SSDs enables more efficient and scalable I/O behavior in virtualized environments by allowing devices to participate directly in address translation and page management. This reduces reliance on pinned memory, minimizes IOMMU bottlenecks, and improves overall system memory utilization and I/O efficiency.

This article has outlined the fundamental mechanisms of ATS and PRI, along with key validation considerations based on emulation-based analysis. As heterogeneous computing and large-scale virtualization continue to expand, technologies such as ATS and PRI are expected to play an increasingly important role in enabling high-performance, flexible, and memory-efficient storage and accelerator architectures.

 


 

References

[1] “CXL Ecosystem Innovation Leveraging QEMU-based Emulation” 
 
[2] “IOMMUFD — The Linux Kernel documentation”
 
[3] “PCI Express 6.0 Specification”
 
[4] “AMD I/O Virtualization Technology (IOMMU) Specification”
 
 

 
 
 