
What Hyperscalers Need to Know About Flexible Data Placement (FDP)

By Javier Gonzalez

Over the past decade, the storage industry has struggled to embrace new data placement technologies. The topic is of particular interest to datacenters and hyperscalers because of its potential impact on TCO for SSD deployments. And while major storage technologies like Open-Channel SSDs, NVMe Streams (Directives), and Zoned Namespaces (ZNS) all have valid use cases and a wide base of engaged customers, these solutions can cause fragmentation within the software ecosystem, leading to bloated codebases and a general rejection in mainline projects.

FDP, or Flexible Data Placement, is a newly ratified NVMe specification (TP4146 [1]) that aims to simplify ecosystem integration. The new standard was driven by Meta, Google, and Samsung, who aligned efforts to fast-track the FDP spec and finalized it in just six months. The buzz around FDP has been steadily growing because it is backwards compatible with legacy systems and requires far fewer engineering resources while delivering impressive optimizations. FDP accomplishes this by targeting the first 80% of host/device cooperation for better data placement, where any required changes to existing software stacks are not overly intrusive. For Meta and Google, two of the world's biggest hyperscalers, FDP addresses concerns over ecosystem complexity with existing storage technologies like ZNS.

Data placement in an FDP-enabled SSD improves upon today's disaggregated storage model by enabling data segregation through 'hints' the host provides about where to allocate data on the SSD. FDP-enabled SSDs accomplish this by exposing superblock information, which allows the host to tag writes to specific Reclaim Units (RUs) so that the device can align data from multiple applications. This host/device cooperation reduces write amplification (WA), the additional media writes an SSD must perform beyond the data actually written by the host, and provides guarantees for de-allocating RUs, which gives the host the power to orchestrate garbage collection (GC). FDP also maintains a feedback loop with the host to evaluate how well data alignment among RUs and Reclaim Groups (RGs) is working.
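To make the tagging mechanism concrete, here is a minimal sketch of a single write carrying a placement identifier, issued synchronously through the Linux NVMe passthrough ioctl on the generic character device. The device path (/dev/ng0n1), namespace ID, LBA, and placement identifier below are illustrative assumptions rather than values from any particular deployment; the directive type and directive-specific fields travel in Dwords 12 and 13 of the NVMe Write command.

    /* Minimal sketch: tag one NVMe write with a data placement directive.
     * Device path, namespace ID, LBA, and placement identifier are
     * illustrative assumptions; error handling is kept to a minimum. */
    #include <fcntl.h>
    #include <linux/nvme_ioctl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/ng0n1", O_RDWR);      /* NVMe generic char device */
        if (fd < 0) { perror("open"); return 1; }

        void *buf = aligned_alloc(4096, 4096);    /* one 4 KiB logical block */
        memset(buf, 0xAB, 4096);

        struct nvme_passthru_cmd cmd = {
            .opcode   = 0x01,                     /* NVMe Write */
            .nsid     = 1,
            .addr     = (uint64_t)(uintptr_t)buf,
            .data_len = 4096,
            .cdw10    = 0,                        /* starting LBA, low 32 bits  */
            .cdw11    = 0,                        /* starting LBA, high 32 bits */
            .cdw12    = (2u << 20),               /* DTYPE = 2 (data placement), NLB = 0 */
            .cdw13    = (7u << 16),               /* DSPEC = placement identifier 7 (hypothetical) */
        };

        if (ioctl(fd, NVME_IOCTL_IO_CMD, &cmd) < 0)
            perror("NVME_IOCTL_IO_CMD");

        free(buf);
        close(fd);
        return 0;
    }

Writes that carry different placement identifiers can then be steered by the device into different Reclaim Units, which is what keeps data from separate applications from being interleaved on the media.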
An FDP configuration consists of one or more RUs organized into one or more RGs, along with one or more Reclaim Unit Handles (RUHs). Each RUH references an RU within an RG. An overview of how FDP works and its potential TCO impact for hyperscalers, including links to additional resources, can be found in a previous Tech Blog article, Hyperscalers Embrace Flexible Data Placement (FDP) to Increase Performance and Lower TCO, by Samsung's Senior Director of Product Planning, Mike Allison.

Currently, nothing else on the horizon has the potential to become as promising a mainstream industry solution to data placement as FDP. Support for FDP is already available in operating systems, libraries and tools, and applications, so hyperscalers and datacenters can expect to adopt FDP-enabled SSDs with relative ease when they become available. Let's look at the current state of support for FDP within the storage stack.

Linux Kernel I/O

In Linux 6.2, FDP is implemented using I/O Passthru without requiring any changes to the kernel. Linux 6.2 offers an asynchronous path to the NVMe generic character device that allows user-space applications to access the in-kernel NVMe driver directly, bypassing the block layer (a minimal sketch of this path follows the Applications discussion below). This means applications can leverage an end-to-end architecture similar to SPDK, but using the in-kernel NVMe driver instead. Earlier Linux versions (5.17-6.1) also support FDP through I/O Passthru, supported by io_uring. In Linux, the kernel I/O subsystem handles caching, scheduling, spooling, device reservation, and error handling, and it provides device independence, resource management, and concurrency management for I/O devices.

Libraries & Tools

Both xNVMe [2] and SPDK [3] provide full support for FDP. xNVMe enables FDP across different storage paths by leveraging the SPDK NVMe driver as well as I/O Passthru over io_uring. The ability to connect FDP-enabled SSDs through standard storage interfaces makes adoption easy for application developers and storage infrastructure architects. xNVMe provides the means to program and interact with NVMe devices from user space. The foundation of xNVMe is libxnvme, a user-space library for working with NVMe devices. It provides a C API for memory management, that is, for allocating physical/DMA-transferable memory when needed. The Storage Performance Development Kit (SPDK) provides a set of tools and libraries for writing high-performance, scalable, user-mode storage applications.

Applications

An industry-wide initiative to add FDP support to CacheLib [4] is currently underway. Testing has already shown that, even without major optimizations, FDP helps CacheLib reduce its Write Amplification Factor (WAF) by segregating data from its two object pools, BigHash and BlockCache [5]. Experimental results using real hyperscale workloads show that a significant reduction in WAF can be achieved while increasing SSD utilization, with no impact on cache hit rate.
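The asynchronous passthrough path described above, combined with placement identifiers, is enough to sketch how an application with two logical data pools, loosely analogous in spirit to CacheLib's BigHash and BlockCache, might keep them segregated on the device. The sketch below is an assumption-laden illustration rather than CacheLib code: the device path, namespace ID, LBAs, buffer sizes, and the two placement identifiers are made up, and it relies on liburing with big-SQE/CQE support and the nvme_uring_cmd interface exposed by recent kernels and headers.

    /* Sketch under stated assumptions: two data pools are kept apart on the
     * device by tagging their writes with different FDP placement identifiers,
     * submitted asynchronously over io_uring NVMe passthrough. */
    #include <fcntl.h>
    #include <liburing.h>
    #include <linux/nvme_ioctl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Queue one 4 KiB write tagged with an FDP placement identifier. */
    static void queue_tagged_write(struct io_uring *ring, int fd, void *buf,
                                   uint64_t slba, uint16_t placement_id)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        memset(sqe, 0, sizeof(*sqe));
        sqe->opcode = IORING_OP_URING_CMD;
        sqe->fd     = fd;
        sqe->cmd_op = NVME_URING_CMD_IO;

        struct nvme_uring_cmd *cmd = (struct nvme_uring_cmd *)sqe->cmd;
        memset(cmd, 0, sizeof(*cmd));
        cmd->opcode   = 0x01;                           /* NVMe Write */
        cmd->nsid     = 1;
        cmd->addr     = (uint64_t)(uintptr_t)buf;
        cmd->data_len = 4096;
        cmd->cdw10    = (uint32_t)slba;                 /* starting LBA, low 32 bits  */
        cmd->cdw11    = (uint32_t)(slba >> 32);         /* starting LBA, high 32 bits */
        cmd->cdw12    = (2u << 20);                     /* DTYPE = 2, NLB = 0 (one block) */
        cmd->cdw13    = ((uint32_t)placement_id << 16); /* DSPEC = placement identifier */
    }

    int main(void)
    {
        int fd = open("/dev/ng0n1", O_RDWR);            /* NVMe generic char device (assumed) */
        if (fd < 0) { perror("open"); return 1; }

        struct io_uring ring;
        /* NVMe passthrough needs 128-byte SQEs and 32-byte CQEs. */
        if (io_uring_queue_init(8, &ring, IORING_SETUP_SQE128 | IORING_SETUP_CQE32) < 0) {
            fprintf(stderr, "io_uring_queue_init failed\n");
            return 1;
        }

        void *pool_a = aligned_alloc(4096, 4096);       /* e.g. small-object data */
        void *pool_b = aligned_alloc(4096, 4096);       /* e.g. large-block data  */
        memset(pool_a, 0xAA, 4096);
        memset(pool_b, 0xBB, 4096);

        queue_tagged_write(&ring, fd, pool_a, 0, /* placement id */ 0);
        queue_tagged_write(&ring, fd, pool_b, 8, /* placement id */ 1);
        io_uring_submit(&ring);

        for (int i = 0; i < 2; i++) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);
            printf("write %d completed with status %d\n", i, cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        free(pool_a); free(pool_b);
        close(fd);
        return 0;
    }

Because each pool consistently uses its own placement identifier, the device can place the two traffic streams in separate Reclaim Units, which is the same segregation effect the CacheLib experiments rely on to lower WAF.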
Moving forward, upstream support for CacheLib needs to be fully developed, optimizations for better data segregation must continue, and customers need to specify the levels of WAF they can tolerate. CacheLib is a general-purpose caching engine that facilitates the easy development, scaling, and maintenance of high-performing caches.

QEMU Emulation

FDP is fully supported as of QEMU version 8.0. Using QEMU, operating system and application developers are free to develop ahead of FDP-enabled SSDs becoming widely available. Note that recent changes to the emulated NVMe device allow for host performance evaluation using QEMU emulation. QEMU is an open-source machine emulator and virtualizer that allows developers to run a complete, unmodified operating system on top of an existing system. Version 8.0 was released in April 2023.
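As a rough starting point, an invocation along the following lines brings up an emulated NVMe subsystem with FDP enabled. The fdp.* parameter names follow the QEMU NVMe emulation documentation, but the specific values, image paths, and overall machine configuration here are placeholder assumptions and should be checked against the documentation for the QEMU version in use.

    qemu-system-x86_64 -m 4G -smp 4 \
      -drive file=guest-os.img,if=virtio \
      -drive file=fdp-ns.img,if=none,id=nvm-1,format=raw \
      -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0,fdp=on,fdp.nruh=8,fdp.nrg=1,fdp.runs=96M \
      -device nvme,serial=deadbeef,subsys=nvme-subsys-0 \
      -device nvme-ns,drive=nvm-1,nsid=1,fdp.ruhs=0-7

Inside the guest, the emulated namespace then reports FDP support, so the kernel passthrough path, xNVMe, and SPDK can be exercised much as they would be against real FDP-enabled hardware.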
With the arrival of FDP, hyperscalers will be better equipped to handle increasingly heavy workloads from AI, cloud, and media-rich content. Thanks to FDP's backwards compatibility and the support already present in the Linux kernel I/O stack, xNVMe and SPDK, and soon CacheLib, integration into existing ecosystems will require minimal engineering to leverage its optimizations. The benefits of FDP have already been clearly demonstrated by Meta, Google, and Samsung; what remains now is for FDP-enabled SSDs to become widely available. The storage industry appears to be on the verge of something big: a single, standardized data placement technology that can be adopted with relative ease, with big payoffs like increased SSD utilization, improved energy efficiency, and lower TCO.
[1] “xNVMe: Cross-Platform Libraries and Tools for NVMe Devices.” GitHub, github.com/OpenMPDK/xNVMe. Accessed 29 Aug. 2023.
[2] “xNVMe: Cross-Platform Libraries and Tools for NVMe Devices.” GitHub, github.com/OpenMPDK/xNVMe. Accessed 29 Aug. 2023.
[3] “Storage Performance Development Kit.” GitHub, github.com/spdk/spdk. Accessed 29 Aug. 2023.
[4] “CacheLib.” GitHub, github.com/facebook/CacheLib. Accessed 29 Aug. 2023.
[5] Berg, Benjamin, et al. “The CacheLib Caching Engine: Design and Experiences at Scale.” Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), USENIX Association, 2020.