
A Brief History of Data Placement Technologies

Written by Roshan R Nair, Global Open-ecoSystem Team (GOST)


SSD data placement is an important topic in both academia and industry. Data placement technologies give the host greater control over where its data is placed in the SSD. By granting the host system this control, data placement technologies aim to: (i) reduce SSD write amplification, (ii) improve Quality of Service (QoS), (iii) reduce host overprovisioning, and (iv) optimize total cost of ownership, among other goals. The past decade has seen numerous data placement technologies proposed, using various levels of host-device collaboration. In this blog post, we outline the challenges that necessitated the introduction of data placement technologies and take a look at their complex history. We unearth this history by taking a trip down memory lane and describing the various data placement proposals along with their reception and evolution.

 

 

1. Why are SSD data placement technologies needed?

To understand the need for data placement technologies, it is essential first to understand how SSDs work and the challenges that arise from their function.

 

SSD Basics

SSDs are made using NAND cells, which can be programmed to store data. SSDs that use single-bit NAND cells to store data are called single-level cell (SLC) SSDs. Similarly, SSDs using NAND cells that store 3 bits or 4 bits of data are called triple-level cell (TLC) or quad-level cell (QLC) SSDs, respectively. An SSD contains multiple NAND packages organized into dies, planes, blocks, and pages (Figure 1). Conventional NAND SSDs (CNS) typically have a single open block, within which a free page is programmed with the host’s data. The host issues read and write commands to the SSD in terms of logical block addresses (LBAs). NAND media cannot be overwritten in place; it follows the erase-before-write principle. Erase operations in NAND SSDs occur in units of erase blocks (typically tens to hundreds of MBs in size), whereas write operations occur in units of pages (typically 4KB, 16KB, etc.). This mismatch in erase and write granularities requires significant management: when the host issues an overwrite of the same LBA, the SSD does not write the data to the same physical address internally. The Flash Translation Layer (FTL) in the SSD takes care of this management in multiple steps (a simplified sketch follows the list below):

 

  • Write the new data to an available free page in the open SSD block

  • Invalidate the old page, which had the previously written data

  • Update the FTL metadata to point to the new page. The metadata here is a mapping that translates the logical addresses used by the host to physical addresses in the NAND media (the L2P mapping).
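These steps can be made concrete with a small sketch. The model below is purely illustrative (a toy view of what a page-mapped FTL does on an overwrite), with made-up geometry constants, and with block allocation and garbage collection left out; it is not any vendor's FTL.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_LBAS        1024  /* illustrative logical address space  */
#define PAGES_PER_BLOCK 256   /* illustrative pages per erase block  */
#define NUM_BLOCKS      8
#define UNMAPPED        UINT32_MAX

struct toy_ftl {
    uint32_t l2p[NUM_LBAS];                      /* LBA -> physical page index   */
    uint8_t  valid[NUM_BLOCKS][PAGES_PER_BLOCK]; /* 1 = page holds live data     */
    uint32_t open_block;                         /* single open block, as in CNS */
    uint32_t write_ptr;                          /* next free page in open block */
};

static void ftl_init(struct toy_ftl *ftl)
{
    memset(ftl, 0, sizeof(*ftl));
    for (uint32_t i = 0; i < NUM_LBAS; i++)
        ftl->l2p[i] = UNMAPPED;
}

/* Handle a host (over)write of one LBA: program the next free page in the
 * open block, invalidate the page holding the previous copy, and update the
 * L2P mapping. Open-block rollover and garbage collection are omitted. */
static void ftl_write(struct toy_ftl *ftl, uint32_t lba)
{
    uint32_t old_page = ftl->l2p[lba];
    uint32_t new_page = ftl->open_block * PAGES_PER_BLOCK + ftl->write_ptr++;

    ftl->valid[new_page / PAGES_PER_BLOCK][new_page % PAGES_PER_BLOCK] = 1;
    if (old_page != UNMAPPED)
        ftl->valid[old_page / PAGES_PER_BLOCK][old_page % PAGES_PER_BLOCK] = 0;
    ftl->l2p[lba] = new_page;
}

int main(void)
{
    struct toy_ftl ftl;
    ftl_init(&ftl);
    ftl_write(&ftl, 42);   /* first write of LBA 42                           */
    ftl_write(&ftl, 42);   /* overwrite: the old page is only marked invalid  */
    printf("LBA 42 now maps to physical page %u\n", ftl.l2p[42]);
    return 0;
}
```

The invalidated pages left behind by overwrites are exactly what garbage collection has to clean up later, which leads to the write amplification discussed next.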

 

Figure 1. NAND Flash layout, illustrating a die with multiple planes, blocks, and pages.

 

SSD Write Amplification

Freeing up erase blocks requires moving valid data pages to other blocks. This process is called garbage collection, and it results in the SSD performing extra reads and writes to relocate valid data. The extra writes that occur due to garbage collection result in “write amplification.” The ratio of total SSD writes to total host writes quantifies the overhead of garbage collection and is termed the SSD Write Amplification Factor (WAF). The ideal SSD WAF is 1, indicating that the SSD did not need to perform garbage collection. Sequential host write patterns result in an SSD WAF of 1 because entire erase blocks get invalidated, avoiding garbage collection. A random host write pattern leaves erase blocks with a mix of valid and invalid pages and results in a high SSD WAF due to garbage collection. NAND cells withstand an intrinsic number of program/erase (P/E) cycles, after which they start to wear out and fail. An SSD WAF greater than 1 is undesirable because garbage collection uses up the limited P/E cycles and causes SSDs to fail faster. Garbage collection also adds latency to host I/O operations, leading to degraded performance and QoS. Therefore, the key to obtaining sustained, deterministic host performance and preventing premature SSD failure is to minimize SSD WAF and garbage collection.
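As a concrete, hypothetical illustration of the ratio: if the host writes 100 GB and garbage collection relocates another 50 GB of valid data, the SSD programs 150 GB of NAND in total, for a WAF of 1.5. A trivial sketch (the numbers are made up for illustration):

```c
#include <stdio.h>

/* WAF = total NAND writes (host writes + GC relocations) / host writes. */
static double waf(double host_gb, double gc_relocated_gb)
{
    return (host_gb + gc_relocated_gb) / host_gb;
}

int main(void)
{
    /* Hypothetical numbers: 100 GB written by the host, 50 GB of valid
     * data relocated by garbage collection -> WAF = 150 / 100 = 1.5.   */
    printf("WAF = %.2f\n", waf(100.0, 50.0));
    return 0;
}
```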

 

 

2. SSD data placement technologies

SSD WAF is influenced by the host’s write patterns and the associated data characteristics or overwrite rates (temperatures). Conventional SSDs write host data of varying temperatures into a single open block, resulting in inefficient overwrite patterns and increased SSD WAF. Without host input, controlling SSD WAF is challenging because the SSD is unaware of host write patterns. SSD data placement technologies address this challenge by introducing varying levels of host input (Figure 2) to perform data placement and minimize garbage collection.

Now that the challenges with using SSDs and the need for data placement technologies are clear, we use the rest of the blog post to discuss the various attempts at SSD data placement. For each technology discussed, we add a summary section focusing on five key themes:

 

  1. Host Assistance: What is the level of host assistance used in the technology? 

  2. Adoption Complexity: How complex are the changes in the host stack to adopt the technology?  

  3. Backwards Compatibility: Does the technology use the existing block interface of CNS with added features, or does it deviate to an entirely new interface, breaking backwards compatibility for users who want to ignore data placement?

  4. SSD WAF: What is the SSD WAF expected if the host stack adopts the technology?  

  5. Status: What is the current status of the technology? 

 

 

Figure 2. Various data placement technologies over the years, sorted by level of host control. [1] On the left extreme: conventional SSDs (CNS) with no host control. On the right extreme: Open Channel with complete host control. In the middle: FDP with moderate host control.

 

 

3. One of the first attempts at host-assisted data placement

Open Channel SSD (OCSSD) was one of the first approaches to tackle the SSD write amplification challenge. The Open Channel approach was to expose the NAND geometry, controller placement, and scheduling policies to the host. It relied on the host taking total control over data placement as well as managing the underlying media by implementing wear levelling, error management, etc. OCSSD required a host-based FTL to provide all of this functionality (Figure 3). The host achieved this level of control via the LightNVM [2, 18] subsystem in the Linux kernel, which provided the required interfaces. The LightNVM subsystem enabled various applications, storage stacks, and filesystems to have total control over how their logical data was physically mapped to the device. OCSSD was an open-source counterpart to Fusion-io's proprietary storage stack, which performed host-side data placement and management of NAND media [2]. OCSSD gained industry traction, with various device vendors building controllers and some hyperscalers subsequently deploying it for some of their services. This traction led to the creation of Project Denali [3, 4], driven mainly by Microsoft, to standardize Open Channel SSDs. Notably, Alibaba went in another direction by defining and deploying its own version of OCSSD [5, 6], called "Dual-mode SSD", which could support both the Open Channel and conventional modes of operation.

While the Open Channel approach to solving write amplification showed great promise, TLC SSDs becoming commonplace increased the complexity of the host-managed FTL. Because the complexity of programming multi-bit NAND cells led to implementation differences across vendors, the host was left responsible for dealing with these vendor-specific implementations while still guaranteeing durability. This greatly diminished the attractiveness of the host-FTL approach of taking total control over the underlying media. OCSSD was never standardized and was eventually dropped.

 

Summary and status of OCSSD:

  • Host Assistance: Complete host-based FTL.

  • Adoption Complexity: Adoption requires major changes to the host stack.

  • Backwards Compatibility: Not backwards compatible.

  • SSD WAF: SSD WAF of 1 is guaranteed.

  • Status: Not standardized and dropped by the industry.

 

 

4. A new data placement technology comes to life

Around the time the OCSSD approach was creating a buzz in the industry, NVMe standardized “NVMe Directives” [7], which many may know as multi-stream or simply streams. The streams approach allows the host to tag its write commands with a “hint”. The SSD interprets these hints as denoting distinct data types, enabling it to place data with different hints into distinct erase blocks (Figure 3). This approach is much simpler than the complex host-FTL approach of Open Channel. The key with multi-streams is for the host to identify distinct data types and use the hints appropriately to reduce garbage collection in the SSD and ultimately reduce write amplification [8]. Multi-stream SSDs were fully backwards compatible with conventional SSDs: if the host did not tag its writes with hints, the drive operated like a conventional SSD. A wave of academic and industry work produced papers and talks on using multi-stream SSDs to reduce SSD WAF across various applications and ecosystems.
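As an illustration of how small the host-side change was, the sketch below uses the Linux per-file write-lifetime hint interface (the F_SET_RW_HINT fcntl with RWH_WRITE_LIFE_* values), which is the kernel mechanism that stream-capable stacks consumed for hinting. The file name is just an example, and the fallback #defines mirror the values in linux/fcntl.h in case an older libc header does not expose them.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT        1036  /* F_LINUX_SPECIFIC_BASE + 12, per linux/fcntl.h */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

int main(void)
{
    int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Hint that data written to this file is expected to be short-lived.
     * With a stream-capable kernel and SSD, writes carrying different
     * lifetime hints could be directed to different erase blocks. */
    uint64_t hint = RWH_WRITE_LIFE_SHORT;
    if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
        perror("F_SET_RW_HINT");

    const char entry[] = "short-lived log entry\n";
    if (write(fd, entry, sizeof(entry) - 1) < 0)
        perror("write");

    close(fd);
    return 0;
}
```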

Streams was standardized and commercialized, with Linux kernel support finding its way upstream as well. However, the lack of solid use cases and of industry adoption resulted in the Linux kernel removing its streams support. Despite its simple interface and ease of integration into host stacks, the lack of industry traction meant that streams ultimately did not take off.

 

Summary and status of Streams:

  • Host Assistance: Uses write hints to segregate data in SSDs.

  • Adoption Complexity: Simple interface and adoption requires minimal changes to the host stack.

  • Backwards Compatibility: Backwards compatible.

  • SSD WAF: No guarantees on WAF.

  • Status: Standardized, but dropped due to lack of traction.

 

 

5. Building on the lessons from Open Channel SSDs and SMR HDDs

Building on the experience from developing Open Channel SSDs and on the zoned storage model of the ZAC/ZBC specifications [9, 10] already adopted in Shingled Magnetic Recording (SMR) HDDs, the Zoned Namespace (ZNS) interface was born [11, 17]. ZNS aimed to provide media-agnostic data placement control to the host without the complexity of managing the underlying NAND media. Learning from the OCSSD experience, ZNS leaves traditional media management to the device FTL (Figure 3). ZNS was standardized by NVMe, leading to the introduction of the ZNS command set.

The basic building block of ZNS is a zone, a region that the host can only write in a strictly sequential manner. The SSD is composed of many zones, formed by partitioning the LBA space, which the host system is required to manage. The host manages writing to zones in sequential order and takes on many FTL functions, such as maintaining an L2P map, triggering and implementing garbage collection, freeing up zones, and other stateful operations. ZNS leaves NAND media management to the SSD. So, while the SSD WAF with ZNS is 1 because of the strict sequential write requirement, the application-level write amplification (App. WAF) with ZNS increases. The end-to-end WAF (application + SSD) with ZNS depends on the host software stack implementation that manages zones and the various stateful operations.
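To make the host's extra bookkeeping concrete, here is a minimal, illustrative model of the per-zone state a ZNS host has to track; it is not the NVMe ZNS command set or any particular library, just a sketch of the rule that writes only land at a forward-moving write pointer and that zones are reset as a whole.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative per-zone state a ZNS host must track. */
struct zone {
    uint64_t start_lba;   /* first LBA of the zone        */
    uint64_t capacity;    /* writable LBAs in the zone    */
    uint64_t write_ptr;   /* next LBA that may be written */
};

/* A write is only valid if it lands exactly at the write pointer and fits
 * inside the zone; anything else must be serialized or redirected by the
 * host stack, which is where application redesign comes in. */
static bool zone_write_ok(const struct zone *z, uint64_t lba, uint64_t nlb)
{
    return lba == z->write_ptr &&
           lba + nlb <= z->start_lba + z->capacity;
}

static void zone_advance(struct zone *z, uint64_t nlb)
{
    z->write_ptr += nlb;
}

/* Resetting a zone frees it in one step; any valid data the host still
 * needs must have been copied elsewhere first (host-side GC). */
static void zone_reset(struct zone *z)
{
    z->write_ptr = z->start_lba;
}

int main(void)
{
    struct zone z = { .start_lba = 0, .capacity = 4096, .write_ptr = 0 };

    printf("write at LBA 0:  %s\n", zone_write_ok(&z, 0, 8)  ? "ok" : "rejected");
    zone_advance(&z, 8);
    printf("write at LBA 64: %s\n", zone_write_ok(&z, 64, 8) ? "ok" : "rejected");
    printf("write at LBA 8:  %s\n", zone_write_ok(&z, 8, 8)  ? "ok" : "rejected");
    zone_reset(&z);
    return 0;
}
```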

The need for the host to perform stateful operations resulted in very fragmented host software stacks, as there was no single way to implement them. The strict requirement of the ZNS interface for the host to perform sequential writes also means that existing host stacks need to be redesigned to adopt ZNS. Given that zones are partitioned based on LBAs, host write patterns that are random in nature pose a challenge because of the strict sequential write requirement of zones. Inherently sequential write patterns, such as log-structured writes, are a perfect fit for ZNS because it is easy to adhere to the sequential zone write restriction. The management needed to convert non-sequential write patterns into sequential zone writes, however, is often complex and hard to achieve without significant application redesign. Furthermore, ZNS is not backwards compatible with conventional SSDs, as it deviates from the typical block interface. Ultimately, the complexity presented by the strict sequential interface has been a real hindrance to ZNS becoming “the” data placement technology for SSD-based systems.

 

Summary and status of ZNS

  • Host Assistance: Uses zones with strict sequential write requirement. Host manages data placement but leaves media management to the device.

  • Adoption Complexity: Adoption requires major changes to the host stack.

  • Backwards Compatibility: Not backwards compatible.

  • SSD WAF: SSD WAF of 1 is guaranteed.

  • Status: Standardized. Adopted in SMR HDDs. Low traction in NVMe.

 

Figure 3. Host + SSD stack for CNS, Streams/FDP, ZNS, and OCSSD, illustrating the varying levels of host control.

 

 

6. Recent past to present day - Lots of lessons and an attempt to consolidate the data placement ecosystem

There are lessons to be learned from the attempts of OCSSD, NVMe Streams and ZNS. Some of these lessons are:

 

  • Leave media management to the SSD: Managing NAND media on the host side is extremely complex and highly dependent on the underlying NAND characteristics. The SSD is best positioned to manage the media on its own, by virtue of having a far more detailed picture of the media at its disposal. This was a major learning from the OCSSD approach to data placement.

  • There is a reluctance to re-design or make major changes to host application stacks: Host software stacks, especially mature solutions, are extremely wary of undertaking huge re-designs to accommodate different data placement technologies. Given that data placement technologies are constantly evolving, the engineering effort to keep re-designing an application to adopt newer technologies simply doesn't make sense. This has been a key learning from ZNS.

  • Industry backing and solid use-cases are essential for success: As Streams showed, no matter how easy a technology is to use, the lack of a solid use-case and of customer drive will lead to the data placement technology petering out.

 

Figure 4. Various data placement technologies over the years, sorted by time. [1]

 

While ZNS had gained some traction, it was clear that the data placement ecosystem was far from settled and there was more to come. Google had developed a data placement method called Smart FTL and had been using it internally for its own use cases. Google started talking more about Smart FTL and its technical details at public forums [12]. Around this time, Meta started talking about its data placement technology, Direct Placement Mode (DPM), in similar public forums [16]. Given that there were similarities between their approaches and both wanted to standardize their technologies, it made sense to collaborate. This willingness to collaborate and standardize their data placement methods in NVMe ultimately resulted in the introduction of NVMe Flexible Data Placement (FDP) [13, 14]. FDP incorporates key ideas from both approaches and has been standardized. FDP in its simplest form looks very similar to streams, but many subtle differences allow FDP to offer the host more than simply tagging writes with a “hint” as streams did. Table 1 summarizes the key differences between Streams, ZNS, and FDP.

FDP aims to give the host the ability to segregate its data, along with a mechanism to query the SSD for more information on its state. This forms a feedback loop the host can use to tweak its data placement logic and eventually achieve a WAF of ~1. This places FDP between Streams and ZNS in terms of the level of host-device collaboration used for data placement (Figure 2). FDP has been a customer-first technology, with a major push for adoption coming from customers with specific use cases in mind. The FDP ecosystem is developing quickly, with a lot of academic and industry work behind its progress. A key mantra in building and promoting FDP has been to achieve the majority of the data placement benefits with the least integration effort.
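As a purely conceptual sketch of that feedback loop (this is not the NVMe FDP command set; the data classes, placement IDs, and threshold below are made up for illustration), the host routes each kind of data to a placement identifier and periodically checks a device-reported write statistic to decide whether its routing is working:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical host-side data classes and their placement-ID routing. */
enum data_class { DATA_HOT, DATA_COLD, DATA_METADATA, DATA_CLASS_MAX };
static uint16_t placement_id[DATA_CLASS_MAX] = { 0, 1, 2 };

/* Placeholder for issuing a tagged write; a real stack would carry the
 * placement identifier in the NVMe write command it submits. */
static void write_tagged(enum data_class cls, const void *buf, uint64_t len)
{
    (void)buf;
    printf("write %llu bytes with placement id %u\n",
           (unsigned long long)len, (unsigned)placement_id[cls]);
}

/* Placeholder feedback step: a real host would query device statistics
 * (e.g. media writes vs. host writes) and, if the measured WAF drifts
 * above ~1, revisit how data classes map to placement IDs. */
static void adjust_placement(double measured_waf)
{
    if (measured_waf > 1.1)
        printf("WAF %.2f: reconsider the data-class -> placement-ID mapping\n",
               measured_waf);
}

int main(void)
{
    char buf[4096] = { 0 };
    write_tagged(DATA_HOT, buf, sizeof(buf));
    write_tagged(DATA_COLD, buf, sizeof(buf));
    adjust_placement(1.25);
    return 0;
}
```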

 

Our next blog post will be dedicated entirely to delving deeper into FDP and discussing its status today within the data placement ecosystem. Make sure to keep an eye out for that!

Table 1. Comparing Streams, FDP and ZNS [13, 15]

 

 

7. What is the future of data placement?

The data placement ecosystem has had many twists and turns, with different technologies ruling the data placement world at different points in time. It would be naive to assume that the ecosystem has reached the end of its evolution or that any one technology will reign supreme. It is clear that ZNS is best suited to workloads that are inherently sequential in nature, such as log-structured writes. ZNS is less suited to random write patterns because the application-stack redesigns required are often too complex. FDP is positioned to allow easy integration into application stacks with minimal changes and to provide the best bang for the buck in the trade-off between engineering cost and SSD WAF benefit. If host systems using CNS want better control over their SSD WAF and are unsure of the best approach, FDP is the clear choice, as the effort needed to experiment with it and integrate it is low. Unless the system already has a log-structured write pattern, trying out ZNS is likely to be difficult due to the engineering cost of changing the host stack to adhere to the strict interface requirements. The best approach to placing host data will very much depend on the host stack design and the workloads involved. Given this subjective nature, one could expect other technologies to emerge to address different use cases. Therefore, the search for a single data placement technology that serves all use-cases might be a futile exercise. Whether multiple data placement technologies will co-exist, and what those technologies will be, only time will tell!

 

References

[1] Data Placement at Scale: Landscape, Trade-Offs, and Direction, https://lpc.events/event/18/contributions/1737/
 
[2] Matias Bjørling, Javier Gonzalez, and Philippe Bonnet. LightNVM: The Linux Open-Channel SSD Subsystem. In 15th USENIX Conference on File and Storage Technologies (FAST 17), pages 359–374, Santa Clara, CA, 2017. USENIX Association
 
 
 
[5] In Pursuit of Optimal Storage Performance: Hardware/Software Co-Design with Dual-Mode. Yu Du, Ping Zhou, Shu Li
 
 
 
[8] Kang, Jeong-Uk, et al. "The Multi-streamed Solid-State Drive." 6th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 14). 2014.
 
[9] INCITS T10 Technical Committee. Information technology - Zoned Block Commands (ZBC). Draft Standard T10/BSR INCITS 536, American National Standards Institute, Inc., September 2014. Available from http://www.t10.org/.
 
[10] INCITS T13 Technical Committee. Information technology - Zoned Device ATA Command Set (ZAC). Draft Standard T13/BSR INCITS 537, American National Standards Institute, Inc., December 2015. Available from http://www.t13.org/.
 
[11] Bjørling, Matias, et al. "ZNS: Avoiding the Block Interface Tax for Flash-based SSDs." 2021 USENIX Annual Technical Conference (USENIX ATC 21). 2021.
 
[12] Chris Sabol and Smriti Desai, SmartFTL SSDs. https://www.youtube.com/watch?v=3O3zDrpt3uM
 
[13] Introduction to Flexible Data Placement: A New Era of Optimized Data Management: https://download.semiconductor.samsung.com/resources/white-paper/FDP_Whitepaper_102423_Final.pdf
 
 
[15] SDC 2023 - Host Workloads Achieving WAF==1 in an FDP SSD, https://www.youtube.com/watch?v=7O9QDCXGDqE
 
[16] OCP PANEL: Flexible Data Placement using NVM Express® - Implementation Perspective, https://www.youtube.com/watch?v=R0GHuKwi3Fc
 
[17] Zoned Namespaces (ZNS) SSDs. https://zonedstorage.io/docs/introduction
 
[18] OCSSD LightNVM http://lightnvm.io/