
[Inside LBS Ⅰ] Dividing and Conquering Linux High-Capacity SSD Support with LBS

Written by Luis Chamberlain, R&D-DTC/DSRA(DS)


The incredible demand for high-capacity (HC) SSDs has pushed SSD manufacturers to increase their internal indirection unit (IU) size in order to make HC SSDs possible today. However, the requirement for a larger IU also meant evaluating whether larger-IU SSDs could be supported seamlessly by today's software ecosystems. Our team at Samsung has taken on this challenge and implemented support for HC SSDs by making only operating system changes. We've provided this enablement through the open ecosystem of Linux, but the experience and lessons learned along the way should be useful for other operating systems and software ecosystems. In this series of posts, we will review what the IU is, why increasing the IU size has been needed to support HC SSDs, and how support for larger IUs turned out to be primarily an operating system filesystem and memory management problem.

We recently contributed an article to LWN describing how, as of the v6.12 release, Linux supports large block sizes (LBS). The filesystem block size is no longer limited to the CPU page size dictated by the memory management unit (MMU). The first filesystem to support this is XFS, with bcachefs following in the upcoming v6.15 release. By pure coincidence, the v6.12 release also happens to include support for another long-term Linux kernel community effort: the 20-year real-time preemption effort. LBS was likewise a long-term community effort, and landing support in v6.12 was the culmination of 17 years of community work. To many, an unexpected gain from the LBS effort was the ability for the community to leverage LBS to swiftly add support for large atomic writes to XFS in the v6.13 kernel release. LWN describes this effort, how LBS helped make XFS the first filesystem to support it, and the ongoing R&D challenges of supporting large atomics without LBS.

While the value of LBS and large atomics for HC SSDs may seem obvious now, just two years ago in June 2023 even the prospects of LBS seemed very unclear. At the 2023 LSFMM large block sizes session I presented the lofty objectives we needed to aim for, laid out as objectives and key results (OKRs) in a spreadsheet, to get LBS support. The OKRs made it clear we had a steep road ahead of us. Why did we do all this work? How did we get here so quickly?

Image 01: Image generated using Gimp for illustrative purposes.

 


Table of contents
  • Part 1. What is a high-capacity SSD indirection unit?
  • Part 2. Addressing the large IU storage stack challenge
  • Part 3. Empirical evaluation of large atomics
  • Part 4. Why large IU support was a memory management problem
  • Part 5. How did we get here so quickly?



Part 1. What is a high-capacity SSD indirection unit?

This is part 1 of our series on addressing support for HC SSDs in the software ecosystem.

 

Why did we do this?

Given the large amount of work we knew we would have to do to get LBS upstream, why did we do it? To answer that, and since not all readers can be expected to know the low-level details of SSDs and HC SSDs, we'll explain what an IU is and show the relationship between a large IU, large atomics, and LBS.

 

What is a high-capacity SSD indirection unit?

The concept of using indirection in SSDs to map logical block addresses (LBAs) to physical block addresses in the flash translation layer (FTL) is well documented in the first section of the older paper, "Removing the costs of indirection in flash-based SSDs with nameless writes". The IU refers to the granularity of the FTL's logical-to-physical (L2P) mapping, which maps a logical page number (LPN) to a physical page number (PPN). The mapping allows host software to query one LBA consistently while allowing the storage controller to use whatever NAND location it wants within the drive. The IU is opaque to host software.

Image 02: Mapping LBAs in SSDs via indirection
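
To make the indirection concrete, here is a minimal Python sketch of an L2P-style lookup. The class, names, and sizes are purely illustrative assumptions for this post and do not reflect any real controller firmware.

# Toy model of an FTL logical-to-physical (L2P) mapping table.
class ToyFTL:
    def __init__(self, capacity_bytes, iu_size=4096, lba_size=4096):
        self.iu_size = iu_size
        self.lba_size = lba_size
        # One mapping entry per IU; 67108864 entries for a 256 GiB drive
        # with a 4 KiB IU, as worked out in the next section.
        self.num_entries = capacity_bytes // iu_size
        self.l2p = {}  # sparse stand-in for the per-IU mapping table

    def _lpn(self, lba):
        # Translate a host LBA into the logical page number of its IU.
        return (lba * self.lba_size) // self.iu_size

    def write(self, lba, ppn):
        # The controller may place data on any NAND location; it simply
        # records the chosen physical page number (PPN) for this IU.
        self.l2p[self._lpn(lba)] = ppn

    def read(self, lba):
        # The host always asks for the same LBA; the mapping stays opaque.
        return self.l2p.get(self._lpn(lba))

ftl = ToyFTL(capacity_bytes=256 * 1024**3)  # 256 GiB drive, 4 KiB IU
ftl.write(lba=42, ppn=0xABCDE)
assert ftl.read(lba=42) == 0xABCDE

With a 4 KiB IU and 4 KiB LBAs the translation is one-to-one; with a 16 KiB IU the same sketch maps four consecutive LBAs to a single entry.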

 

Example with different IUs

The IU mapping is kept in a mapping table; in enterprise SSDs the entire mapping table is held in the storage controller's DRAM. As the demand for HC SSDs pushes capacities up, the DRAM on the SSD has to grow along with the size of the mapping table. For example, a 256 GiB drive may present 4 KiB LBAs to the host. In standard enterprise SSDs the drive may use a 4 KiB IU, and it will allocate 4 bytes (32 bits) to each mapping entry. Dividing the total capacity of the drive by 4 KiB gives us the total number of entries in the mapping table, with each entry being 4 bytes in size:

 

(Capacity in bytes) / (IU size in bytes)
256 * 1024 * 1024 * 1024 / 4096
= 67108864 entries

(Number of entries) * (Mapping entry size)
67108864 entries * 4 bytes
= 268435456 bytes

268435456 / 1024 / 1024
= 256 MiB

 

That means a 256 GiB drive would require 256 MiB of DRAM for the entire mapping table. We end up with a DRAM-to-storage-capacity ratio of 1:1024 (commonly approximated as 1:1000 in discussions). If the IU is changed to 16 KiB instead, each mapping entry covers four of the 4 KiB LBAs, reducing the mapping table DRAM on a 256 GiB SSD from 256 MiB to 64 MiB. The DRAM savings mean lower-cost SSDs. Note that even with a 16 KiB IU the mapping entry size remains 32 bits; what changes is how we divide the drive capacity:

 

(Capacity in bytes) / (IU size in bytes)
256 * 1024 * 1024 * 1024 / 16384
= 16777216 entries

(Number of entries) * (Mapping entry size)
16777216 entries * 4 bytes
= 67108864 bytes

67108864 / 1024 / 1024
= 64 MiB

 

We'd be reducing the mapping table size by a factor of four with this change. Now let's consider some examples with different capacities.

The amount of DRAM required for smaller consumer SSD sizes may not seem like much, but what if we start evaluating relevant HC SSD sizes? Let's take a look at what the DRAM implications would be; a quick back-of-the-envelope sketch follows below.
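
As a rough sketch of that arithmetic (assuming, as above, one 4-byte mapping entry per IU; the capacities listed are illustrative rather than taken from any specific product), a few larger capacities work out as follows:

# Back-of-the-envelope mapping table DRAM estimate.
# Assumes one 4-byte (32-bit) entry per IU, as in the examples above.
KiB, MiB, GiB, TiB = 1024, 1024**2, 1024**3, 1024**4
ENTRY_BYTES = 4

def mapping_table_bytes(capacity_bytes, iu_bytes):
    return (capacity_bytes // iu_bytes) * ENTRY_BYTES

def fmt(nbytes):
    # Minimal human-readable formatter for the output below.
    for unit, name in ((TiB, "TiB"), (GiB, "GiB"), (MiB, "MiB")):
        if nbytes >= unit:
            return f"{nbytes / unit:g} {name}"
    return f"{nbytes} B"

for capacity in (256 * GiB, 16 * TiB, 32 * TiB, 64 * TiB, 128 * TiB):
    for iu in (4 * KiB, 16 * KiB):
        print(f"{fmt(capacity):>8} drive, {iu // KiB:>2} KiB IU -> "
              f"{fmt(mapping_table_bytes(capacity, iu)):>8} of mapping table DRAM")

Note that beyond 16 TiB a 4 KiB IU can no longer be covered by 32-bit entries at all (more on that next), so the 4 KiB IU figures above actually understate the real DRAM cost.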

Limitations of different IUs

So far, we have focused on an example of a 16 KiB IU size. We can also infer the maximum SSD capacity supported by an IU if we want to keep each mapping entry within 32 bits. 32 bits is a relevant value to focus on because most embedded DRAM interfaces, embedded processors, and other components in an SSD use it as a building block. To estimate the maximum SSD capacity supported by an IU, we simply evaluate 2^32 * 4 KiB, which is 16 TiB. With a 16 KiB IU that's 2^32 * 16 KiB, which gives us 64 TiB. And so on.
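
As a quick sanity check of that relationship (a sketch only, using the same 32-bit entry assumption as above):

# Maximum drive capacity addressable with 32-bit mapping entries,
# i.e. 2^32 entries of one IU each, for a few IU sizes.
KiB, TiB = 1024, 1024**4

for iu_kib in (4, 8, 16, 32, 64):
    max_capacity = (2**32) * iu_kib * KiB
    print(f"{iu_kib:>2} KiB IU -> up to {max_capacity // TiB} TiB with 32-bit entries")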

Supporting larger IUs

Supporting capacities larger than 16 TiB with a 4 KiB IU is possible but requires less favorable solutions. For example, increasing the DRAM on the storage controller might be a solution, but overflowing the 32-bit mapping entry size adds an extra multiplier on top of the DRAM capacity increase. Because that extra DRAM translates directly into cost, supporting HC SSDs over 16 TiB with a 4 KiB IU becomes cost prohibitive. Making matters worse, the relationship above shows that with every increase in SSD capacity there is a risk of an increased IU size.
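
To see the extra multiplier at work, consider a hypothetical 64 TiB drive that keeps a 4 KiB IU. The 5-byte (40-bit) entry width below is an assumption made only to illustrate the effect of outgrowing 32 bits, not a real design:

# Hypothetical example: keeping a 4 KiB IU on a 64 TiB drive.
# 64 TiB / 4 KiB = 2^34 entries, which no longer fits in 32 bits,
# so each mapping entry must grow (assume 5 bytes / 40 bits here).
KiB, GiB, TiB = 1024, 1024**3, 1024**4

dram_4k_iu = ((64 * TiB) // (4 * KiB)) * 5    # 40-bit entries -> 80 GiB of DRAM
dram_16k_iu = ((64 * TiB) // (16 * KiB)) * 4  # 16 KiB IU keeps 32-bit entries -> 16 GiB

print(f"{dram_4k_iu // GiB} GiB of mapping table DRAM with a 4 KiB IU and 40-bit entries")
print(f"{dram_16k_iu // GiB} GiB of mapping table DRAM with a 16 KiB IU and 32-bit entries")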