
[Memory Tech Day 2023] Building the Better SSD

As the demand for both SSD memory capacity and operating speed continues to increase at a breakneck pace, so does the need to improve data storage efficiency, reduce garbage collection, and handle errors more proactively. For a big-picture analogy, let’s compare the problems faced with SSD data to the challenges of grain delivery from silo to transportation to warehouse. We’ll consider bags of grain to be delivered as bulk data to be stored on an SSD. NVMe SSD technologies allow the shipper (Data Center host) to specify:
    • A way for multiple grain shippers to tag their bags so that a single transport channel can carry all without mixing them up (SR-IOV, ZNS)
    • The best place in the warehouse to store each bag of grain, alongside similar grains (Flexible Data Placement – FDP), to minimize the number of bags that must be reorganized later (Garbage Collection – GC)
    • The amount of resources applied to high-priority shipments versus low-priority ones (Performance Control).
Now let’s consider the associated problem of pest control. In ages past, the world beat a path to the door of those who built a better mousetrap. In the SSD world, that task is akin to error management.
    • Improve the trap mechanism to maximize mice caught (CECC/UECC)
    • Monitor the trap to check the number of mice caught, whether the trap is full, and whether one trap is not working as well as others (SMART/Health)
    • Track and report the most mouse-related activity possible (Telemetry)
    • Use the activity data to foresee a major pest infestation before it happens (Failure Prediction)
And then there are cross-functional issues, such as…
    • Recovering grain bags to a new storage area when the original area has been overrun (data recovery and new drive migration)
Samsung is building a better mousetrap by leading the technology world in SSD engineering. The annual Samsung Memory Tech Day event offered several breakout sessions that unveiled our latest storage technologies. Here are the key takeaways from the computing memory solutions track.
Jung Seungjin, VP of the Solution Product Engineering team, discusses SSD telemetry. Consider a brief history: the concept of collecting operational data and transmitting it to a remote location for interpretation has been around for well over a century. Various forms of error logging and retrieval were included from the beginning of modern hard drive technologies. Basic SSD-specific telemetry commands and delivery formats became standard starting with NVMe 1.3. More recently, Samsung has been using its position as the leader in SSD technology to drive sophisticated and necessary telemetry additions to the spec.
The benefits of Samsung’s cutting-edge research become immediately obvious. Consider, for example, Samsung Telemetry Service, an advanced tool that helps enterprise customers remotely analyze and manage their devices. It guarantees the stability of data, allowing data center operators to prevent future drive failures, manage drive replacement, and migrate data.
“Through monitoring, we realized that multi-address CECC can become a UECC that can cause problems in the system in the future.”
The Telemetry presentation covers telemetry background, the latest improvements that Samsung is driving into the specification, and examples of the value they add in detecting drive failure. Of key interest is Samsung’s advanced machine learning-based anomaly prediction research.
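To make the quoted observation about multi-address CECC concrete, here is a minimal sketch of a rule-based early-warning check. It is an illustration only, not the Samsung Telemetry Service or its machine learning model: the record layout (drive ID, address), the function name, and the threshold of three distinct addresses are all assumptions chosen for the example.

```python
from collections import defaultdict


def drives_at_risk(cecc_events, min_addresses=3):
    """Return drives whose correctable errors (CECC) span many addresses.

    cecc_events: iterable of (drive_id, address) tuples, one per CECC record.
    min_addresses: hypothetical threshold; a real service would tune or learn it.
    """
    addresses_per_drive = defaultdict(set)
    for drive_id, address in cecc_events:
        addresses_per_drive[drive_id].add(address)
    # Repeated errors at one address count once; spread across addresses is the signal.
    return sorted(
        drive_id
        for drive_id, addresses in addresses_per_drive.items()
        if len(addresses) >= min_addresses
    )


events = [
    ("ssd0", 0x10), ("ssd0", 0x10), ("ssd0", 0x10),  # same address three times
    ("ssd1", 0x10), ("ssd1", 0x2A), ("ssd1", 0x80),  # three different addresses
]
print(drives_at_risk(events))  # ['ssd1']
```

Counting distinct addresses rather than raw error totals keeps a single weak location from triggering an alert while still surfacing errors that are spreading across the drive.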
Silwan Chang, VP of the Software Development team, talks about Flexible Data Placement (FDP) and how easily it can be implemented to dramatically reduce Write Amplification Factor (WAF). The discussion includes a comparative analysis of various data placement technologies, including ZNS, and showcases a use case for Samsung's FDP technology. The underlying limitation of NAND is that data in a NAND cell cannot be overwritten in place; a NAND block must be erased before new data is written to it. Data placement technology works around this limitation: well-chosen data placement can increase the performance and endurance of modern SSDs without additional H/W cost. The host influences data placement through the Reclaim Unit (RU) handled by the SSD. Knowing the most efficient size and boundaries of this basic SSD storage unit, the host can group data with similar life cycles to reduce or eliminate SSD garbage collection inefficiencies. “The best thing about an FDP SSD is that this is possible with a very small change of the system SW.”
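As a rough illustration of why grouping data with similar lifetimes lowers WAF, the toy model below is a sketch under stated assumptions, not Samsung's implementation or the NVMe specification's algorithm. It assumes WAF is NAND pages written divided by host pages written, that any Reclaim Unit holding at least one stale page gets reclaimed, and that every still-valid page in it must be copied first. The function names and the four-page Reclaim Unit size are hypothetical.

```python
def write_amplification(host_pages, gc_copied_pages):
    """WAF = total NAND page writes / host page writes."""
    return (host_pages + gc_copied_pages) / host_pages


def pages_copied_on_reclaim(reclaim_units):
    """Count valid pages that garbage collection must relocate.

    reclaim_units: list of Reclaim Units, each a list of booleans
    (True = page still valid, False = page stale).
    Toy assumption: an RU is reclaimed if it holds any stale page.
    """
    copied = 0
    for ru in reclaim_units:
        if not all(ru):
            copied += sum(ru)  # valid pages have to be rewritten elsewhere
    return copied


# 16 host pages: 8 short-lived (already stale) and 8 long-lived (still valid).
mixed = [[True, False, True, False] for _ in range(4)]   # lifetimes interleaved
separated = [[False] * 4, [False] * 4, [True] * 4, [True] * 4]  # grouped by lifetime

print(write_amplification(16, pages_copied_on_reclaim(mixed)))      # 1.5
print(write_amplification(16, pages_copied_on_reclaim(separated)))  # 1.0
```

In the mixed layout every Reclaim Unit carries live data that must be moved before the erase; in the grouped layout the stale units can be erased outright and the live units are never touched.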
Following up, Ross Stenfort of Meta presents “Hyperscale FDP Perspectives,” in which he traces the progression of techniques used to reduce WAF:
    • Overprovisioning – allocating extra blocks to use for garbage collection
    • Trim/Deallocate host commands – telling the SSD what can safely be deleted
    • FDP – telling the SSD how to group data in order to minimize future garbage collection.
The presentation includes a compelling workload example without and with FDP, noting that: “Applications are not required to understand FDP to get benefits.”
In his next session, Silwan Chang continues with a discussion about the present and future of Samsung SSD virtualization technology using SR-IOV. Efficiency has become a central focus for increasing datacenter processing capacity. With the number of datacenter CPU cores typically exceeding 100, the number of tenants (separate instances / applications) utilizing a single SSD has surged. Virtualization provides each tenant its own private window into SSD storage space. The PCIe SR-IOV specification provided the basics for setting up a virtualized environment. With its research giving it an early lead, Samsung now has nearly a decade of experience with SR-IOV and has identified and developed solutions for underlying security and performance issues:
    • Data Isolation – keeping data from one tenant secure from access by others, evolving from logical sharing to physically isolated partitioning
    • Performance Isolation – preventing activity by one tenant from adversely affecting performance of other tenants
    • Security Enhancement – encryption evolving from Virtual Function level to link level
    • Live Migration – moving data from one SSD to another while keeping both in active service to the datacenter host.
“To realize completely isolated storage spaces in a single SSD, we need to evolve into physical partitioning where NAND chips and even controller resources are dedicated to a namespace.”
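The performance isolation goal in the list above can be sketched with a generic max-min fair allocation, in which a heavy tenant can only use bandwidth that lighter tenants leave unclaimed. This is a textbook technique swapped in for illustration, not Samsung's performance regulation mechanism; the function name, the tenant labels, and the IOPS figures are assumptions.

```python
def fair_share(capacity, demands):
    """Max-min fair split of `capacity` (e.g. IOPS) across tenants.

    demands: dict mapping tenant name -> requested amount.
    Tenants asking for less than an equal share get all they asked for;
    what they leave unused is divided equally among the remaining tenants.
    """
    allocation = {tenant: 0.0 for tenant in demands}
    remaining = float(capacity)
    unsatisfied = {t: float(d) for t, d in demands.items() if d > 0}
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        satisfied = [t for t, d in unsatisfied.items() if d <= share]
        if not satisfied:
            # Everyone left wants more than an equal share: cap them all at it.
            for tenant in unsatisfied:
                allocation[tenant] += share
            break
        for tenant in satisfied:
            demand = unsatisfied.pop(tenant)
            allocation[tenant] += demand
            remaining -= demand
    return allocation


# Hypothetical: one SSD with 900 kIOPS shared by three virtual functions.
print(fair_share(900, {"vf0": 100, "vf1": 200, "vf2": 5000}))
# {'vf0': 100.0, 'vf1': 200.0, 'vf2': 600.0}
```

Here the noisy tenant vf2 is limited to the 600 kIOPS left over, so the two lighter tenants still receive everything they requested.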
Sunghoon Chun, VP of the Solution Development team, talks about Samsung's ongoing development of new solutions tailored to meet the challenges of rapidly evolving PCIe interface speeds and the trend towards high-capacity products. The key focus is higher speed at lower active power, two goals that tend to conflict. Samsung targets lower active power in two main ways:
    • Designing lower power components by adding power rails to boost the efficiency of the voltage regulator
    • Introducing power-saving features to optimize the interaction between components, such as by modifying firmware to favor lower-power SRAM utilization over DRAM.
The higher speed target brings with it higher temperatures, which Samsung addresses with:
    • Form factor conversion to accommodate higher thermal dissipation for power demands going from 25W to 40W
    • Use of more effective and novel case construction materials and design techniques
    • Thermal management solutions using immersion cooling that yield strong experimental gains.
“The goal is to continue efforts to create a perfect SSD, optimized for use in immersion cooling systems over the next few years in line with the trend of the times.”
In summary, this presentation track reveals the Samsung SSD strategy for customer success.
    • Dramatically reduce WAF by taking advantage of Samsung’s advanced Flexible Data Placement technology
    • Vastly increase virtualization efficiency using Samsung’s performance regulation and space partitioning technology to maximize the processing capacity for each core of the multi-core datacenter CPU
    • Achieve significantly higher operating speeds while both reducing power and increasing heat dissipation by using Samsung’s novel design and packaging techniques
    • Remotely analyze and manage devices to virtually eliminate data loss and its crippling downtime through the innovative Samsung Telemetry Service.
Customers following the Samsung advanced research roadmap are guaranteed a “no-limits” path to increased performance and decreased cost over the next decade.