
Samsung CXL Solutions – CMM-H


As AI and machine learning solutions continue to be deployed across data center infrastructures, it is important to balance compute, memory, and storage resources so that they support language model processing performance while keeping the cost of those resources under control. In May 2021, Samsung announced the development of the industry’s first Compute Express Link (CXL™) Memory Module-DRAM (CMM-D). CMM-D addresses the memory capacity limits of a single server by supporting memory expansion and pooling. The CMM-D device is currently sampling to customers.

The next product in the CMM family is the Compute Express Link (CXL™) Memory Module-Hybrid (CMM-H), first introduced at FMS’22 as the Memory-Semantic SSD. Hybrid means that the CMM-H device mixes media types, specifically DRAM and NAND flash. The CMM-H controller manages the DRAM and NAND resources to support two modes: (1) memory persistence and (2) memory tiering. In both modes, the host processor accesses the CMM-H device as one addressable memory space that is integrated with host DRAM. Use cases currently being developed on the CMM-H platform include (1) memory persistence for in-memory databases, (2) tiered memory for data analytics and AI inference models, and (3) memory optimization to improve memory utilization for better TCO across data center infrastructures.

What Is CMM-H?
The CMM-H device combines Samsung’s high-performance DRAM and NAND flash in a CXL Type 3 device¹. Together, these characteristics offer a cost-effective memory expansion device. The motivation behind CMM-H is to combine the capacity of NAND flash with a built-in cache design and the CXL load/store memory interface. It presents large NUMA nodes through the existing Linux kernel framework and integrates with applications without requiring modification. For applications that prioritize capacity, TCO, and throughput over random access latency, such as in-memory databases and AI inferencing of large language models, CMM-H is a strong design choice. As a side benefit, it comes with built-in data persistence to minimize downtime during data recovery.
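Because the device is presented as a NUMA node through the standard Linux framework, a host can locate it the same way it locates any other memory node. The following is a minimal sketch, assuming the usual `/sys/devices/system/node` sysfs layout and assuming that CXL-attached memory is onlined as a node with no local CPUs, which is common but depends on platform firmware and kernel configuration. The function name `find_cpuless_nodes` is illustrative and not part of any Samsung tool.

```python
import os
import re


def find_cpuless_nodes(sysfs_root="/sys/devices/system/node"):
    """Return sorted IDs of NUMA nodes that have no local CPUs.

    Assumption: CXL-attached memory such as CMM-H shows up as a
    CPU-less NUMA node. This is typical on Linux but not guaranteed.
    """
    cpuless = []
    if not os.path.isdir(sysfs_root):
        return cpuless
    for entry in os.listdir(sysfs_root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue
        cpulist_path = os.path.join(sysfs_root, entry, "cpulist")
        try:
            with open(cpulist_path) as f:
                cpulist = f.read().strip()
        except OSError:
            continue
        # An empty cpulist means the node has memory-only resources.
        if cpulist == "":
            cpuless.append(int(match.group(1)))
    return sorted(cpuless)


if __name__ == "__main__":
    print("CPU-less NUMA nodes:", find_cpuless_nodes())
```

An unmodified application can then be bound to such a node with a standard tool, for example `numactl --membind=<node>`.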

CXL Type 1 Device (No Memory) Diagram
CXL Type 2 Device Diagram
CXL Type 3 Device Diagram

CMM-H Features and Benefits
Traditionally, adding memory capacity and bandwidth to a system involves increasing the number of native CPU memory channels. But adding memory channels to a CPU increases engineering complexity and drives up cost. As a Type 3 device, CMM-H provides a flexible and powerful option to increase both memory capacity and memory bandwidth without increasing the number of primary CPU memory channels.

CMM-H Tiered Memory Feature
The tiered memory model offers an architectural solution to the complex problem of keeping pace with rapidly evolving processor and accelerator speeds. By strategically positioning frequently accessed data closer to the processing units, it not only expands memory capacity but also improves cost efficiency. In other words, placing data close to where it is processed enables faster data processing, lower power requirements, and reduced TCO.

CMM-H can be used to expand the available memory in two ways. First, CMM-H can be used in the same tier as DRAM in the memory hierarchy. Alternatively, CMM-H can be used one tier below the main memory (DRAM) as a swap space.

The goal of CMM-H tiered memory is to create a CXL-based memory module that combines a small amount of DRAM with a large capacity of NAND. Because CMM-H uses NAND on the backend, it provides large-capacity, non-volatile memory at an affordable cost. Such CMM-H persistent memory solutions can serve Intel Optane as well as NVDIMM customers.

CMM-H Device Memory Cache Feature
A key element of CMM-H is its built-in DRAM cache, designed to mitigate the long latency associated with NAND flash. By default, a CMM-H device performs the device cache function in an application-agnostic manner. It also provides a facility by which applications or workloads that are aware of the device can give it hints to improve its overall performance. The Host Hints module provides an API that host software and applications can optionally use to send heatmap hints to the device to improve device cache performance. The CXL.mem protocol also provides 64-byte cache granularity, far finer than block-based storage access, which is a significant advantage for AI applications.
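The interaction between a device-side cache and host hints can be illustrated with a toy model. The sketch below is not the Host Hints API, whose interface is not described here. It assumes an LRU cache of 64-byte lines and a hypothetical `hint_hot` call that marks an address range as hot, so that hot lines are the last to be evicted. The class and method names are invented for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes, matching the 64-byte CXL.mem granularity


class HintedCache:
    """Toy LRU cache of 64-byte lines with optional host 'hot' hints."""

    def __init__(self, capacity_lines):
        if capacity_lines < 1:
            raise ValueError("capacity_lines must be at least 1")
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # line index -> True, oldest first
        self.hot = set()            # line indexes the host marked hot
        self.hits = 0
        self.misses = 0

    def hint_hot(self, start_addr, length):
        """Hypothetical hint call: mark an address range as hot."""
        first = start_addr // LINE_SIZE
        last = (start_addr + length - 1) // LINE_SIZE
        self.hot.update(range(first, last + 1))

    def _evict(self):
        # Prefer the least recently used line that is not marked hot.
        victim = next((l for l in self.lines if l not in self.hot), None)
        if victim is None:
            # Every cached line is hot: fall back to plain LRU.
            self.lines.popitem(last=False)
        else:
            del self.lines[victim]

    def access(self, addr):
        """Return True on a cache hit, False when the backend is read."""
        line = addr // LINE_SIZE
        if line in self.lines:
            self.lines.move_to_end(line)
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[line] = True
        return False
```

In this model, a two-line cache that has been told line 0 is hot keeps line 0 resident when a third line arrives, whereas the same cache without the hint evicts it. A real device would weigh hints against its own access statistics rather than treat them as absolute.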

Heatmap Hints Diagram

CMM-H Persistent Memory Feature
The CMM-H device can operate as non-volatile memory, that is, as a CXL-based, large-capacity Persistent Memory (PMEM) solution. In PMEM mode, the CMM-H device supports Global Persistent Flush (GPF). When the device receives a GPF message, it immediately starts flushing data to the backend SSD.
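The flush behavior can be pictured with a small model. The sketch below is a conceptual illustration only, assuming a volatile cache in front of a persistent backend and a set of dirty addresses that have not yet been written back. The class `PersistentDeviceModel` and its methods are hypothetical and do not reflect the device firmware or the CXL message format.

```python
class PersistentDeviceModel:
    """Toy model: volatile DRAM cache in front of a persistent backend."""

    def __init__(self):
        self.dram = {}      # address -> data, volatile
        self.dirty = set()  # addresses written but not yet persisted
        self.nand = {}      # address -> data, persistent

    def store(self, addr, data):
        """Write into the volatile cache and remember it is unpersisted."""
        self.dram[addr] = data
        self.dirty.add(addr)

    def load(self, addr):
        """Read from the cache first, then from the persistent backend."""
        if addr in self.dram:
            return self.dram[addr]
        return self.nand.get(addr)

    def global_persistent_flush(self):
        """Model of handling a GPF message: persist every dirty entry."""
        flushed = 0
        for addr in sorted(self.dirty):
            self.nand[addr] = self.dram[addr]
            flushed += 1
        self.dirty.clear()
        return flushed

    def power_loss(self):
        """Drop all volatile state; only the backend survives."""
        self.dram.clear()
        self.dirty.clear()
```

Data stored before the flush is still readable after `power_loss()`, while data stored after the flush is lost. That difference is the gap a GPF issued ahead of power loss is meant to close.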

Samsung has developed the industry’s first DRAM-NAND hybrid CXL System-on-Chip (SoC) device, aptly named CMM-H PM (Persistent Memory). This latest innovation uses all of the CMM-H device features as its foundation and couples them with an SoC to process memory data even faster than its predecessors. An internal power source enables the DRAM-NAND SSD to provide persistent memory, so that in the event of a power outage, data that resides in memory remains available to the applications that require it. The device integrates this energy source in a PCIe 5.0 x8, E3.L form factor. It also supports CXL Global Persistent Flush (GPF), which flushes all non-persistent data to a persistent destination in the same CXL domain. Additionally, it is compliant with CXL 2.0 Type 3 (CXL.mem and CXL.io), where the CXL.mem interface provides very low read latency at 32GB of user data.

CMM-H PM
Figure 3: CMM-H PM (Persistent Memory)

CMM-H Memory Pooling and Switching
The CXL 2.0 specification used for CMM-H also supports single-level switching and memory pooling. Memory pooling increases the overall system efficiency by allowing dynamic allocation and deallocation of memory resources. Memory pooling also enables reduction of stranded memory, a common problem observed in server systems.
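The effect of pooling on stranded memory can be sketched with simple bookkeeping. The model below is an illustration under stated assumptions, not a description of a CXL switch or fabric manager: `MemoryPool` and `stranded_gb` are hypothetical names, and the capacities used with them are made-up numbers.

```python
class MemoryPool:
    """Toy model of a shared memory pool serving several hosts."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocated = {}  # host name -> GB currently assigned

    def free_gb(self):
        """Capacity not currently assigned to any host."""
        return self.capacity_gb - sum(self.allocated.values())

    def allocate(self, host, size_gb):
        """Assign size_gb to host; return False if the pool is too full."""
        if size_gb <= 0 or size_gb > self.free_gb():
            return False
        self.allocated[host] = self.allocated.get(host, 0) + size_gb
        return True

    def release(self, host, size_gb):
        """Return up to size_gb from host to the pool."""
        held = self.allocated.get(host, 0)
        returned = min(held, size_gb)
        if returned == held:
            self.allocated.pop(host, None)
        else:
            self.allocated[host] = held - returned
        return returned


def stranded_gb(installed_per_host, used_per_host):
    """Memory installed in each host but unused and unreachable by others."""
    return sum(
        installed_per_host[h] - used_per_host[h] for h in installed_per_host
    )
```

With fixed per-host memory, two hosts with 64 GB each that use 20 GB and 60 GB leave 48 GB stranded. In the pooled model, capacity released by one host becomes available to the other through `allocate`.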

CXL 2.0 Memory Pooling

Conclusion
Samsung’s Memory Module solutions are forging the next frontier for Artificial Intelligence, Machine Learning, and Large Language Model processing. Moving more of the data processing in and around the memory modules is reshaping how computing is done in this new AI era.

1 CXL provides three device types. Type 1 covers caching devices such as accelerators and SmartNICs. Type 2 covers devices such as GPUs and FPGAs that have their own attached memory, such as DDR or HBM. Type 3 covers memory expansion devices that let host processors access CXL device memory cache-coherently through CXL.mem transactions.