When we think back on what the “memory wall” consists of, we remember that insufficient memory capacity constrains bandwidth; constrained bandwidth increases the latency of data transfers; the system works harder to compensate; power consumption rises; and so does the total cost of ownership (TCO).
Samsung’s answer to this problem is what it calls “Near Memory Solutions,” a value proposition that sets the stage for a memory capacity revolution. AI applications increasingly rely on deep-learning workloads driven by large language models built on the Generative Pre-trained Transformer (GPT) architecture. As data grows exponentially, the traditional role of memory is expanding from data storage to processing-in-memory, with memory now offloading some of the processing duties of the CPU and GPU. Simply put, it is time for memory solutions to share the data-processing workload with the CPU and GPU.
Samsung’s Memory Technology Day (MTD), held October 20, 2023 in Santa Clara, California, demonstrated a number of technological advancements in on-device memory, storage, and accelerators that reside on the memory itself. These devices are aimed at keeping pace with the ever-advancing progress of AI and the billions, soon to be trillions, of GPT parameters used to train large language models (LLMs).
JangSeok (JS) Choi, Corporate VP and Head of New Business Planning at Samsung Electronics, began by presenting the memory portfolio and explaining the vision for the new memory hierarchy. His position is that accelerated memory processing is the key to keeping up with the speed at which systems must process LLMs. JS added that the industry is not only trying to keep up with machine learning, but now must also solve for AI model inferencing. Simply explained, once a machine-learning model has been trained, inference applies the learned knowledge to new data points to make predictions about them. This, too, takes extra memory capacity, execution, and precision.
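The train-once, then-apply distinction JS describes can be sketched with a toy example (purely illustrative; the model, data, and library choices here are our own, not Samsung's):

```python
import numpy as np

# --- Training phase: fit a simple linear model y = w*x + b (done once) ---
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=100)
y_train = 3.0 * x_train + 2.0  # ground-truth relationship

# Least-squares fit: this is the "learning" step
A = np.stack([x_train, np.ones_like(x_train)], axis=1)
w, b = np.linalg.lstsq(A, y_train, rcond=None)[0]

# --- Inference phase: apply the learned parameters to new data points ---
def predict(x_new):
    """No further learning happens here -- just apply the frozen w and b."""
    return w * x_new + b

print(predict(np.array([1.0, 5.0])))  # predictions on unseen inputs
```

The point of the separation is that inference reuses fixed parameters on a stream of new inputs, which is exactly the memory-capacity- and bandwidth-hungry phase the presenters focus on.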
The first of these portfolio propositions is the low-power, high-bandwidth, extended granularity (LHM) solution. LHM is a DRAM focused on low power and high bandwidth, with the capability of being 3D-stacked on top of the logic die. Also in the portfolio is high-bandwidth memory (HBM) DRAM, currently available to customers as “Icebolt” (HBM3). This AI accelerator stacks 12 layers of 10nm-class 16Gb DRAM dies for 24GB of memory, providing the highest-bandwidth memory while using very little power. This, states JS, is the AI inferencing solution that will be a difference maker.
Technologies like processing-in-memory (PIM) and processing-near-memory (PNM) were introduced. Solutions such as HBM-PIM and CXL-PNM have been developed as proofs of concept; they move data transfer and processing closer to memory so that DRAM is not bottlenecked while processing large AI models.
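The motivation behind PIM/PNM can be shown with a back-of-the-envelope model of bus traffic (a toy sketch with invented numbers, not Samsung's design): shipping an entire operand array across the memory bus to the host, versus performing a reduction in or near the memory and shipping only the result.

```python
# Toy model: bytes that must cross the host memory bus for one reduction
# over N 4-byte elements. All numbers are illustrative, not measured.

N = 1_000_000          # elements resident in DRAM
ELEM_BYTES = 4         # 32-bit values

# Conventional path: the CPU/GPU reads every element across the bus.
host_side_traffic = N * ELEM_BYTES

# PIM/PNM path: a compute unit in/near the DRAM performs the sum
# and returns only the scalar result across the bus.
near_memory_traffic = ELEM_BYTES

print(f"host-side: {host_side_traffic:,} B, near-memory: {near_memory_traffic} B")
print(f"bus-traffic reduction: {host_side_traffic // near_memory_traffic:,}x")
```

Real PIM/PNM hardware offloads only certain operation patterns, but the sketch captures why moving compute toward memory relieves the DRAM bottleneck for large models.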
Also in the portfolio is the development of Compute Express Link (CXL) protocols aimed at accelerating CPU performance. The CXL DRAM (CMM-D), CXL-PNM (CMM-DC), Memory Semantic SSD (CMM-H), and the Smart SSD + CXL I/F, Compute (CMM-HC) are all CXL memory-expansion and compute solutions coming out of the memory lab. JS envisions that demand for the value segment of CXL will surge by 2026.
During his presentation, JS emphasized several times that overall success in conquering the memory issues of the AI era will come from partnering with others in the technology sector. Partnerships and collaborations with Meta, MemVerge, and SAP HANA were presented during Mem Tech Day by representatives from those companies.
Walter Jun, Corporate VP, presented a detailed look at the technology being developed for the CMM product line and outlined why CMM is an important opportunity for Samsung. Cited as critical capabilities of CXL are its open-standards interface, easy adoption on existing PCIe 5.0 infrastructure, and scale-up memory capacity and increased bandwidth that can be applied independently to process large data models.
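One way to picture the scale-up capacity argument is that CXL-attached memory behaves, to software, like an additional memory tier behind the PCIe link that a system can spill into when local DRAM fills. The toy allocator below is our own illustration (tier names and capacities are invented, not taken from any CMM product spec):

```python
from dataclasses import dataclass

@dataclass
class Tier:
    """One memory tier with a fixed capacity, tracked in whole GB."""
    name: str
    capacity_gb: int
    used_gb: int = 0

    def try_alloc(self, gb: int) -> bool:
        if self.used_gb + gb <= self.capacity_gb:
            self.used_gb += gb
            return True
        return False

# Invented sizes: local DDR DRAM plus a CXL expansion module.
tiers = [Tier("local-DRAM", 64), Tier("CXL-expansion", 256)]

def alloc(gb: int) -> str:
    """Place an allocation in the fastest tier with room, spilling to CXL."""
    for tier in tiers:
        if tier.try_alloc(gb):
            return tier.name
    raise MemoryError("all tiers exhausted")

placements = [alloc(gb) for gb in (40, 40, 100)]  # 40 + 40 > 64, so we spill
print(placements)  # → ['local-DRAM', 'CXL-expansion', 'CXL-expansion']
```

In practice the operating system, not application code, makes this placement decision, but the sketch shows how CXL expansion grows usable capacity independently of the CPU's local DRAM channels.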
Sungwook Ryu, VP and Head of the Memory Solutions Lab, was called to the stage to present the memory and SSD developments being pursued in the lab. Two notable directions are: 1) using passive memory devices to improve overall system performance, and 2) transforming passive memory devices into more active ones. Various protocols and interfaces, including DDR, CXL, and NVMe, are employed in these solutions.
Joining Sungwook in this presentation was Yangseok Ki, VP of the Memory Solutions Lab, who presented the CXL Memory Module – Hybrid (CMM-H) architecture, its benefits, and its performance. An overview of the CMM-H module was given and the importance of this technology development was highlighted. A more complete description of this device and architecture can be found in Dr. Rekha Pitchumani’s webinar presentation, “CMM-H (CXL Memory Module – Hybrid): Samsung’s CXL-based SSD for the Memory-centric Computing Era.”
Near Memory Solutions are forging the next frontier for GPT large-language-model processing. Moving more of the data processing into and around the memory modules is reshaping the way computing will be done in this new AI era.