
Significant Infrastructure Cost Savings for AI Fraud Detection Using High-Density Memory

Written by Vasanthi Jagatha, Sr. Manager, Data Fabric Solutions
Mayank Saxena, Sr. Director, Data Fabric Solutions


AI fraud detection is a critical workload for fintech companies. Fraud monitoring involves mechanisms like natural language processing to screen communications for suspect language, machine learning to distinguish fraudulent transactions from valid ones, and analytics to separate normal user behavior from anomalous behavior as well as to predict future trends from historical data.

In terms of key compute activities, the ongoing real-time inference work centers on processing transactions, generating fraud scores, and taking actions such as blocking an offending transaction. This workload consists of a large number of small, KB-sized transactions that must be stored in the right format and serviced immediately.

The needs of this workload are met by in-memory databases (IMDBs), which store data in the appropriate format for this type of work while delivering blazing fast load/store I/O compared to typical block-access storage.
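As a rough illustration of this access pattern, the sketch below (Python, assuming a Redis-style IMDB client and a hypothetical score_transaction() model hook) stores a KB-sized transaction record, scores it, and blocks it when the score crosses a threshold. The key layout and threshold are illustrative only, not a description of any particular production system.

    # Minimal sketch of the real-time scoring path, assuming a Redis-style IMDB.
    # The key layout and the score_transaction() model hook are hypothetical.
    import json

    import redis  # any IMDB client with similar load/store semantics would work

    r = redis.Redis(host="localhost", port=6379)

    def score_transaction(txn: dict) -> float:
        # Placeholder for the deployed fraud model; returns a score in [0, 1].
        return 0.0

    def process_transaction(txn: dict, threshold: float = 0.9) -> bool:
        """Store the KB-sized record, score it, and block it if the score is high."""
        key = f"txn:{txn['id']}"
        r.set(key, json.dumps(txn))              # small, memory-speed store
        score = score_transaction(txn)
        r.hset(f"score:{txn['id']}", mapping={"score": score})
        if score > threshold:
            r.sadd("blocked_txns", txn["id"])    # take action on the fraud score
            return False
        return True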

The memory capacity required for this type of database is significant, and it is challenging to provision because the memory available per server is limited. For a large workload that requires clusters of servers, several total cost of ownership (TCO) considerations come into play:

  • Additional servers. More servers must be brought online to service the workload, solely because more IMDB memory is required. That is, customers end up paying for additional compute, storage, security, and system software that are not really needed, just for the extra memory.
  • Greater network complexity. Complexity grows not only in the physical networking within the data center but also in the IMDB software that must span more nodes, increasing power demands and hurting both performance and the budget.
  • Reliability nightmares. Given the sensitivity and privacy of their customers' data, fintech companies tend to maintain their own data centers. Accordingly, the reliability of the underlying infrastructure becomes very important to data integrity. A bloated infrastructure of added nodes and complex networks requires more system replication to mitigate the impact of failures.


Cost of Adding DRAM

One seemingly simple solution to the problem, increasing the memory capacity per server node, is not all that easy to implement. The typical memory pyramid in fintech infrastructure terms looks like this:
 

A glowing orange memory hierarchy pyramid showing layers labeled XPU, Cache, HBM, DRAM, Direct Attached SSD, and Remote Storage against a dark background.


Memory needs are predominantly serviced by DRAM, which is typically 10X more expensive than SSDs in terms of $/GB. Increasing node memory capacity through DRAM is costly.

Aside from cost considerations, it is not technically feasible to scale DRAM capacity to the levels possible with SSDs: memory slots are limited, as is the capacity per module. Even the most expensive enterprise systems currently top out around 20 TB of total DRAM, while SSD capacity can reach the petabyte scale.
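A back-of-the-envelope sketch of these two constraints, with assumed DIMM counts, module sizes, and $/GB figures (illustrative numbers, not vendor data):

    # Illustrative arithmetic for the cost and capacity ceilings described above.
    DIMM_SLOTS_PER_SERVER = 32
    GB_PER_DIMM = 128                  # assumed large RDIMM
    DRAM_COST_PER_GB = 5.0             # assumed $/GB, roughly 10x the SSD figure
    SSD_COST_PER_GB = 0.5              # assumed $/GB

    max_dram_per_server_gb = DIMM_SLOTS_PER_SERVER * GB_PER_DIMM      # 4 TB ceiling
    cost_100tb_dram = 100_000 * DRAM_COST_PER_GB                      # $500,000
    cost_100tb_ssd = 100_000 * SSD_COST_PER_GB                        # $50,000

    print(f"per-server DRAM ceiling: {max_dram_per_server_gb} GB")
    print(f"100 TB in DRAM: ${cost_100tb_dram:,.0f} vs in SSD: ${cost_100tb_ssd:,.0f}")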
 

Memory Expansion with CXL

To illustrate a more efficient approach, the diagram below adds two memory tiers built on CXL devices. Because CXL supports memory semantics (coherency, small load/store I/O) suited to AI workload needs, it lets customers create highly dense memory systems that add capacity without additional servers, improving overall TCO. CXL offers blazing fast performance compared to SSDs and much higher capacity than DRAM.

The CXL protocol for direct-attached memory or memory over fabrics is expected to become a standard feature of high-end servers targeting AI, HPC, and cloud data center workloads, and a significant player in memory expansion, memory sharing, and memory pooling use cases. In addition to scalability, it offers the bandwidth and latency that AI fraud detection workloads require.
 

A blue memory hierarchy pyramid labeled XPU, Cache, HBM, DRAM, Direct Attached Memory, Memory Over Fabrics, Direct Attached SSD, and Remote Storage on a dark background.
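At the operating system level, a CXL Type 3 memory expander is typically exposed to Linux as a CPU-less NUMA node, so existing NUMA tooling and tiering policies can target it. The sketch below assumes a recent kernel with CXL support; node numbering and sysfs paths vary by platform.

    # Minimal sketch: list NUMA nodes and flag CPU-less ones, which is how a
    # CXL memory expander commonly appears. Assumes a recent Linux kernel.
    from pathlib import Path

    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        cpus = (node / "cpulist").read_text().strip()
        mem_total = (node / "meminfo").read_text().splitlines()[0].strip()
        kind = "CPU-less (likely CXL/expander memory)" if not cpus else "CPU-attached"
        print(f"{node.name}: {kind}")
        print(f"  {mem_total}")
    # A tiering layer (or numactl --membind / libnuma) can then place colder IMDB
    # structures on the expander node while hot data stays in local DRAM.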


RAS, SLAs, and Observability

In addition to performance and TCO savings, fintech providers have a critical need for reliability, availability, and serviceability (RAS) and observability across the entire system memory of the server clusters in their data centers. They must constantly monitor for device failures, which happen more often than they would like. Quick, timely problem detection and intervention on these devices allows the fintech provider to honor critical service level agreements (SLAs).

Accordingly, the RAS of the underlying infrastructure becomes very important to data integrity. If the infrastructure becomes unwieldy, fintech support organizations must deal with constant failures, forcing them to manage multiple replicas to meet the SLAs they have promised to their own customers.
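One concrete building block for this kind of monitoring, offered here as a hedged example: on Linux, memory-error counters are exposed through the EDAC subsystem in sysfs, which an operator could poll and feed into RAS dashboards and SLA alerting. The layout below assumes an EDAC-capable kernel and platform.

    # Minimal observability sketch: read Linux EDAC memory-error counters.
    from pathlib import Path

    EDAC_MC = Path("/sys/devices/system/edac/mc")

    def memory_error_counts() -> dict:
        counts = {}
        for mc in sorted(EDAC_MC.glob("mc[0-9]*")):
            ce = int((mc / "ce_count").read_text())   # correctable errors
            ue = int((mc / "ue_count").read_text())   # uncorrectable errors
            counts[mc.name] = {"correctable": ce, "uncorrectable": ue}
        return counts

    if __name__ == "__main__":
        for controller, errs in memory_error_counts().items():
            print(controller, errs)   # alert when counts climb between polls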
 

Samsung Cognos as a Solution

Implementation of the CXL memory solution can be bootstrapped with Samsung's AI-enhanced memory management and orchestrator software, known as Samsung Cognos.

Cognos provides critical support to enable direct attached memory through the following features:

  • Management of high-density, multi-device memory pools with easy scalability to address memory stranding problems
  • Application-aware memory orchestration for maximum tiered memory performance
  • Automatic tiering of data based on the fraud detection SLA metrics, with localization and hot data pattern management as well as device level hooks in Samsung devices
  • Intuitive console for easy device- and application-level observability
  • Transparency to applications, such that applications need not be modified to use Cognos

Cognos makes it easy to monitor and maintain server clusters through its RAS and observability features. It gives customers a hands-off, scalable approach to memory management and frictionless integration with the IMDB needed for the fraud detection application.
 

A system architecture diagram showing Samsung Cognos for AI fraud detection, with CXL memory orchestrator, IMDB system memory, DRAM, CMM-D modules, auto-tiering, memory pooling, dynamic memory scaling, and CXL servers connected to an expansion chassis for DRAM and CMM-D resources.


By employing Cognos and CXL in an actual operating environment, application users were able to realize a 4X TCO improvement while meeting their latency and throughput SLA targets.
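The arithmetic behind that figure is straightforward: the per-server data sizes in the chart below (150 GB with local DRAM only versus 600 GB with DRAM plus CXL) translate into roughly a 4X reduction in server count for the same dataset, which is the main driver of the TCO gain. The total dataset size in the sketch is an assumption for illustration.

    # Illustrative arithmetic using the per-server figures from the chart below.
    DATASET_GB = 12_000                     # assumed total in-memory dataset
    PER_SERVER_DRAM_ONLY_GB = 150
    PER_SERVER_DRAM_PLUS_CXL_GB = 600

    dram_only_servers = -(-DATASET_GB // PER_SERVER_DRAM_ONLY_GB)          # 80 servers
    dram_plus_cxl_servers = -(-DATASET_GB // PER_SERVER_DRAM_PLUS_CXL_GB)  # 20 servers

    print(dram_only_servers, dram_plus_cxl_servers)   # 4x fewer servers, same data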
 

A performance comparison chart showing local DRAM versus DRAM plus CXL with Cognos Auto Tier. The graph highlights a 4× increase in data size per server (150 GB to 600 GB), throughput above 100K ops/s for both configurations, and average read and update latencies well under the 1 ms target.


Because no application-level changes are required, this solution offers a compelling value proposition for many IMDB workloads beyond the AI fraud detection workload discussed here.
 

A performance comparison graphic showing multiple applications—Cassandra, Redis, Greenplum, KVM, HammerDB, and Graph500—running without code changes on different memory configurations. The Redis in-memory database chart compares normalized runtime across uniform random, Gaussian, and sequential access patterns for DRAM, interleaved DRAM plus CMM-H, and CMM-H. Another chart shows Cassandra YCSB workload performance per dollar, with DRAM as baseline and significantly higher performance for interleaved DRAM plus CMM-H and CMM-H alone.


For those interested in full stack solutions and wishing to collaborate with Samsung to provide more value to their customers, please reach out to us at rdmsldfscore@ssi.samsung.com or visit our webpage to learn more: https://semiconductor.samsung.com/about-us/locations/us-rnd-labs/memory-labs/data-fabric-solutions/.