On-Device Compute Technologies
Benny Katibian, Senior Vice President and Head of the Samsung Austin Research Center and Advanced Compute Lab, provided an opening session titled “On-device Compute Technologies.” In the session, Katibian covered how Samsung System LSI has developed three backbone IPs to bring on-device compute capabilities to the mobile environment.

CPU with Optimized Four-Cluster Structure
Of the three backbone IPs, the first is the CPU, which traditionally consisted of three clusters: a high-end cluster for time-sensitive applications, a low-end cluster for background applications, and a middle cluster. For more efficient CPU operation, Samsung decided to divide the middle cluster into two gears, mid-high and mid-low, with the mid-high gear used for compute-intensive applications such as gaming. To further optimize power and performance in heavy gaming scenarios, the CPU’s share of the power budget is reduced and the reclaimed power is shifted from the CPU cores to the GPU, increasing its compute capability to support top-tier gaming graphics on mobile devices.

Bringing Console-Level Gaming into the Mobile Platform
To reach the goal of bringing console-level gaming to the mobile platform across premium to low-end segments, Samsung began developing the Xclipse GPU based on the AMD RDNA™ architecture. With this development, Samsung became the first to introduce ray tracing to the mobile environment. Katibian showed a demo video of mobile ray tracing running on an Exynos reference platform, demonstrating ray tracing features such as shadows, reflections, and global illumination, all enabled simultaneously.

Advanced NPU for the Era of Generative AI
The NPU, in turn, is being applied to generative AI solutions. Samsung’s newest NPU achieves this through architectural changes that remove memory bottlenecks and drastically increase support for, and acceleration of, the non-linear operations used heavily in Transformer-based models. As a result of these architectural changes, MobileBERT benchmark performance increased threefold compared to the previous generation.
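To make the non-linear operator point more concrete, the sketch below lists the non-linear steps (softmax, GELU, layer normalization) that sit between the matrix multiplications of a generic Transformer block of the kind found in MobileBERT-style models. It is an illustrative NumPy sketch of the operator mix such hardware must handle, not a description of Samsung’s NPU internals.

```python
# Illustrative sketch (not Samsung's NPU design): the non-linear operations that
# Transformer-based models such as MobileBERT interleave with their matrix
# multiplications, and that therefore benefit from dedicated acceleration.
import numpy as np

def softmax(x, axis=-1):
    # Non-linear: exponentiation and normalization of attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    # Non-linear activation used in BERT-family feed-forward layers (tanh approximation).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # Non-linear normalization applied after the attention and feed-forward stages.
    mean = x.mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def attention(q, k, v):
    # The matrix multiplies are the "linear" work; softmax is the non-linear
    # step wedged between them, which is why accelerating it matters.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v
```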
Scalable Central Compute with Samsung Auto SoC
Vehicles are currently going through a fundamental shift, much like the telephone’s evolution into the smartphone. Modern vehicles have become far more than a means of transportation; they are a broad collection of computing functions, including generative AI. With this reality as a backdrop, Jihoon Bang, Corporate Vice President of AP Software Development, gave an informative session on Samsung Auto SoC solutions and their support for scalable central compute applications.

Shifting Automotive Architecture to Central Compute
Bang elaborated on how vehicles have progressed from distributed architectures to domain-centralized systems and now to consolidated central compute. This shift brings greater efficiency and simplicity, but the architecture’s highly connected nature also creates new technical considerations in areas like safety. To fully address this development, the next-generation Samsung Auto SoC is geared toward central compute and aims to provide major advances in safety, security, extensibility, and scalability.

Enhancing Safety and Security
Demonstrating Samsung’s unwavering commitment to safety, the next-gen Auto SoC features an ASIL-D-compliant Safety Island that operates separately from the host CPU to monitor the status of other SoCs. Furthermore, the rest of the SoC has been made ASIL-B compliant by following automotive standards and practices such as ASPICE, ISO 26262, and FMEA. With cyber security an increasingly important factor, the Auto SoC now features a Primary Security Processor with a built-in crypto engine and StrongBOX hardware blocks. Additionally, Samsung has developed ExynosTEE, a secure in-house OS that protects user information and is already EAL2-certified. All further security software development is expected to follow the ISO 21434 standard.

Ramping Up Extensibility and Scalability
In terms of extensibility, Samsung has developed a proprietary Type-1 hypervisor to meet the growing software requirements of each automotive domain. The hypervisor can virtualize various OSs without significant performance degradation and supports the industry-standard VirtIO API. In addition, the Auto SoC is not limited to specific software, offering support for third-party hypervisors. As the demand for central compute keeps growing, scaling out hardware and software will only become more relevant. One of Samsung’s key solutions is therefore a die-to-die connection between two SoCs that doubles computing capacity without modifying software. The Auto SoC also supports PCIe or Ethernet for multi-SoC connectivity between packages, depending on the OEM’s system.

Poised for a Future of AI-Integrated, Software-Defined Vehicles
In the near future, drivers will interact seamlessly with AI assistants based on large language models (LLMs), which will rely on Samsung’s dedicated AI accelerator capable of running LLMs of up to 15 billion parameters in real time. And as automotive technologies inevitably advance further, the Auto SoC offers meaningful future-proofing by concurrently running the multiple heterogeneous OSs of each domain. Bang summed up these bright prospects by saying, “With the help of the central computing capability of Samsung Auto SoC, what was once a figment of one’s imagination will soon become a reality.”
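For a rough sense of scale behind the 15-billion-parameter figure, the back-of-the-envelope sketch below estimates the weight memory such a model occupies at a few common precisions. The precision options are illustrative assumptions for this calculation, not disclosed details of the Auto SoC’s accelerator.

```python
# Back-of-the-envelope sketch: weight-memory footprint of a 15-billion-parameter
# LLM at a few common precisions. The precisions listed are assumptions for
# illustration, not details of Samsung's accelerator.
PARAMS = 15e9

BYTES_PER_PARAM = {
    "FP16": 2.0,   # 16-bit floating point
    "INT8": 1.0,   # 8-bit integer quantization
    "INT4": 0.5,   # 4-bit integer quantization
}

for precision, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{precision}: ~{gib:.1f} GiB of weights")

# FP16: ~27.9 GiB, INT8: ~14.0 GiB, INT4: ~7.0 GiB. Because generating each
# token requires reading essentially all of the weights, real-time operation at
# this scale is dominated by memory capacity and bandwidth, which is what a
# dedicated accelerator has to be engineered around.
```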
Multimedia Applications in the Real World
Aided by more powerful SoCs and increased sensor functionality, modern multimedia applications are becoming smarter and more computationally demanding. To meet this need, and to reach the ultimate goal of fully emulating human behavior, Samsung has identified low latency and low power as two key areas for advancement. In an engaging session titled “Multimedia Applications in the Real World,” Kijoon Hong, Vice President and Head of Multimedia Development, expanded on Samsung’s approach to taking these two technological steps forward.

Context-Aware Computing
As Hong explained, many of our feature-packed modern devices provide functionality enabled by context-aware computing. Context awareness has humble origins in infrared sensing, but modern sensors have opened up a new world of sensory data to be leveraged. The concept breaks down into three steps: gathering data from image sensors, processing the raw data, and using the processed data to better serve users. Processing all of that raw data requires heavy computation, which in turn drives up power consumption. This is where Samsung’s engineering innovation comes into play: a distributed architecture containing a dedicated, domain-specific system, which is inherently more efficient because it performs a lighter, more specialized computational load. This system provides both low power consumption and low-latency processing.

Unlocking New Potential with Low Power and Low Latency
These enhancements help the system efficiently generate contextual information from the raw data, which applications then use to adapt their behavior. Such adaptable technology is gaining traction and is, for example, enabling mobile cameras to catch up with conventional digital cameras, because Samsung’s dedicated system offers context-aware processing such as local motion estimation and instance segmentation, resulting in higher image and video quality. What’s more, these improvements come with a hardware area up to five times smaller than a general-purpose processor. And since Samsung’s optimized systems now offer power consumption as low as 30 µW for an always-on camera, video can be streamed for an entire week without needing a battery recharge. Low power has always been essential for mobile devices, but the surge in demand for mixed reality solutions has brought a new focus on low-latency processing and accurate synchronization of augmented audio/video data. Current systems achieve a latency of around 33 milliseconds (ms), but Samsung has recently reworked the architecture into a sequential hardware pipeline with a hardware/software-optimized system, allowing for motion-to-photon latency well under 10 ms. In addition to these latency improvements, further advances have been made in gesture recognition, eye tracking, and spatial audio.

Tomorrow’s Multimedia Technology
The next phase of human sensing systems will recreate perception at levels that feel natural and accurate. For vision and hearing, this entails always-on cameras and microphones that observe, detect, and classify users’ surroundings. Through this active context awareness, applications will adapt their behavior accordingly, just as humans react to audio and visual stimuli. For example, audio systems will be able to use 3D shape estimation, performed by fusing data from various sensors in 3D, to reconstruct a real-world acoustic field and then provide intelligent, immersive audio experiences.
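As a rough illustration of the latency figures Hong cited, the sketch below contrasts a frame-buffered pipeline, whose motion-to-photon latency is bounded below by roughly one frame period (~33 ms at 30 fps), with a line-level streaming pipeline in which only small per-stage start-up delays accumulate. Streaming is simply one common way such reductions are achieved; the stage names and timings here are hypothetical placeholders, not Samsung’s measured numbers or its specific pipeline design.

```python
# Hypothetical latency-budget sketch (placeholder numbers, not Samsung's data):
# why a frame-buffered pipeline hovers around one frame period, and how a
# streaming pipeline can cut motion-to-photon latency well below 10 ms.
FPS = 30
frame_period_ms = 1000 / FPS
# In a frame-buffered design each stage waits for a complete frame, so the
# end-to-end latency is at least one full frame period.
print(f"Frame-buffered floor: {frame_period_ms:.1f} ms")  # ~33.3 ms

# In a streaming design each stage starts as soon as its first lines of data
# arrive, so only the small per-stage start-up delays add up end to end.
streaming_stage_ms = {
    "sensor readout (first lines)": 2.0,
    "warp / reprojection": 1.5,
    "composition": 1.5,
    "display scan-out (first lines)": 2.0,
}
print(f"Streaming pipeline estimate: {sum(streaming_stage_ms.values()):.1f} ms")  # ~7 ms
```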
Together these multimedia advancements are bringing us closer to a future where humanoids sense, perceive, and act as effortlessly and innately as humans do. As Hong put it, “To make this future a reality, Samsung System LSI’s solutions will still need to close the gap. But we're confident that our systems will bring on a much more immersive and interactive future.”