In general, customers’ AI models trained in the cloud or on servers are very large and optimized for GPU execution. To run these models on the Exynos NPU, it is essential to convert them into on-device AI models through processes such as graph optimization, quantization, and compilation.
The On-device AI SDK Toolchain converts the customer’s original AI model, through a lowering process, into an on-device AI model capable of running in the on-device NPU environment. This makes the AI SDK Toolchain indispensable for supporting customers’ AI models. However, several technical challenges must be overcome to achieve this:
[1] Support for Various AI Model IRs
As the number and complexity of supported AI models grow rapidly each year, the on-device AI SDK Toolchain must be able to ingest models from whichever framework a customer has chosen. By supporting a wide range of AI model IRs¹⁾ such as PyTorch²⁾, ONNX³⁾, TensorFlow⁴⁾, and TFLite⁵⁾, the SDK empowers developers to iterate faster and adapt flexibly, which is what makes for truly agile AI development.
[2] Verification Methods for Each Toolchain Stage
During the lowering process of an AI model, the original model is progressively transformed into a hardware-executable model through graph optimization and quantization. It is critical to strengthen verification at each stage to ensure that the accuracy and performance of the original AI model are preserved as much as possible.
[3] Advancement of Graph Optimization and Quantization Algorithms
To maximize the performance of on-device AI models, it is also necessary to continuously enhance graph optimization techniques and quantization algorithms tailored for highly complex models like LLMs.
Exynos AI Studio, Samsung’s on-device AI SDK, was built to address exactly these key technical challenges and to offer robust solutions to customers.
The Advancement Strategy for Exynos AI Studio, Exynos’ On-Device SDK
To become a global leader in the field of on-device AI, Samsung has developed the Exynos AI Studio SDK and distributed it to customers, and it is preparing for the future with a variety of advancement strategies.
Exynos AI Studio is largely composed of the Exynos AI Studio High Level Toolchain (EHT) and the Exynos AI Studio Low Level Toolchain (ELT). EHT performs advanced graph optimization and quantization at the model level, while ELT applies SoC-specialized algorithms and compilation.
EHT takes open-source framework IRs such as ONNX and TFLite as inputs, converts them into an internal IR through the IR Converter, and then modifies the model structure via Graph Optimization to make it suitable for execution on the NPU. Through quantization, it reduces the model size to a level that can run efficiently on-device.
ELT carries out lowering operations optimized for each NPU generation, converting the model into a form that’s executable on hardware. Finally, the model passes through the Compiler, generating an on-device AI model that can run on the NPU.
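Conceptually, the flow can be pictured as two composable stages. The sketch below is a minimal, hypothetical Python illustration of that staging; none of these function or field names come from the actual Exynos AI Studio API.

```python
# Illustrative sketch only: stage names and data shapes are hypothetical
# and do not reflect the actual Exynos AI Studio API.

def eht_stage(framework_model: dict) -> dict:
    """High Level Toolchain: framework IR -> optimized, quantized internal IR."""
    ir = {"ops": framework_model["ops"], "source": framework_model["format"]}
    # Graph optimization, e.g. removing no-op nodes.
    ir["ops"] = [op for op in ir["ops"] if op != "Identity"]
    # Quantization: shrink the model from fp32 to an on-device bit width.
    ir["dtype"] = "int8"
    return ir

def elt_stage(internal_ir: dict, npu_generation: str) -> bytes:
    """Low Level Toolchain: generation-specific lowering and compilation."""
    lowered = f"{npu_generation}:{','.join(internal_ir['ops'])}:{internal_ir['dtype']}"
    return lowered.encode()  # stand-in for the compiler's binary output

if __name__ == "__main__":
    onnx_model = {"format": "onnx", "ops": ["Conv", "Identity", "Relu", "MatMul"]}
    npu_binary = elt_stage(eht_stage(onnx_model), npu_generation="gen5")
    print(npu_binary)
```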
Designing SDK Features To Handle Various AI Model IRs
To enhance the scalability of the SDK, it is essential to support multiple AI model IR formats. Samsung’s SDK currently supports open-source framework IRs such as ONNX and TFLite, and Samsung is developing a strategy to strengthen PyTorch support. In particular, for generative AI models, performing graph optimization and quantization within the PyTorch development environment minimizes unnecessary conversions during model lowering, enabling the delivery of a more stable and efficient SDK.
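To see why direct PyTorch support matters, the snippet below shows the kind of conversion step a customer model typically goes through today before it can enter an ONNX-based toolchain. TinyModel is a stand-in for a real customer model; stronger PyTorch IR support would let this export step be skipped entirely.

```python
import torch
import torch.nn as nn

# A stand-in for a customer's PyTorch model.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyModel().eval()
dummy_input = torch.randn(1, 16)

# Exporting to ONNX is the extra conversion that native PyTorch IR
# support would make unnecessary in the lowering flow.
torch.onnx.export(
    model,
    dummy_input,
    "tiny_model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```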
When various AI model input IRs pass through the IR Converter within the SDK, they are transformed into an internal IR optimized for Exynos on-device AI development. Since all SDK modules use this internal IR as an interface to exchange information, the software architecture is designed to be both highly extensible and highly flexible.
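As an illustration of what such a shared internal IR might look like, here is a minimal sketch in Python. The real Exynos internal IR is not public, so the types and fields below are purely hypothetical; the point is that every module consumes and produces the same structure.

```python
from dataclasses import dataclass, field

# Hypothetical internal-IR shape; names and fields are illustrative only.

@dataclass
class Tensor:
    name: str
    shape: tuple
    dtype: str = "fp32"

@dataclass
class Node:
    op_type: str               # e.g. "Conv", "MatMul"
    inputs: list               # names of input tensors
    outputs: list              # names of output tensors
    attrs: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    tensors: dict = field(default_factory=dict)

# Because optimizer, quantizer, and lowerer all exchange the same Graph
# type, passes can be added or reordered independently.
def remove_identity_ops(g: Graph) -> Graph:
    # Rewiring of producer/consumer edges omitted for brevity.
    g.nodes = [n for n in g.nodes if n.op_type != "Identity"]
    return g
```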
Step-by-Step Verification With a Simulator and Emulator
As the lowering process progresses through the SDK Toolchain, the model size decreases and, consequently, accuracy can drift from that of the original model. To strengthen functional verification of each SDK module and minimize this loss of accuracy, verification capabilities at each stage of the toolchain are essential.
The output of the EHT module in Exynos AI Studio can be compared with the original model on an operator basis using the Signal-to-Noise Ratio (SNR) metric through the simulation function. To handle quantization information, the simulator wraps specific operators with de-quantize and quantize operations before and after inference, enabling computation through fake quantization. The results of the ELT module are validated for accuracy using the emulation function, in a manner similar to EHT verification. Since the emulator performs computations through emulation code that replicates the NPU hardware, it enables more precise validation.
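The two ingredients the simulator combines, per-operator SNR measurement and fake quantization, can be sketched in a few lines of NumPy. This is a generic illustration of the standard definitions, not Exynos AI Studio code.

```python
import numpy as np

def snr_db(reference: np.ndarray, candidate: np.ndarray) -> float:
    """SNR in dB between an original operator output (reference)
    and its quantized counterpart (candidate)."""
    noise = reference - candidate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(noise**2))

def fake_quantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Symmetric per-tensor fake quantization: quantize to integers,
    then immediately de-quantize, so the quantization error shows up
    in ordinary floating-point inference."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for int8
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                              # de-quantize back to float

reference = np.random.randn(1024).astype(np.float32)
candidate = fake_quantize(reference)
print(f"SNR: {snr_db(reference, candidate):.1f} dB")
```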
Strategies for Advanced Graph Optimization and Quantization Algorithms
As AI models become more complex and larger in size, advancing the graph optimization and quantization algorithms supported by the SDK becomes even more essential.
In the graph optimization stage, processes can be classified as hardware-agnostic or hardware-specific. Optimizations suitable for general computing devices are applied first, followed by algorithms tailored to the characteristics of the NPU hardware accelerator. The quantization algorithm then reduces an AI model trained on servers at fp32 precision to bit widths that can run on NPU devices, such as int8, int16, or fp16. Through advanced graph optimization and quantization algorithms, it becomes possible to optimize for the NPU while preserving the original model’s accuracy as much as possible.
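As a concrete example of a hardware-agnostic optimization, consider folding a BatchNorm layer into the preceding convolution, a standard transform that replaces two operators with one at inference time. The NumPy sketch below shows the arithmetic; it is a textbook transform, not an Exynos-specific algorithm.

```python
import numpy as np

def fold_batchnorm_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding convolution.

    w: conv weights, shape (out_ch, in_ch, kh, kw); b: conv bias, (out_ch,)
    gamma/beta/mean/var: per-channel BatchNorm parameters, each (out_ch,)
    """
    scale = gamma / np.sqrt(var + eps)            # per output channel
    w_folded = w * scale[:, None, None, None]     # rescale each filter
    b_folded = (b - mean) * scale + beta          # absorb shift into bias
    return w_folded, b_folded

# Sanity check on random parameters.
out_ch, in_ch, k = 4, 3, 3
w = np.random.randn(out_ch, in_ch, k, k)
b = np.random.randn(out_ch)
gamma, beta = np.random.randn(out_ch), np.random.randn(out_ch)
mean, var = np.random.randn(out_ch), np.abs(np.random.randn(out_ch))
w_f, b_f = fold_batchnorm_into_conv(w, b, gamma, beta, mean, var)
```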
Driving the Future of On-Device Intelligence
On-device AI has moved beyond its early technical limitations and is now becoming a practical reality. With its Exynos AI Studio SDK, Samsung is delivering the speed, accuracy, and scalability that tomorrow’s AI demands. This ensures intelligence truly lives where people need it most: in their hands.
On a technical level, Samsung’s Exynos AI Studio SDK adopts the structure of an on-device SDK toolchain, performing optimization, quantization, and compilation so that customers’ AI models run effectively on NPU hardware. Going forward, through the execution of comprehensive design and development strategies, the company will continue to uphold its reputation as a global leader in on-device AI technology.
1) An intermediate representation (IR) is a hardware-agnostic format that unifies models from different deep learning frameworks, enabling post-processing such as optimization, quantization, and compilation.
2) An open-source deep learning framework developed by Meta, optimized for flexible and intuitive AI model development.
3) Open Neural Network Exchange (ONNX) is an open-source format for representing machine learning and deep learning models.
4) An open-source deep learning framework by Google, designed for large-scale AI training and deployment.
5) A lightweight version of TensorFlow optimized for running AI models on mobile and edge devices.