[All About Exynos] ② An Upgraded Mobile Experience: The Important Role of CPU and NPU in Smartphones

We have all been in awe of what the latest smartphones, tablets, and PCs released each year can do. Numerous technological advancements since the very first smartphones have made modern smartphones utterly different from their predecessors. These days, the smart devices that we hold in our hands can perform just as many tasks as most PCs.
The key technology that determines the performance of smart devices is the mobile processor, the system semiconductor in charge of computation and multimedia operations in the latest mobile devices. In most cases, it comes in the form of a system-on-chip (SoC). An SoC draws on various semiconductor technologies and incorporates system blocks, such as a central processing unit (CPU), graphics processing unit (GPU), and modem, into a single chip. Simply put, an SoC is a chip that combines all the key parts that make smartphones and tablets operate.
Samsung Newsroom met with the development leaders behind the seven IPs of the Exynos mobile processor, which is considered the “brain of smartphones.” Over the three stories in this series, Samsung Newsroom introduces the roles and characteristics of each type of IP, which together give a smartphone its competitiveness, and the development direction going forward. The first story covered the GPU and ISP, this second story covers the CPU and NPU, and the third story will cover modems, connectivity, and iSE.
Equipped With a Brain That Surpasses Computers: Strengthening Partner Collaboration
In this second series installment, Samsung Newsroom sat down with two project leaders at Samsung Electronics to better understand the role of the CPU and NPU in mobile devices.
A computer’s central processing unit (CPU) is often compared to the human cerebrum, the largest part of the brain, which handles many responsibilities. Similarly, the CPU is the unit that handles a computer’s four main functions: memory, decoding, operation, and control. The CPU is the main factor that determines the overall performance of a PC. Likewise, a mobile CPU runs all software on an operating system (OS) and controls other hardware peripherals, helping a smartphone perform at its optimal level.
CPU performance is determined by a variety of factors, including the clock speed,¹ IPC,² and the number of cores.³ The phones of the past were powered by a single-core CPU with a simple pipeline structure. Consequently, their capacity for parallel processing was limited, and the maximum frequency only amounted to a few hundred MHz. However, the CPU in smartphones today has a superscalar⁴ structure, allowing it to execute multiple instructions in parallel. Additionally, it can run at 3 GHz, or 3 billion cycles per second, and have eight or more cores. Mobile CPUs now have a microarchitecture that pushes the performance beyond desktop CPUs.
Exynos’ CPU has evolved from a big core to a big-little and then a big-mid-little structure to keep its size small and power consumption low. The big-little structure is a processing architecture concept that dynamically switches between two types of cores, a big one and a little one, to maximize either performance or power efficiency, depending on the task. For example, texting requires far less CPU performance than playing a 3D game. Therefore, when sending a text, the processor uses a smaller, power-efficient core instead of a high-performing core.
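As a rough illustration of how clock speed, IPC, and core count combine, the following Python sketch multiplies the three into an upper bound on instructions per second. The function name and the sample IPC values are assumptions made for this example, not Exynos specifications, and a real chip falls short of this bound because its cores differ in type and rarely sustain their peak IPC.

```python
def peak_instructions_per_second(clock_hz: float, ipc: float, cores: int) -> float:
    """Idealized upper bound: every core retires `ipc` instructions on every cycle."""
    return clock_hz * ipc * cores


# Hypothetical older phone: one core, a few hundred MHz, simple pipeline (IPC of 1 assumed).
old_phone = peak_instructions_per_second(clock_hz=400e6, ipc=1.0, cores=1)

# Hypothetical modern phone: eight cores at 3 GHz, superscalar (IPC of 4 assumed).
modern_phone = peak_instructions_per_second(clock_hz=3e9, ipc=4.0, cores=8)

# Ratio of the two idealized bounds.
speedup = modern_phone / old_phone
```

Under these assumed figures the bound grows by a factor of 240, which shows why gains in all three factors together matter more than a higher clock alone.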
▲ Project Leader Wookyeong Jeong has worked in the CPU field for more than 20 years since joining Samsung.
“CPU determines the competitiveness of all systems, including the SoC. It’s an influential area and the top priority when it comes to developing advanced semiconductor technology,” said Wookyeong Jeong, the project leader on SoC Design Team 2 who is in charge of all tasks related to Exynos’ CPU. Jeong has worked in the CPU field for more than 20 years since joining Samsung.
“Achieving a high performance with a limited power budget is key,” said Jeong. “It is important to operate different types of CPU cores, including big, mid, and little in appropriate combinations to achieve maximum efficiency in various situations.” Exynos’ CPU optimizes the combination of active cores to deliver the best user experience in situations requiring high performance, such as playing a game or using the camera on a mobile device.
▲ CPU Core Structure of Exynos 2200
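The idea of matching core type to workload can be sketched in a few lines of Python. This is a minimal illustration only: the `Cluster` type, the `pick_cluster` function, and all capacity and power figures are hypothetical and do not describe the Exynos 2200 or any real scheduler, which also weighs temperature, latency, and how many cores are already busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    capacity: float  # relative performance of one core (hypothetical units)
    power_mw: float  # active power of one core (hypothetical figure)


# Hypothetical big-mid-little layout; the numbers are illustrative only.
CLUSTERS = [
    Cluster("little", capacity=1.0, power_mw=100.0),
    Cluster("mid", capacity=2.5, power_mw=400.0),
    Cluster("big", capacity=4.0, power_mw=1000.0),
]


def pick_cluster(demand: float, clusters=CLUSTERS) -> Cluster:
    """Return the lowest-power cluster whose capacity covers the demand.

    If no cluster is fast enough, fall back to the highest-capacity one.
    """
    candidates = [c for c in clusters if c.capacity >= demand]
    if not candidates:
        return max(clusters, key=lambda c: c.capacity)
    return min(candidates, key=lambda c: c.power_mw)


texting = pick_cluster(0.3)  # light task
gaming = pick_cluster(3.5)   # heavy task
```

With these assumed numbers, the light task lands on the little cluster and the heavy one on the big cluster, mirroring the texting versus 3D gaming example.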
Based on the IP of semiconductor design company Arm, Samsung Electronics is taking the performance of CPUs up a notch. When Jeong was asked about the specific tasks of the team’s developers, he explained the team’s role and responsibilities. “We decide the performance goal for the CPU of a product, acquire the CPU IP, predict and review the performance, validate and conduct debugging⁵ before mass production and further steps. We take care of the overall development work to enhance CPU performance,” Jeong explained.
“The System LSI Business is responsible for taking the RTL CPU design from Arm to create an optimal semiconductor chip,” Jeong said. “The team is also responsible for designing and creating the CPU peripheral circuit, such as an appropriate memory subsystem, for maximizing CPU performance.”
“With the adoption of Arm CPU, we have a vision of becoming the mobile industry’s best CPU manufacturer by optimizing software not only on a chip level but also on a device level. We aim to become an E2E⁶ total solution provider,” said Jeong when asked about the future development direction of the company. “To achieve this goal, the CPU developers have been working very closely with Arm, device manufacturers, Samsung Foundry, and others as one team since the early development stages. In addition, they’re seeking various ways to enhance performance, such as advanced packaging technology that enhances performance further,” Jeong explained.
“With the emergence of AR and the metaverse, appropriately utilizing all processors, such as CPU, GPU, and NPU for comprehensive machine learning processing on an SoC level would give us an important, competitive edge. We’re going to focus on increasing our competitiveness by strengthening the CPU’s performance in machine learning processing as well,” Jeong added.
Real, Imaginative Technology: The Advancement of NPU Based on Proprietary Technology Throughout Six Generations
An NPU is a processor optimized for the arithmetic of deep learning⁷ algorithms. It can process a large amount of data quickly and efficiently, much like the human neural network. For this reason, it is mainly used for AI computation. While it may seem complicated, it is already common in everyday devices. For example, thanks to the NPU, a smartphone’s camera can recognize the objects, environment, and people in the frame and focus accordingly. It can automatically switch on the food filter mode for food photography or even remove unwanted subjects from the picture.
▲ AI Remover function within new smartphones improved as NPU developed.
Before NPUs existed, the GPU performed most AI computation. However, the computation efficiency⁸ was low because the GPU’s hardware structure is not designed for this kind of workload. These days, the NPU is mainly in charge of AI computation, and it can process data more efficiently in mobile devices as well. It is optimized for parallel data computing so that AI-based applications can run faster while using less power.
▲ Project Leader Suknam Kwon, who has been working on the NPU since its second generation, now leads the NPU developers.
Exynos’ NPU development began in 2016. The first SoC equipped with the NPU was the Exynos 9820, which was embedded in the Galaxy S10 released in 2019. “When the first task force was formed six years ago, we had only about 20 people, but now our team has grown tenfold if we include the members from our overseas research institutes,” said project leader Suknam Kwon. Kwon used to design the hardware of the SoC and has been working on the NPU since its second generation. “The NPU is an area of high interest these days, but back then, it was so unfamiliar and new that we had to learn from videos and university lectures overseas.”
In the past, there were few applications for the NPU, such as detecting objects in images. However, in the era of AI, market demand is increasing for high-performing IP that can handle a large amount of computation. Such IP can be used for tasks such as improving camera picture quality, voice services, and more. In addition, since size and power consumption increase as IP performance is enhanced, determining the most efficient architecture is key.
▲ AI using cloud servers compared to On-device AI
As the NPU gets more powerful, it delivers faster object recognition and better photo enhancement. The NPU in the latest Exynos offers twice the performance of the previous generation. Having independently developed the NPU for six product generations, the SoC Design team has built expertise and know-how in NPU technology that are second to none. “With advantages in benchmark such as the MLPerf, power efficiency, size, etc., Exynos’ NPU is a highly competitive IP solution,” Kwon said. “Through optimization of architecture for performance and improvements in power efficiency, the NPU adds competitive value for Exynos processor,” he said.
Going forward, the technologies that utilize NPU will continue to evolve. “I think the on-device AI, which performs AI computation in one’s smartphone rather than going through a server, will become more widely used because there is less risk of having sensitive personal information leaked,” Kwon said. “Because of this, mobile NPU performance needs to be even more enhanced. These days, one NPU is used for many computations, but I predict that there will be more demands for operating specialized AI algorithms for each application program. So, developing an NPU that is specialized for each domain will be important as well,” he added.
When asked about autonomous driving, Kwon discussed the role that NPU will play in the industry. “In the near future, the advanced driver-assistance system (ADAS) will become a reality,” Kwon said. “It requires hardware that can perform autonomous driving algorithms using a massive amount of data in real-time. To accomplish this, a higher-performing NPU is needed, and Samsung is preparing an NPU with powerful capabilities for autonomous driving devices that meet the market’s demands.”
At the end of the interview, Kwon explained the most meaningful moment that occurred during development. “Each year, Exynos comes with a higher-performing NPU that is increasingly enhanced, which is very meaningful,” he said. “It will continue to become a key IP for future markets. I take a lot of pride in the fact that developing NPU has led to the growth of both myself and the company, and even contributes to the country’s overall competitiveness,” he said. “It’s the best field where it makes the things in one’s imagination come true.”
* All images shown are provided for illustrative purposes only and may not be an exact representation of the product or images captured with the product. All images are digitally edited, modified or enhanced.
¹ Clock: A continuously generated electrical oscillation between 0 and 1 that paces computation. Its frequency is expressed in Hz, and a higher clock figure means a faster processing speed.
² IPC (Instructions per Cycle): The number of instructions processed per clock cycle. IPC indicates how efficiently a CPU uses each cycle.
³ Core: The key part of the physical processing circuit within the CPU. The more cores there are, the easier it is to perform multiple actions at the same time. Single-core means there’s one core, dual-core means there are two, quad-core means there are four, and so on.
⁴ Superscalar: An architecture that combines the advantages of pipelining and parallel processing and enables instructions from multiple pipelines to be processed in parallel. The processing speed is fast because multiple instructions can be executed at the same time without having to wait for one another.
⁵ Debugging: A process of checking whether the designed program is accurate, identifying program errors and fixing them.
⁶ E2E: End to End.
⁷ Deep Learning: Technology that enables a machine to learn, infer, and reason like human beings using data.
⁸ In mobile SoC, efficiency means it uses less power or has faster speeds.
