The newest generation, launched in early 2022, provides three major new features:
- Scatter-gather helps prevent memory bottlenecks and feeds data efficiently to the ALUs (arithmetic logic units).
- Extreme low-power mode allows the NPU to operate without DRAM to support always-on scenarios.
- A multi-precision ALU adds FP16 (half-precision floating-point format) support to the existing INT8 and INT4 support within a single ALU, for greater efficiency and flexibility.
Advancing NPU Technology
“We believe that advancements in NPU technology will continue at three different levels. The first level involves enhancing efficiency at an IP level,” explained Park. “One approach is ensuring the massive number of ALUs are well fed with data and well utilized. Another method is making more ALUs available through an increased compute density. Work reduction through bit-precision optimization and better gating of unused blocks can also help.”
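To make the bit-precision point concrete, here is a minimal Python sketch of symmetric linear quantization to INT8 and INT4. It is an illustration only, not a description of how the NPU itself works: the function names (`quantize`, `dequantize`, `max_error`) and the sample weights are hypothetical. It shows the trade-off behind work reduction: fewer bits per value means less data to move and compute, at the cost of a larger rounding error.

```python
def quantize(values, bits):
    """Symmetric linear quantization of floats to signed `bits`-bit integer codes."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for INT8, 7 for INT4
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0       # one float step per integer code
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def max_error(values, bits):
    """Largest absolute difference between the originals and their round trip."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -0.99, 0.33]

for bits in (8, 4):
    storage_bits = len(weights) * bits
    print(f"INT{bits}: {storage_bits} bits stored, max error {max_error(weights, bits):.4f}")
```

Halving the bit width halves the storage for the same weights, while the rounding error grows. Whether that error is acceptable depends on the model, which is why precision is something to optimize rather than simply minimize.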
Within the power limits of today’s mobile devices, IPs will soon be running at the lowest voltage that the process technology supports. Once voltage can no longer be scaled down, it becomes difficult to translate performance improvements into efficiency improvements, which is why other methods are needed to improve NPU technology.
“The second level involves improving system-wide power efficiency. Improvements in NPU efficiency see other components in the system starting to use a significant amount of power. Other components that have been ignored are CPU, ISP (Image Signal Processor), NOC (Network-on-Chip), DRAM and Power Management IC,” said Park. “Power consumed by other IPs is generally proportional to the inference rate, and if those IPs don’t improve at the same rate as the NPU, the budget for the NPU will reduce over time. Minimizing cross-IP data movement while increasing CPU task offloading to more efficient cores are some of the ways to drive improvement at this level.”
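The budget argument in the quote above can be sketched with simple arithmetic. The Python snippet below is a toy model under stated assumptions, not measured data: it assumes a fixed total power budget, assumes the other IPs consume a fixed energy per inference (so their power is proportional to the inference rate), and uses a hypothetical function name `npu_power_budget` and made-up numbers.

```python
def npu_power_budget(total_watts, other_joules_per_inference, inferences_per_second):
    """Power left for the NPU after the other IPs take their share.

    Assumption: the other IPs (CPU, ISP, NOC, DRAM, PMIC) draw power
    proportional to the inference rate.
    """
    other_watts = other_joules_per_inference * inferences_per_second
    return max(0.0, total_watts - other_watts)


# Hypothetical numbers: a 3 W budget and 20 mJ per inference spent outside the NPU.
TOTAL_WATTS = 3.0
OTHER_JOULES = 0.020

for rate in (30, 60, 120):
    budget = npu_power_budget(TOTAL_WATTS, OTHER_JOULES, rate)
    per_inference_mj = 1000 * budget / rate    # energy the NPU may spend per inference
    print(f"{rate:>3} inf/s: NPU budget {budget:.2f} W, {per_inference_mj:.1f} mJ per inference")
```

In this model, every increase in inference rate shrinks the NPU's share unless the energy the other IPs spend per inference falls as well. That is the motivation for reducing cross-IP data movement and offloading CPU tasks to more efficient cores.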