SHANGHAI, Sept. 14, 2018 /PRNewswire/ -- Kneron, the leading company in edge artificial intelligence (AI) solutions, attended the Arm AI Global Developers Conference in Shanghai, China today. In an address titled "The Application of Reconfigurable Computing in AI Chips," Kneron announced its new generation of AI processors, the NPU IP -- KDP series, designed for edge devices. The 2nd Gen NPU IP comprises three product lines: the ultra-low-power KDP 320, the standard KDP 520, and the high-performance KDP 720. With power consumption under 0.5W, the new series uses an architecture designed for greater computing flexibility, which also raises overall performance to as much as 3 times that of the previous generation, reaching up to 5.8 TOPS(*1).
Albert Liu, the founder and CEO of Kneron, said, "Thanks to its ultra-low power consumption, the Kneron NPU IP has attracted much market attention. The new one, based on the advantages of the 1st Gen NPU IP, improves data movement efficiency, computation, and memory utilization, and optimizes performance to support different neural networks. The new NPU IP can be widely applied in diverse applications and satisfies complicated computing requirements."
The Kneron NPU IP allows neural networks to run on edge devices, including smartphones, home appliances, surveillance equipment, and a full range of IoT devices. The new interleaving computation architecture of the Kneron NPU IP simplifies the computing flow and improves efficiency. Its deep compression technology raises the compression ratio by extending compression from the model layer down to the data and coefficient layers, and dynamic memory allocation improves memory utilization without hampering computing performance. In addition, broader and better-optimized support for Convolutional Neural Network (CNN) models improves performance by 150-300%.
Technology highlights of 2nd Gen NPU IP-KDP Series:
Interleaving computation architecture: the interleaving architecture allows convolution and pooling to be computed in parallel, improving overall performance. The new convolution layer supports 8-bit and 16-bit fixed-point formats concurrently to increase computing flexibility.
Deep compression technology: this technology compresses not only models but also data and coefficients to reduce memory use during computing. Models can be compressed by up to 50-fold with less than a 1% impact on accuracy.
Dynamic memory allocation: this allocates resources more efficiently between shared memory and operating memory, increasing overall memory utilization without hampering computing performance.
CNN model support optimization: the new series supports more CNN models, including VGG16, ResNet, GoogLeNet, YOLO, Tiny YOLO, LeNet, MobileNet, and DenseNet, with model-specific optimization that delivers 150-300% better performance than its predecessors.
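To illustrate the trade-off behind supporting both 8-bit and 16-bit fixed-point formats, the following is a minimal sketch in Python. It is not Kneron's implementation: the function names, the sample weights, and the Q0.7 and Q0.15 formats (all but the sign bit used for the fraction) are assumptions chosen for illustration. It rounds a few values to each format and compares the worst-case rounding error.

```python
def quantize(values, total_bits, frac_bits):
    """Round floats to signed fixed-point integer codes, saturating at the range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Convert fixed-point integer codes back to floats."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


def max_abs_error(values, total_bits, frac_bits):
    """Largest absolute difference between the inputs and their fixed-point round trip."""
    restored = dequantize(quantize(values, total_bits, frac_bits), frac_bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical layer weights in (-1, 1); not taken from any real model.
weights = [0.7312, -0.2045, 0.0519, -0.9876, 0.3333]

err8 = max_abs_error(weights, total_bits=8, frac_bits=7)     # assumed Q0.7 format
err16 = max_abs_error(weights, total_bits=16, frac_bits=15)  # assumed Q0.15 format

print(f"8-bit  max rounding error: {err8:.6f}")
print(f"16-bit max rounding error: {err16:.8f}")
```

In this sketch the 8-bit format halves the storage per value at the cost of a coarser rounding step, which is one plausible reason a design would offer both widths.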
*1: Performance may differ with the process. The 5.8 TOPS figure is the performance of the KDP 720 on a 28nm process at a 600MHz frequency with 8-bit fixed point. The estimated power consumption is 300-500mW (estimated energy efficiency is 13.17 TOPS/W).
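The figures in this footnote can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative, and the convention of counting one multiply-accumulate (MAC) as two operations is an assumption that the release does not state.

```python
PEAK_TOPS = 5.8              # stated peak throughput, tera-operations per second
CLOCK_HZ = 600e6             # stated clock frequency, 600MHz
POWER_RANGE_W = (0.3, 0.5)   # stated estimated power range, 300-500mW
STATED_TOPS_PER_W = 13.17    # stated estimated energy efficiency

# Operations the design would need to complete per clock cycle to reach the peak.
ops_per_cycle = PEAK_TOPS * 1e12 / CLOCK_HZ

# Assumption: one MAC counts as two operations (a multiply and an add).
macs_per_cycle = ops_per_cycle / 2

# Efficiency at each end of the stated power range.
eff_at_min_power = PEAK_TOPS / POWER_RANGE_W[0]
eff_at_max_power = PEAK_TOPS / POWER_RANGE_W[1]

# Power draw implied by the stated efficiency figure.
implied_power_w = PEAK_TOPS / STATED_TOPS_PER_W

print(f"Operations per cycle: {ops_per_cycle:.0f}")
print(f"MACs per cycle (assumed 2 ops per MAC): {macs_per_cycle:.0f}")
print(f"Efficiency across power range: {eff_at_max_power:.2f} to {eff_at_min_power:.2f} TOPS/W")
print(f"Power implied by {STATED_TOPS_PER_W} TOPS/W: {implied_power_w * 1000:.0f} mW")
```

Under these numbers the stated 13.17 TOPS/W falls inside the range implied by 300-500mW, corresponding to a power draw between the two ends of that range.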
Last updated: Sep 25, 2019 at 05:39 pm CDT