Neuron SDK
Introduction
Hardware Support
Neuron SDK can use the following target compute devices to run neural network models.
CPU
VPU (Vision Processing Unit)
MDLA (MediaTek Deep Learning Accelerator)
Successful use of these devices depends on the following factors, which interact with the user’s model.
Neural network framework format of the trained model.
Hardware platform (e.g. part number and device capability).
Required model accuracy. Models with high accuracy requirements might limit the type and significance of the optimizations that can be applied to the model. This might also limit the target devices that can run the model with the required performance and accuracy.
Neural network model structure. Certain operation (OP) types are not supported on certain target devices. For details, refer to the Supported Operations section.
Note
Some platforms do not have a VPU or MDLA.
Device Parametric Table
Device | Operator Flexibility | Performance | Power Consumption | Data Types |
---|---|---|---|---|
CPU | Very High | Low | High | FP32, FP16, INT16, INT8 |
VPU | Medium | High | Low | FP32, FP16, INT16, INT8 |
MDLA | Low | Very High | Low | FP16, INT16, INT8 |
As a general rule, target the most power-efficient device that your neural network and development constraints can support. In the table above, the lowest-power devices (VPU and MDLA) are also the highest performing.
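The rule above can be sketched as a small selection helper. This is a minimal illustration, not part of Neuron SDK: the `DEVICES` dictionary transcribes the Device Parametric Table, while the helper name `candidate_devices` and the numeric rankings are assumptions made for this example. It considers only data type support and does not model per-operation restrictions, which also limit the usable devices.

```python
# Values transcribed from the Device Parametric Table.
DEVICES = {
    "CPU": {"performance": "Low", "power": "High",
            "dtypes": {"FP32", "FP16", "INT16", "INT8"}},
    "VPU": {"performance": "High", "power": "Low",
            "dtypes": {"FP32", "FP16", "INT16", "INT8"}},
    "MDLA": {"performance": "Very High", "power": "Low",
             "dtypes": {"FP16", "INT16", "INT8"}},
}

# Hypothetical numeric rankings used only for sorting.
POWER_RANK = {"Low": 0, "High": 1}                    # lower is better
PERF_RANK = {"Low": 0, "High": 1, "Very High": 2}     # higher is better


def candidate_devices(dtype, available=("CPU", "VPU", "MDLA")):
    """Return devices supporting `dtype`, most power-efficient first.

    Ties in power consumption are broken by higher performance.
    `available` lets the caller exclude devices the platform lacks.
    """
    usable = [name for name in available if dtype in DEVICES[name]["dtypes"]]
    return sorted(
        usable,
        key=lambda n: (POWER_RANK[DEVICES[n]["power"]],
                       -PERF_RANK[DEVICES[n]["performance"]]),
    )


if __name__ == "__main__":
    print(candidate_devices("INT8"))
    print(candidate_devices("FP32"))
    print(candidate_devices("INT8", available=("CPU",)))
```

For an FP32 model the MDLA drops out of the candidate list because the table does not list FP32 among its data types, so the VPU becomes the preferred target ahead of the CPU.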
Devices
CPU
The CPU is capable of running any neural network and is guaranteed to support all existing and future NN operations. Support is provided through TFLite. The CPU is the most flexible target device, but it is also the least optimized for power and performance.
VPU
The Vision Processing Unit (VPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The VPU also offers outstanding performance while running AI models.
Note
The first version of the VPU is known as the Cadence VP6.
The second version of the VPU is known as the MediaTek Vision Processing Unit 2.0 (MVPU 2.0).
MDLA
The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.
The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.
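The idea behind tile-based layer fusion can be illustrated with a toy example. The sketch below is a simplification under stated assumptions and does not describe the MDLA's actual implementation: it fuses two element-wise layers (a hypothetical `scale` followed by `relu`) on a one-dimensional input, and reports how large the intermediate buffer between the two layers becomes. Fusing real convolution layers additionally requires overlapping tiles, which this example does not cover.

```python
def scale(x):
    """First layer of the toy model: multiply by two."""
    return 2.0 * x


def relu(x):
    """Second layer of the toy model: clamp negatives to zero."""
    return max(x, 0.0)


def run_unfused(data):
    """Run layer by layer; the full intermediate tensor is materialized."""
    intermediate = [scale(x) for x in data]
    peak_intermediate = len(intermediate)
    output = [relu(x) for x in intermediate]
    return output, peak_intermediate


def run_fused(data, tile_size):
    """Run both layers tile by tile; only one tile of intermediate data
    exists at any time."""
    output = []
    peak_intermediate = 0
    for start in range(0, len(data), tile_size):
        tile = data[start:start + tile_size]
        intermediate = [scale(x) for x in tile]  # small enough to stay local
        peak_intermediate = max(peak_intermediate, len(intermediate))
        output.extend(relu(x) for x in intermediate)
    return output, peak_intermediate


if __name__ == "__main__":
    data = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    print(run_unfused(data))
    print(run_fused(data, tile_size=2))
```

Both functions compute the same result, but the fused version never holds more than `tile_size` intermediate values, which is the kind of data movement saving that layer fusion targets.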
Board/SoC Platform Support
Currently, Neuron SDK is available only on the following boards and SoC platforms. Please make sure your board has Neuron SDK support:
Board | SoC Platform | Neuron SDK Support | Neuron Software Version |
---|---|---|---|
Genio 350-EVK | MT8365 | X | X |
Genio 1200-EVK | MT8395 | V | 6 |
Genio 700-EVK | MT8390 | V | 6 |

Board | SoC Platform | APU Version | VPU Version | MDLA Version |
---|---|---|---|---|
Genio 350-EVK | MT8365 | 1 | 1 | X |
Genio 1200-EVK | MT8395 | 3 | 1 | 2 |
Genio 700-EVK | MT8390 | 5 | 1 | 3 |
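The two tables above can be combined into a simple lookup, for example to check a board before planning a deployment. This is a hedged sketch: the dictionary values are transcribed from the tables, the helper name `neuron_targets` is hypothetical, and the code assumes that "V" means supported and "X" means not supported or not present (encoded here as `False` or `None`).

```python
# Values transcribed from the board tables; None stands for an "X" entry.
BOARDS = {
    "Genio 350-EVK": {"soc": "MT8365", "neuron_sdk": False,
                      "neuron_version": None, "apu": 1, "vpu": 1, "mdla": None},
    "Genio 1200-EVK": {"soc": "MT8395", "neuron_sdk": True,
                       "neuron_version": 6, "apu": 3, "vpu": 1, "mdla": 2},
    "Genio 700-EVK": {"soc": "MT8390", "neuron_sdk": True,
                      "neuron_version": 6, "apu": 5, "vpu": 1, "mdla": 3},
}


def neuron_targets(board):
    """Return the accelerator targets usable through Neuron SDK on `board`.

    Returns an empty list when the board has no Neuron SDK support,
    even if the SoC contains a VPU.
    """
    info = BOARDS[board]
    if not info["neuron_sdk"]:
        return []
    return [name for name, key in (("VPU", "vpu"), ("MDLA", "mdla"))
            if info[key] is not None]


if __name__ == "__main__":
    for board in BOARDS:
        print(board, neuron_targets(board))
```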
Overview
Neuron SDK allows users to efficiently compile a custom Neural Network model and then execute the model on MediaTek platforms while utilizing MediaTek’s AI Processing Unit (APU).
Neuron Compiler (ncc-tflite) transforms a TFLite model file into a DLA (Deep Learning Archive) file. A DLA file is a low-level binary for MDLA and VPU compute devices.
Neuron Runtime (neuronrt) provides APIs to load a DLA file and perform on-device inference.
The figure below provides an overview of the user flow for Neuron SDK.
The Neuron SDK consists of the following components:
Neuron Compiler: An offline neural network model compiler (ncc-tflite) that produces statically compiled Deep Learning Archive (DLA) files.
Neuron Runtime: A command line tool (neuronrt) that executes a specified DLA file and reports the results.
Neuron Runtime API: A user-invoked API that supports loading and running compiled DLA files within a user’s C++ application.
Neuron Profiler: A built-in performance profiler tool in Neuron Runtime.
Supported Operations
This section describes all the neural network operations supported by Neuron SDK, and any restrictions placed on their use.
Note
Different compute devices may have restrictions on supported operations. These restrictions are a function of:
Op Type
Op parameters (e.g. kernel dimensions and modifiers, such as stride)
Tensor dimensions (both input and output)
SoC platform
Numeric format (both data type and quantization method)
Each device has its own guidelines and restrictions.
The supported operations and their restrictions are listed per reference board:
MT8395 P1V6 demo board (deprecated in v23.1) Supported Operations
Genio 700-EVK Supported Operations