Machine Learning Developer Guide
Overview
Because each platform has different hardware, IoT Yocto provides different machine learning software stacks for developers. Table 1 shows the hardware available on each board, and Table 2 shows the machine learning software stacks each board supports (V: supported, X: not supported).
| Hardware | Genio 350-EVK | Genio 1200-EVK | Genio 700-EVK |
| --- | --- | --- | --- |
| GPU | V | V | V |
| VPU | V | V | V |
| MDLA | X | V | V |
Note
For an introduction to these hardware devices, refer to Hardware Devices in the Appendix.
| Software Stack | Backend | Genio 350-EVK | Genio 1200-EVK | Genio 700-EVK |
| --- | --- | --- | --- | --- |
| Tensorflow-Lite | CPU | V | V | V |
| Tensorflow-Lite + GPU delegate | GPU | V | V | V |
| Tensorflow-Lite + ARMNN Delegate | GPU, CPU | V | V | V |
| Tensorflow-Lite + NNAPI Delegate | VPU | V | X | X |
| Neuron SDK | MDLA, VPU | X | V | V |
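The support matrix in Table 2 can also be expressed in code. The helper below is an illustrative sketch (not part of IoT Yocto) that a provisioning or test script might use to check which stacks are available on a given board; the board and stack names are taken verbatim from the table above.

```python
# Illustrative only: encodes Table 2 of the Machine Learning Developer Guide.
# True = supported ("V"), False = not supported ("X").
SUPPORT_MATRIX = {
    "Tensorflow-Lite": ("CPU", {
        "Genio 350-EVK": True, "Genio 1200-EVK": True, "Genio 700-EVK": True}),
    "Tensorflow-Lite + GPU delegate": ("GPU", {
        "Genio 350-EVK": True, "Genio 1200-EVK": True, "Genio 700-EVK": True}),
    "Tensorflow-Lite + ARMNN Delegate": ("GPU, CPU", {
        "Genio 350-EVK": True, "Genio 1200-EVK": True, "Genio 700-EVK": True}),
    "Tensorflow-Lite + NNAPI Delegate": ("VPU", {
        "Genio 350-EVK": True, "Genio 1200-EVK": False, "Genio 700-EVK": False}),
    "Neuron SDK": ("MDLA, VPU", {
        "Genio 350-EVK": False, "Genio 1200-EVK": True, "Genio 700-EVK": True}),
}

def supported_stacks(board):
    """Return the ML software stacks available on the given board."""
    return [stack for stack, (_backend, boards) in SUPPORT_MATRIX.items()
            if boards.get(board, False)]
```

For example, `supported_stacks("Genio 350-EVK")` includes the NNAPI delegate path but not the Neuron SDK, matching the table.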
Reference Boards
IoT Yocto provides a different machine learning software stack on each SoC platform. See the pages below for details about machine learning on each reference board:
Appendix
Hardware Devices
GPU
The GPU provides neural network acceleration for floating point models.
ARM-based platforms can support GPU neural network acceleration via Arm NN and the Arm Compute Library.
Non-ARM platforms can support GPU neural network acceleration via Google's TensorFlow Lite GPU delegate, which accelerates a wide selection of TFLite operations.
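As a rough sketch of how a GPU delegate is used from Python, the helper below loads a TFLite model with an external delegate via the standard `tflite_runtime` API. It assumes the `tflite_runtime` package and a GPU delegate shared library are present on the target; the model and delegate paths passed in are placeholders, so check your image for the actual library name.

```python
def run_with_gpu_delegate(model_path, delegate_path, input_data):
    """Run one inference through a TFLite external (e.g. GPU) delegate.

    model_path    -- path to a .tflite model (placeholder)
    delegate_path -- path to the delegate .so on the target (placeholder)
    input_data    -- array matching the model's input tensor shape/dtype
    """
    # Imported inside the function so merely defining this helper does not
    # require tflite_runtime to be installed on the host.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    delegate = load_delegate(delegate_path)
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=[delegate])
    interpreter.allocate_tensors()

    input_detail = interpreter.get_input_details()[0]
    interpreter.set_tensor(input_detail["index"], input_data)
    interpreter.invoke()

    output_detail = interpreter.get_output_details()[0]
    return interpreter.get_tensor(output_detail["index"])
```

Operations the delegate cannot handle automatically fall back to the CPU, so the same helper works whether or not every layer of the model is GPU-accelerated.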
Note
On IoT Yocto, both of the GPU neural network acceleration paths described above are supported.
VPU
The Vision Processing Unit (VPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The VPU also offers outstanding performance while running AI models.
Note
The first version of the VPU is known as the Cadence VP6.
The second version of the VPU is known as the MediaTek Vision Processing Unit 2.0 (MVPU 2.0).
MDLA
The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.