Machine Learning Developer Guide


Overview

Because the hardware differs across platforms, AIoT Yocto provides different machine learning software stacks for developers. Table 1 shows the hardware differences between boards, and Table 2 shows the corresponding differences in the machine learning software stacks.

Table 1. Hardware Devices on Board

| Device | Genio 350-EVK | Genio 1200-DEMO |
| ------ | ------------- | --------------- |
| GPU    | V             | V               |
| VPU    | V             | V               |
| MDLA   | X             | V               |

(V: available, X: not available)

Note

For an introduction to these hardware devices, refer to Hardware Devices in the Appendix.

Table 2. Software Stack on Board

| Software Stack                    | Backend   | Genio 350-EVK | Genio 1200-DEMO |
| --------------------------------- | --------- | ------------- | --------------- |
| TensorFlow Lite                   | CPU       | V             | V               |
| TensorFlow Lite + GPU delegate    | GPU       | V             | V               |
| TensorFlow Lite + Arm NN delegate | GPU, CPU  | V             | V               |
| TensorFlow Lite + NNAPI delegate  | VPU       | V             | X               |
| Neuron SDK                        | MDLA, VPU | X             | V               |

(V: supported, X: not supported)
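The stack-to-board mapping above can also be expressed as a small lookup table, which is convenient when writing board-portable inference code. The following is an illustrative sketch only; the dictionary and helper function are hypothetical, not part of any AIoT Yocto API:

```python
# Supported ML software stacks per board, transcribed from Table 2.
# A "V" entry in the table becomes list membership here; "X" entries are omitted.
SUPPORTED_STACKS = {
    "Genio 350-EVK": [
        "TensorFlow Lite (CPU)",
        "TensorFlow Lite + GPU delegate",
        "TensorFlow Lite + Arm NN delegate",
        "TensorFlow Lite + NNAPI delegate",
    ],
    "Genio 1200-DEMO": [
        "TensorFlow Lite (CPU)",
        "TensorFlow Lite + GPU delegate",
        "TensorFlow Lite + Arm NN delegate",
        "Neuron SDK",
    ],
}

def supports(board: str, stack: str) -> bool:
    """Return True if `stack` is listed for `board` in Table 2."""
    return stack in SUPPORTED_STACKS.get(board, [])
```

For example, `supports("Genio 1200-DEMO", "Neuron SDK")` is `True`, while the same query for the Genio 350-EVK is `False`, matching the table.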


Reference Boards

AIoT Yocto provides different machine learning software stacks on different SoC platforms. For more details about machine learning on a given reference board, refer to that board's documentation.

Appendix

Hardware Devices

GPU

The GPU provides neural network acceleration for floating point models.

  • Arm-based platforms can accelerate neural networks on the GPU via Arm NN and the Arm Compute Library.

  • Non-Arm platforms can accelerate neural networks on the GPU via Google's TensorFlow Lite GPU delegate, which accelerates a wide selection of TFLite operations.

Note

AIoT Yocto supports both of the GPU acceleration paths described above.
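As a concrete illustration, the sketch below shows how a TFLite model might be routed through either GPU path from Python. It assumes an image that ships the `tflite_runtime` package and the delegate shared libraries; the `.so` names here are assumptions and may differ on your image, so treat this as a template rather than a definitive recipe:

```python
# Hedged sketch: create a TFLite interpreter with an optional GPU-backed delegate.
# The shared-library names below are assumptions; check your image for actual paths.
DELEGATE_LIBS = {
    "gpu": "libtensorflowlite_gpu_delegate.so",  # TensorFlow Lite GPU delegate
    "armnn": "libarmnnDelegate.so",              # Arm NN TFLite delegate
}

def make_interpreter(model_path, backend=None):
    """Create a TFLite interpreter, optionally routed through a GPU backend.

    backend: None for CPU, or a key of DELEGATE_LIBS ("gpu" or "armnn").
    """
    # Imported lazily so this module can be inspected on a host without TFLite.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    delegates = []
    if backend is not None:
        delegates.append(load_delegate(DELEGATE_LIBS[backend]))
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter
```

Note that the Arm NN delegate typically also takes options selecting its compute backends (for example, a `backends` option such as `"GpuAcc,CpuAcc"` passed as the second argument to `load_delegate`); consult the Arm NN documentation for your release.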

VPU

The Vision Processing Unit (VPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The VPU also offers outstanding performance while running AI models.

Note

  • The first version of the VPU is known as the Cadence VP6.

  • The second version of the VPU is known as the MediaTek Vision Processing Unit 2.0 (MVPU 2.0).

MDLA

The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.