Machine Learning Developer Guide


Overview

Because the hardware differs across platforms, AIoT Yocto provides different machine learning software stacks for developers. Table 1 shows the hardware differences between boards, and Table 2 shows the corresponding differences in the machine learning software stacks.

Table 1. Hardware Devices on Board

| Device | Genio 350-EVK | Genio 1200-DEMO |
| ------ | ------------- | --------------- |
| GPU    | V             | V               |
| VPU    | V             | V               |
| MDLA   | X             | V               |

(V: available, X: not available)

Note

For an introduction to these hardware devices, refer to Hardware Devices in the Appendix.

Table 2. Software Stack on Board

| Software Stack                    | Backend   | Genio 350-EVK | Genio 1200-DEMO |
| --------------------------------- | --------- | ------------- | --------------- |
| TensorFlow Lite                   | CPU       | V             | V               |
| TensorFlow Lite + GPU delegate    | GPU       | V             | V               |
| TensorFlow Lite + Arm NN delegate | GPU, CPU  | V             | V               |
| TensorFlow Lite + NNAPI delegate  | VPU       | V             | X               |
| Neuron SDK                        | MDLA, VPU | X             | V               |

(V: supported, X: not supported)
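The stack-to-board mapping above can also be expressed as a small lookup table, which is convenient when writing board-portable inference code. The following is an illustrative sketch only; the dictionary and helper function are hypothetical, not part of any AIoT Yocto API:

```python
# Supported ML software stacks per board, transcribed from Table 2.
# A "V" entry in the table becomes list membership here; "X" entries are omitted.
SUPPORTED_STACKS = {
    "Genio 350-EVK": [
        "TensorFlow Lite (CPU)",
        "TensorFlow Lite + GPU delegate",
        "TensorFlow Lite + Arm NN delegate",
        "TensorFlow Lite + NNAPI delegate",
    ],
    "Genio 1200-DEMO": [
        "TensorFlow Lite (CPU)",
        "TensorFlow Lite + GPU delegate",
        "TensorFlow Lite + Arm NN delegate",
        "Neuron SDK",
    ],
}

def supports(board: str, stack: str) -> bool:
    """Return True if `stack` is listed for `board` in Table 2."""
    return stack in SUPPORTED_STACKS.get(board, [])
```

For example, `supports("Genio 1200-DEMO", "Neuron SDK")` is `True`, while the same query for the Genio 350-EVK is `False`, matching the table.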


Reference Boards

AIoT Yocto provides different machine learning software stacks on different SoC platforms. For more details about machine learning on a given reference board, refer to that board's documentation.

Appendix

Hardware Devices

GPU

The GPU provides neural network acceleration for floating point models.

  • Arm-based platforms can accelerate neural networks on the GPU via Arm NN and the Arm Compute Library.

  • Non-Arm platforms can accelerate neural networks on the GPU via Google's TensorFlow Lite GPU delegate, which accelerates a wide selection of TFLite operations.

Note

AIoT Yocto supports both of the GPU acceleration paths described above.
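As a concrete illustration, the sketch below shows how a TFLite model might be routed through either GPU path from Python. It assumes an image that ships the `tflite_runtime` package and the delegate shared libraries; the `.so` names here are assumptions and may differ on your image, so treat this as a template rather than a definitive recipe:

```python
# Hedged sketch: create a TFLite interpreter with an optional GPU-backed delegate.
# The shared-library names below are assumptions; check your image for actual paths.
DELEGATE_LIBS = {
    "gpu": "libtensorflowlite_gpu_delegate.so",  # TensorFlow Lite GPU delegate
    "armnn": "libarmnnDelegate.so",              # Arm NN TFLite delegate
}

def make_interpreter(model_path, backend=None):
    """Create a TFLite interpreter, optionally routed through a GPU backend.

    backend: None for CPU, or a key of DELEGATE_LIBS ("gpu" or "armnn").
    """
    # Imported lazily so this module can be inspected on a host without TFLite.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    delegates = []
    if backend is not None:
        delegates.append(load_delegate(DELEGATE_LIBS[backend]))
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter
```

Note that the Arm NN delegate typically also takes options selecting its compute backends (for example, a `backends` option such as `"GpuAcc,CpuAcc"` passed as the second argument to `load_delegate`); consult the Arm NN documentation for your release.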

VPU

The Vision Processing Unit (VPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The VPU also offers outstanding performance while running AI models.

Note

  • The first version of the VPU is known as the Cadence VP6.

  • The second version of the VPU is known as the MediaTek Vision Processing Unit 2.0 (MVPU 2.0).

MDLA

The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.