Neuron SDK

Introduction 

Hardware Support

Neuron SDK can use the following target compute devices to run neural network models.

CPU
VPU (Vision Processing Unit)
MDLA (MediaTek Deep Learning Accelerator)

Successful use of these cores depends on the following factors, which interact with a user’s model.

Neural network framework format of the trained model.
Hardware platform (e.g. part number and device capability).
Required model accuracy. Models with high accuracy requirements might limit the type and significance of the optimizations that can be applied to the model. This might also limit the target devices that can run the model with the required performance and accuracy.
Neural network model structure. Certain operation (OP) types are not supported on certain target devices. For details, refer to the Supported Operations section.

Note

Some platforms do not have a VPU or MDLA.

Device Parametric Table

Device	Operator Flexibility	Performance	Power Consumption	Data Types
CPU	Very High	Low	High	FP32, FP16, INT16, INT8
VPU	Medium	High	Low	FP32, FP16, INT16, INT8
MDLA	Low	Very High	Low	FP16, INT16, INT8

As a general rule, target the most power-efficient device that your neural network and development constraints allow. As the table shows, the lowest-power devices are also the highest performing.
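The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of Neuron SDK: the `DEVICES` dictionary simply restates the Device Parametric Table, while `PREFERENCE`, `pick_device`, and its parameters are hypothetical names introduced here. The preference order assumes that MDLA is tried before VPU because both are listed as low power and MDLA has the higher performance rating.

```python
# Device characteristics restated from the Device Parametric Table.
DEVICES = {
    "CPU": {"flexibility": "Very High", "performance": "Low", "power": "High",
            "dtypes": {"FP32", "FP16", "INT16", "INT8"}},
    "VPU": {"flexibility": "Medium", "performance": "High", "power": "Low",
            "dtypes": {"FP32", "FP16", "INT16", "INT8"}},
    "MDLA": {"flexibility": "Low", "performance": "Very High", "power": "Low",
             "dtypes": {"FP16", "INT16", "INT8"}},
}

# Assumed preference: most power-efficient and highest performing first.
PREFERENCE = ["MDLA", "VPU", "CPU"]


def pick_device(dtype, available=("CPU", "VPU", "MDLA"), excluded=()):
    """Return the first preferred device that is present, not excluded
    (e.g. because of an unsupported operation), and supports `dtype`."""
    for name in PREFERENCE:
        if name in available and name not in excluded and dtype in DEVICES[name]["dtypes"]:
            return name
    raise ValueError(f"no available device supports {dtype}")


if __name__ == "__main__":
    print(pick_device("INT8"))                      # MDLA supports INT8
    print(pick_device("FP32"))                      # MDLA lacks FP32, so VPU
    print(pick_device("FP32", available=("CPU",)))  # platform without VPU or MDLA
```

The `available` argument reflects the note that some platforms do not have a VPU or MDLA, and `excluded` stands in for devices ruled out by operation support or accuracy requirements.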

Devices

CPU

The CPU can run any neural network and is guaranteed to support all existing and future NN operations. Support is provided through TFLite. The CPU is the most flexible target device, but it is also the least optimized for power and performance.

VPU

The Vision Processing Unit (VPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The VPU also offers outstanding performance while running AI models.

Note

The first version of the VPU is known as the Cadence VP6.
The second version of the VPU is known as the MediaTek Vision Processing Unit 2.0 (MVPU 2.0).

MDLA

The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.

The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.
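The idea behind tile-based layer fusion can be illustrated with a toy Python model. This is a conceptual sketch only and does not describe how the MDLA is actually implemented: it assumes purely element-wise layers (so tiles need no overlap), and the functions `layer_by_layer`, `tile_fused`, `scale`, and `relu` are hypothetical names introduced for the example. The `traffic` counter approximates how many elements cross the on-chip/off-chip boundary.

```python
def scale(x):
    """Toy element-wise layer: multiply by 2."""
    return x * 2


def relu(x):
    """Toy element-wise layer: clamp negatives to 0."""
    return max(x, 0)


def layer_by_layer(data, layers):
    """Run each layer over the whole tensor; every intermediate result
    is written out and read back in."""
    traffic = 0
    current = list(data)
    for layer in layers:
        traffic += len(current)               # read layer input
        current = [layer(x) for x in current]
        traffic += len(current)               # write layer output
    return current, traffic


def tile_fused(data, layers, tile_size):
    """Run all layers on one tile at a time; intermediates stay local."""
    traffic = 0
    out = []
    for start in range(0, len(data), tile_size):
        tile = data[start:start + tile_size]
        traffic += len(tile)                  # read the input tile once
        for layer in layers:
            tile = [layer(x) for x in tile]   # intermediate never leaves the tile
        out.extend(tile)
        traffic += len(tile)                  # write the final tile once
    return out, traffic


if __name__ == "__main__":
    data = [-2, -1, 0, 1, 2, 3, 4, 5]
    print(layer_by_layer(data, [scale, relu]))
    print(tile_fused(data, [scale, relu], tile_size=4))
```

Both functions produce the same output, but the fused version moves each element across the boundary only twice regardless of how many layers are fused, which is the bandwidth reduction the text describes.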

Board/SoC Platform Support

Currently, Neuron SDK is available only on the board/SoC platforms listed below. Make sure that your board has Neuron SDK support. In the tables below, V means supported and X means not supported or not present:

Neuron SDK Support on Board/SoC Platform
Board	SoC Platform	Neuron SDK Support	Neuron Software Version
Genio 350-EVK	MT8365	X	X
Genio 1200-EVK	MT8395	V	6
Genio 700-EVK	MT8390	V	6

Hardware Version on Board/SoC Platform
Board	SoC Platform	APU Version	VPU Version	MDLA Version
Genio 350-EVK	MT8365	1	1	X
Genio 1200-EVK	MT8395	3	1	2
Genio 700-EVK	MT8390	5	1	3
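For scripts that need to branch on the target board, the two tables above can be captured as a small lookup. The sketch below is illustrative only: `BOARDS`, `supports_neuron_sdk`, and `accelerators` are hypothetical names, and it assumes that an X entry in the tables means "not supported" or "not present", represented here as `None`.

```python
# Values restated from the two board/SoC tables; None stands for an X entry
# (assumed to mean "not supported" or "not present").
BOARDS = {
    "Genio 350-EVK": {"soc": "MT8365", "neuron_version": None,
                      "apu": 1, "vpu": 1, "mdla": None},
    "Genio 1200-EVK": {"soc": "MT8395", "neuron_version": 6,
                       "apu": 3, "vpu": 1, "mdla": 2},
    "Genio 700-EVK": {"soc": "MT8390", "neuron_version": 6,
                      "apu": 5, "vpu": 1, "mdla": 3},
}


def supports_neuron_sdk(board):
    """True if the table lists a Neuron software version for the board."""
    return BOARDS[board]["neuron_version"] is not None


def accelerators(board):
    """List the accelerator types for which the table gives a version."""
    info = BOARDS[board]
    return [name for name, key in (("VPU", "vpu"), ("MDLA", "mdla"))
            if info[key] is not None]


if __name__ == "__main__":
    for board in BOARDS:
        print(board, supports_neuron_sdk(board), accelerators(board))
```

Note that a board can list accelerator hardware without having Neuron SDK support, as is the case for the Genio 350-EVK in the tables.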

Overview 

Neuron SDK allows users to efficiently compile a custom neural network model and then execute it on MediaTek platforms using MediaTek’s AI Processing Unit (APU).

The Neuron compiler (ncc-tflite) transforms a TFLite model file into a DLA (Deep Learning Archive) file. A DLA file is a low-level binary for MDLA and VPU compute devices.

Neuron Runtime (neuronrt) provides APIs to load a DLA file and perform on-device inference.

The figure below provides an overview of the user flow for Neuron SDK.

[Figure: Neuron SDK user flow]

The Neuron SDK consists of the following components:

Neuron Compiler: An offline neural network model compiler (ncc-tflite) that produces statically compiled Deep Learning Archive (DLA) files.
Neuron Runtime: A command-line tool (neuronrt) that executes a specified DLA file and reports the results.
Neuron Runtime API: A user-invoked API that supports loading and running compiled DLA files within a user’s C++ application.
Neuron Profiler: A performance profiler built into Neuron Runtime.
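Before starting the compile-and-run flow, it can help to confirm that the two command-line tools named above are installed. The following Python sketch only checks whether `ncc-tflite` and `neuronrt` are on the `PATH`; the helper name `find_neuron_tools` is hypothetical, and the sketch deliberately does not invoke the tools, because their command-line options are not covered in this section.

```python
import shutil

# Tool names as given in this section.
NEURON_TOOLS = ("ncc-tflite", "neuronrt")


def find_neuron_tools(tools=NEURON_TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_neuron_tools().items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
```

Because the compiler runs offline while the runtime runs on the device, it is normal for only one of the two tools to be found on a given machine.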

Supported Operations 

This section describes all the neural network operations supported by Neuron SDK, and any restrictions placed on their use.

Note

Different compute devices may have different restrictions on supported operations. These restrictions are a function of:

Op type
Op parameters (e.g. kernel dimensions and modifiers, such as stride)
Tensor dimensions (both input and output)
SoC platform
Numeric format (both data type and quantization method)

Each device has its own guidelines and restrictions.

For the neural network operations supported on each reference board, and any restrictions on their use, see:

MT8395 P1V6 demo board (deprecated in v23.1) Supported Operations
Genio 700-EVK Supported Operations
