Neuron SDK

Introduction

Hardware Support

Neuron SDK can use the following target compute devices to run neural network models.

  • CPU

  • VPU (Vision Processing Unit)

  • MDLA (MediaTek Deep Learning Accelerator)

Successful use of these cores depends on the following factors, which interact with a user’s model.

  • Neural network framework format of the trained model.

  • Hardware platform (e.g. part number and device capability).

  • Required model accuracy. Models with high accuracy requirements might limit the type and significance of the optimizations that can be applied to the model. This might also limit the target devices that can run the model with the required performance and accuracy.

  • Neural network model structure. Certain operation (OP) types are not supported on certain target devices. For details, refer to the Supported Operations section.

Note

  • Some platforms do not have a VPU or MDLA.

Device Parametric Table

Device   Operator Flexibility   Performance   Power Consumption   Data Types
CPU      Very High              Low           High                FP32, FP16, INT16, INT8
VPU      Medium                 High          Low                 FP32, FP16, INT16, INT8
MDLA     Low                    Very High     Low                 FP16, INT16, INT8

As a general rule, target the most power-efficient device that your neural network and development constraints can support. According to the Device Parametric Table, the lowest-power devices (VPU and MDLA) are also the highest performing.
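The selection rule above can be sketched in a few lines of Python. This is only an illustration, not part of Neuron SDK: the ratings are transcribed from the Device Parametric Table, while `pick_device`, the ordinal `RATING` scale, and the `min_flexibility` parameter (a rough stand-in for "my model needs at least this much operator coverage") are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Ordinal scale for the qualitative ratings in the Device Parametric Table.
# The numeric values are an assumption made only so ratings can be compared.
RATING = {"Low": 0, "Medium": 1, "High": 2, "Very High": 3}


@dataclass(frozen=True)
class Device:
    name: str
    flexibility: str
    performance: str
    power: str
    data_types: frozenset


# Rows transcribed from the Device Parametric Table.
DEVICES = [
    Device("CPU", "Very High", "Low", "High",
           frozenset({"FP32", "FP16", "INT16", "INT8"})),
    Device("VPU", "Medium", "High", "Low",
           frozenset({"FP32", "FP16", "INT16", "INT8"})),
    Device("MDLA", "Low", "Very High", "Low",
           frozenset({"FP16", "INT16", "INT8"})),
]


def pick_device(data_type, available, min_flexibility="Low"):
    """Return the lowest-power usable device; ties go to higher performance.

    `available` is the set of device names present on the platform, and
    `min_flexibility` is a hypothetical knob, not an SDK concept.
    """
    candidates = [
        d for d in DEVICES
        if d.name in available
        and data_type in d.data_types
        and RATING[d.flexibility] >= RATING[min_flexibility]
    ]
    if not candidates:
        raise ValueError("no device satisfies the constraints")
    best = min(candidates,
               key=lambda d: (RATING[d.power], -RATING[d.performance]))
    return best.name
```

For example, an INT8 model on a platform with all three devices would be steered to the MDLA, while an FP32 model would fall back to the VPU because the table lists no FP32 support for the MDLA. Real device selection also depends on per-operation support, which this sketch does not model.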

Devices

CPU

The CPU is capable of running any neural network and is guaranteed to support all existing and future NN operations. CPU support is provided through TFLite. The CPU is the most flexible target device, but it is also the least optimized for power and performance.

VPU

The Vision Processing Unit (VPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The VPU also offers outstanding performance while running AI models.

Note

  • The first version of the VPU is known as the Cadence VP6.

  • The second version of the VPU is known as the MediaTek Vision Processing Unit 2.0 (MVPU 2.0).

MDLA

The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.

The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.

Board/SoC Platform Support

Currently, Neuron SDK is available only on the following boards and SoC platforms. Make sure your board has Neuron SDK support:

Neuron SDK Support on Board/SoC Platform

Board            SoC Platform   Neuron SDK Support   Neuron Software Version
Genio 350-EVK    MT8365         X                    X
Genio 1200-EVK   MT8395         V                    6
Genio 700-EVK    MT8390         V                    6

(V = supported, X = not supported or not applicable)

Hardware Version on Board/SoC Platform

Board            SoC Platform   APU Version   VPU Version   MDLA Version
Genio 350-EVK    MT8365         1             1             X
Genio 1200-EVK   MT8395         3             1             2
Genio 700-EVK    MT8390         5             1             3

(X = device not present)
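As a quick way to check a board before starting work, the two tables above can be captured in a small lookup. The following Python sketch is illustrative only: the values are transcribed from the tables, with `None` standing for an "X" entry, and the `PLATFORMS` dictionary and `accelerators` helper are hypothetical names, not part of Neuron SDK.

```python
# Values transcribed from the support and hardware version tables above.
# None marks an "X" entry (not supported, or device not present).
PLATFORMS = {
    "Genio 350-EVK": {
        "soc": "MT8365", "neuron_sdk": False, "neuron_version": None,
        "apu": 1, "vpu": 1, "mdla": None,
    },
    "Genio 1200-EVK": {
        "soc": "MT8395", "neuron_sdk": True, "neuron_version": 6,
        "apu": 3, "vpu": 1, "mdla": 2,
    },
    "Genio 700-EVK": {
        "soc": "MT8390", "neuron_sdk": True, "neuron_version": 6,
        "apu": 5, "vpu": 1, "mdla": 3,
    },
}


def accelerators(board):
    """List the non-CPU devices reachable through Neuron SDK on a board.

    Returns an empty list when the board has no Neuron SDK support,
    even if the hardware itself includes a VPU.
    """
    info = PLATFORMS[board]
    if not info["neuron_sdk"]:
        return []
    return [name.upper() for name in ("vpu", "mdla") if info[name] is not None]
```

Under these tables, `accelerators("Genio 700-EVK")` lists both the VPU and the MDLA, while the Genio 350-EVK yields an empty list because Neuron SDK is not available on it.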

Overview

Neuron SDK allows users to efficiently compile a custom Neural Network model and then execute the model on MediaTek platforms while utilizing MediaTek’s AI Processing Unit (APU).

Neuron Compiler (ncc-tflite) transforms a TFLite model file into a DLA (Deep Learning Archive) file. A DLA file is a low-level binary for MDLA and VPU compute devices.

Neuron Runtime (neuronrt) provides APIs to load a DLA file and perform on-device inference.

The figure below provides an overview of the user flow for Neuron SDK.

[Figure: Neuron SDK user flow]

The Neuron SDK consists of the following components:

  • Neuron Compiler: An offline neural network model compiler (ncc-tflite) that produces statically compiled Deep Learning Archive (DLA) files.

  • Neuron Runtime: A command-line tool (neuronrt) that executes a specified DLA file and reports the results.

  • Neuron Runtime API: A user-invoked API that supports loading and running compiled DLA files within a user’s C++ application.

  • Neuron Profiler: A built-in performance profiler tool in Neuron Runtime.
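The compile-then-run flow described above can be sketched as two command lines, built here in Python without executing anything. The tool names ncc-tflite and neuronrt come from this section; the specific flags (`--arch`, `-o`, `-m hw`, `-a`, `-i`), the `mdla3.0` architecture string, and all file names are assumptions made for illustration and should be checked against each tool's built-in help before use.

```python
import shlex


def compile_command(tflite_path, dla_path, arch):
    """Build an ncc-tflite invocation that turns a TFLite file into a DLA file.

    The flag names are assumptions; verify them against the tool's help output.
    """
    return ["ncc-tflite", "--arch", arch, tflite_path, "-o", dla_path]


def run_command(dla_path, input_path, output_path):
    """Build a neuronrt invocation that runs a DLA file on one input.

    The flag names are assumptions; verify them against the tool's help output.
    """
    return ["neuronrt", "-m", "hw", "-a", dla_path,
            "-i", input_path, "-o", output_path]


if __name__ == "__main__":
    # File names and the architecture string are placeholders.
    print(shlex.join(compile_command("model.tflite", "model.dla", "mdla3.0")))
    print(shlex.join(run_command("model.dla", "input.bin", "output.bin")))
```

Building the argument lists separately from running them keeps the sketch usable on a host machine: the compile step is offline, while the run step has to happen on the target device.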

Supported Operations

This section describes all the neural network operations supported by Neuron SDK, and any restrictions placed on their use.

Note

Different compute devices may have restrictions on supported operations. These restrictions are a function of:

  1. Op Type

  2. Op parameters (e.g. kernel dimensions and modifiers, such as stride)

  3. Tensor dimensions (both input and output)

  4. SoC platform

  5. Numeric format (both data type and quantization method)

Each device has its own guidelines and restrictions.
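One way to picture how these five factors combine is a rule that must match on all of them at once. The Python sketch below is purely illustrative: `OpInstance`, `Rule`, and `is_supported` are hypothetical names, the rank and stride limits are a simplified stand-in for real parameter and dimension restrictions, and none of the values reflect actual Neuron SDK restrictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpInstance:
    """One operation as it appears in a model (hypothetical structure)."""
    op_type: str
    params: dict
    input_shape: tuple
    output_shape: tuple
    data_type: str
    quantization: str


@dataclass(frozen=True)
class Rule:
    """A made-up restriction entry for one op type on one SoC platform."""
    op_type: str
    soc: str
    data_types: frozenset
    quantizations: frozenset
    max_rank: int
    max_stride: int

    def allows(self, op, soc):
        # Every factor must pass: op type, SoC platform, numeric format,
        # tensor dimensions, and op parameters.
        return (
            op.op_type == self.op_type
            and soc == self.soc
            and op.data_type in self.data_types
            and op.quantization in self.quantizations
            and len(op.input_shape) <= self.max_rank
            and len(op.output_shape) <= self.max_rank
            and op.params.get("stride", 1) <= self.max_stride
        )


def is_supported(op, soc, rules):
    """Return True if at least one rule accepts the op on the given SoC."""
    return any(rule.allows(op, soc) for rule in rules)
```

A rule table like this would be filled in from the per-board operation listings; the sketch only shows that a single failing factor, such as an unsupported data type, is enough to rule a device out for that operation.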

Find all the neural network operations supported by Neuron SDK, and any restrictions placed on their use according to the reference board: