.. include:: /keyword.rst

.. _Neuron SDK:

==========
Neuron SDK
==========

.. contents:: Sections
   :local:
   :depth: 1

.. toctree::
   :hidden:

   Neuron Tools
   Neuron RunTime API
   Neuron RunTime API V1
   Neuron RunTime API V2
   Neuron API Reference
   Neuron Profiler
   Genio 510/700 Supported Operations
   Genio 1200 Supported Operations

.. DISABLE DLA-MUXER PART:
   Neuron DLA Muxer

Introduction
============

Hardware Support
****************

Neuron SDK can use the following target compute devices to run neural network models:

* CPU
* VPU (Vision Processing Unit)
* MDLA (MediaTek Deep Learning Accelerator)

Successful use of these cores depends on the following factors, which interact with a user's model:

* Neural network framework format of the trained model.
* Hardware platform (e.g. part number and device capability).
* Required model accuracy. Models with high accuracy requirements might limit the type and significance of the optimizations that can be applied to the model. This might also limit the target devices that can run the model with the required performance and accuracy.
* Neural network model structure. Certain operation (OP) types are not supported on certain target devices. For details, refer to the :ref:`Supported Operations section <ml_neuron-supported-operations>`.

.. note::

   * Some platforms do not have a VPU or MDLA.

Device Parametric Table
***********************

.. list-table::
   :header-rows: 1

   * - Device
     - Operator Flexibility
     - Performance
     - Power Consumption
     - Data Types
   * - CPU
     - Very High
     - Low
     - High
     - FP32, FP16, INT16, INT8
   * - VPU
     - Medium
     - High
     - Low
     - FP32, FP16, INT16, INT8
   * - MDLA
     - Low
     - Very High
     - Low
     - FP16, INT16, INT8

As a general rule, target the most power-efficient device that your neural network and development constraints allow. On these platforms, the lowest-power devices are also the highest-performing ones.

Devices
*******

CPU
^^^

The CPU is capable of running any neural network and is guaranteed to support all existing and future NN operations. Support is provided through TFLite. The CPU is the most flexible target device, but it is also the least optimized for power and performance.

VPU
^^^

The Vision Processing Unit (VPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The VPU also offers outstanding performance when running AI models.

MDLA
^^^^

The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.

The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.

Board/SoC Platform Support
**************************

Currently, Neuron SDK is only available on the following board/SoC platforms. Please make sure your board has Neuron SDK support:

.. csv-table:: Neuron SDK Support on Board/SoC Platform
   :class: longtable
   :file: /_asset/tables/ml-platform-neuron-sdk-support.csv
   :width: 65%
   :widths: 25 25 25 25

.. csv-table:: Hardware Version on Board/SoC Platform
   :class: longtable
   :file: /_asset/tables/ml-platform-neuron-hw-version.csv
   :width: 65%
   :widths: 20 20 20 20 20

Overview
========

Neuron SDK allows users to efficiently compile a custom neural network model and then execute it on MediaTek platforms, utilizing MediaTek’s AI Processing Unit (APU). The Neuron compiler (``ncc-tflite``) transforms a TFLite model file into a DLA (Deep Learning Archive) file. A DLA file is a low-level binary for the MDLA and VPU compute devices. Neuron Runtime (``neuronrt``) provides APIs to load a DLA file and perform on-device inference. The figure below provides an overview of the user flow for Neuron SDK.

.. figure:: /_asset/sw_rity_ml-guide_neuron_sdk_flow.svg
   :align: center

The Neuron SDK consists of the following components:

- :ref:`Neuron Compiler `: An offline neural network model compiler (``ncc-tflite``) that produces statically compiled deep learning archive (DLA) files.
- :ref:`Neuron Runtime `: A command-line tool (``neuronrt``) that executes a specified DLA file and reports the results.
- :doc:`Neuron Runtime API `: A user-invoked API that supports loading and running compiled DLA files within a user’s C++ application, as shown in the sketch below.
- :doc:`Neuron Profiler `: A built-in performance profiler tool in Neuron Runtime.

.. DISABLE DLA-MUXER PART:
   - :ref:`DLA Packer `: A tool for packing multiple DLA files into a single deep learning bundle (DLB) file, with support for cross-DLA cooperation.
   - :doc:`Neuron DLA Muxer API `: An API for interacting with deep learning bundle (DLB) files.
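As a concrete illustration of the load-and-run flow handled by the Neuron Runtime API, the listing below is a minimal sketch of a single-input, single-output inference pass. It assumes the Neuron Runtime API V1 entry points (``NeuronRuntime_create``, ``NeuronRuntime_loadNetworkFromFile``, ``NeuronRuntime_setSingleInput``, ``NeuronRuntime_setSingleOutput``, ``NeuronRuntime_inference``, ``NeuronRuntime_release``) and the ``RuntimeAPI.h`` header documented in the Neuron Runtime API pages; the DLA path is hypothetical, and the exact signatures, ``EnvOptions`` fields, and error codes should be verified against the API reference for your SDK version.

.. code-block:: cpp

   // Minimal single-input/single-output inference sketch (Neuron Runtime API V1).
   // Entry-point names, the header name, and the error-code constant are taken
   // from the Neuron Runtime API documentation and are assumptions here; verify
   // them against the API reference for your SDK version.
   #include <cstddef>
   #include <cstdint>
   #include <cstdio>
   #include <vector>

   #include "RuntimeAPI.h"  // Neuron Runtime API header (name assumed)

   int main() {
       // 1. Create a runtime instance. Zero-initialized EnvOptions keeps the
       //    defaults; device-selection fields are described in the API reference.
       EnvOptions options = {};
       void* runtime = nullptr;
       if (NeuronRuntime_create(&options, &runtime) != NEURONRUNTIME_NO_ERROR) {
           std::fprintf(stderr, "failed to create Neuron runtime\n");
           return 1;
       }

       // 2. Load a DLA file previously produced offline by ncc-tflite from a
       //    TFLite model. The path below is hypothetical.
       if (NeuronRuntime_loadNetworkFromFile(runtime, "/usr/share/models/model.dla")
               != NEURONRUNTIME_NO_ERROR) {
           std::fprintf(stderr, "failed to load DLA file\n");
           NeuronRuntime_release(runtime);
           return 1;
       }

       // 3. Query I/O sizes and bind plain CPU buffers. A BufferAttribute with
       //    ionFd = -1 marks a non-ION buffer (per the Runtime API docs).
       size_t inSize = 0, outSize = 0;
       NeuronRuntime_getSingleInputSize(runtime, &inSize);
       NeuronRuntime_getSingleOutputSize(runtime, &outSize);
       std::vector<uint8_t> input(inSize);   // fill with real input data here
       std::vector<uint8_t> output(outSize);
       BufferAttribute attr{-1};
       NeuronRuntime_setSingleInput(runtime, input.data(), inSize, attr);
       NeuronRuntime_setSingleOutput(runtime, output.data(), outSize, attr);

       // 4. Run inference; the result is written into `output`.
       if (NeuronRuntime_inference(runtime) != NEURONRUNTIME_NO_ERROR) {
           std::fprintf(stderr, "inference failed\n");
       }

       // 5. Release the runtime and its resources.
       NeuronRuntime_release(runtime);
       return 0;
   }

Since buffer binding is separate from inference, an application can in principle rebind fresh input data and invoke ``NeuronRuntime_inference`` repeatedly without reloading the DLA file.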
.. _ml_neuron-supported-operations:

Supported Operations
====================

This section describes all the neural network operations supported by Neuron SDK, and any restrictions placed on their use.

.. note::

   Different compute devices may have restrictions on supported operations. These restrictions are a function of:

   #. Op type
   #. Op parameters (e.g. kernel dimensions and modifiers, such as stride)
   #. Tensor dimensions (both input and output)
   #. SoC platform
   #. Numeric format, including both the data type and the quantization method

Each device has its own guidelines and restrictions. Find all the neural network operations supported by Neuron SDK, and any restrictions placed on their use, on the page corresponding to your reference board:

- |G510-G700-EVK-REF-BOARD| :doc:`Supported Operations `
- |G1200-EVK-REF-BOARD| :doc:`Supported Operations `