.. include:: /keyword.rst

.. _Neuron SDK:

=======================================
Neuron Compiler and Runtime (NeuronSDK)
=======================================

This section introduces the tools inside **Neuron SDK** used to compile and
verify AI models for Genio platforms. After converting a model to TFLite
format (as described in :doc:`Model Converter`), the developer must compile
it into a hardware-specific binary for optimized execution.

To start the compilation and deployment experiment immediately, see
:ref:`Compile TFLite Models to DLA <compile-tflite-to-dla>`.

Workflow Overview
=================

The Neuron SDK provides a specialized toolchain for transforming and
executing neural network models on the MediaTek Deep Learning Accelerator
(MDLA) and Vision Processing Unit (VPU).

.. figure:: /_asset/ai-workflow-overview-step3.png
   :align: center
   :width: 80%

The SDK includes two primary command-line tools for the deployment workflow:

* **Neuron Compiler** (``ncc-tflite``): An offline tool that runs on a host
  PC to convert TFLite models into **Deep Learning Archive (DLA)** files.
* **Neuron Runtime** (``neuronrt``): An on-device utility that loads DLA
  files and executes inference to verify model compatibility with the
  hardware's offline inference path.

Together, the two tools form a compile-then-verify loop, sketched below and
detailed in the rest of this section.
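A minimal sketch of that loop, assuming the MDLA 3.0 target used in the
YOLOv5s experiment later in this section; the file names are illustrative
placeholders:

.. code-block:: bash

   # Host PC: compile the converted TFLite model into a DLA binary.
   ./ncc-tflite --arch=mdla3.0 model.tflite -o model.dla

   # Host PC: push the compiled binary to the Genio device.
   adb push model.dla /tmp/

   # Genio device: run one inference on the NPU to verify the offline path.
   adb shell "neuronrt -m hw -a /tmp/model.dla -i /tmp/input.bin"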
.. note::

   The Neuron SDK workflow and capabilities on |IOT-YOCTO| align with the
   official NeuroPilot documentation. The instructions provided here apply
   specifically to Genio IoT platforms.

Neuron SDK Components
=====================

Neuron Compiler (ncc-tflite)
----------------------------

The Neuron Compiler serves as the bridge between generic framework models and
MediaTek hardware. It performs the following tasks:

* Validates TFLite model structures against hardware constraints.
* Optimizes network graphs for specific MDLA and VPU architectures.
* Generates a statically compiled ``.dla`` binary.

For a comprehensive list of optimization flags (such as ``--opt-bw`` or
``--relax-fp32``), refer to the `Neuron Compiler Section`_.

Neuron Runtime (neuronrt)
-------------------------

The Neuron Runtime is a lightweight execution engine designed for rapid
validation. It enables the developer to:

* Load compiled ``.dla`` files directly on the Genio device.
* Confirm that the model executes successfully on the NPU backend.
* Measure basic inference latency and resource utilization.

.. important::

   The ``neuronrt`` tool is intended for **verification and benchmarking** of
   the offline inference path. For production applications, MediaTek
   recommends using the **Neuron Runtime API** to integrate model execution
   directly into C/C++ or Android applications.

Neuron Runtime API
------------------

A C/C++ API that:

* Allows applications to load DLA files directly.
* Provides control over input/output tensors, inference scheduling, and
  metadata (for example, via ``NeuronRuntime_getMetadataInfo`` and
  ``NeuronRuntime_getMetadata``).
* Enables integration of Neuron-accelerated inference into existing
  applications without invoking the ``neuronrt`` binary.

.. note::

   For host-side model development and conversion, it is recommended to use
   **NeuronSDK** on a PC. On-device compilation with ``ncc-tflite`` is
   supported only for specific use cases and may fail for large or complex
   models.

.. _compile-tflite-to-dla:

Experiment: Compile TFLite Models to DLA
========================================

This section demonstrates how to use the Neuron Compiler (``ncc-tflite``) to
transform the YOLOv5s model into a deployable hardware binary.

Environment Setup
-----------------

1. **Download the NeuroPilot SDK:** Obtain the **SDK All-In-One Bundle
   (Version 6)** from the `Official Portal`_.

2. **Extract the toolchain:**

   .. code-block:: bash

      tar zxvf neuropilot-sdk-basic-<version>.tar.gz

3. **Configure host libraries:** The compiler requires specific shared
   libraries located within the bundle.

   .. code-block:: bash

      export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

Model Compilation Examples
--------------------------

The developer performs compilation on the **host PC** using the converted
YOLOv5s models.

* **For INT8 Quantized Models:**

  .. code-block:: bash

     ./ncc-tflite --arch=mdla3.0 yolov5s_int8_mtk.tflite -o yolov5s_int8.dla

* **For FP32 Models:** The ``--relax-fp32`` flag allows the compiler to use
  FP16 precision for NPU acceleration.

  .. code-block:: bash

     ./ncc-tflite --arch=mdla3.0 --relax-fp32 yolov5s_mtk.tflite -o yolov5s_fp32.dla
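The optimization flags referenced earlier can be combined with these
commands in the same way. As one hedged illustration (``--opt-bw`` is the
bandwidth-optimization flag named above; its exact behavior and trade-offs
are defined in the Neuron Compiler documentation, and the output file name
here is a placeholder):

.. code-block:: bash

   # Illustrative only: recompile the INT8 model with bandwidth optimization.
   # Verify flag semantics against your SDK's Neuron Compiler documentation.
   ./ncc-tflite --arch=mdla3.0 --opt-bw yolov5s_int8_mtk.tflite -o yolov5s_int8_bw.dla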
Verify Execution with Neuron Runtime
====================================

After generating the ``.dla`` file, the developer must verify its execution
on the target Genio platform. This step ensures the model is compatible with
the **offline inference path** and can run successfully on the NPU.

Verify Model Compatibility
--------------------------

1. **Transfer the model to the device:**

   .. code-block:: bash

      adb push yolov5s_int8.dla /tmp/

2. **Generate dummy input data:** Use the ``dd`` command to create a binary
   file matching the model's input size. For the 640x640x3 INT8 input of
   YOLOv5s, that is 640 × 640 × 3 = 1,228,800 bytes (one byte per element).

   .. code-block:: bash

      # Example for a specific input byte size
      adb shell "dd if=/dev/zero of=/tmp/input.bin bs=1 count=1228800"

3. **Execute inference:** Run the ``neuronrt`` command to validate the DLA
   file on the hardware (``-m hw``).

   .. code-block:: bash

      adb shell "neuronrt -m hw -a /tmp/yolov5s_int8.dla -i /tmp/input.bin"

   To profile performance, add the ``-c`` flag to repeat the inference:

   .. code-block:: bash

      adb shell "neuronrt -m hw -a /tmp/yolov5s_int8.dla -i /tmp/input.bin -c 10"

   **Example Output:**

   .. code-block:: text

      Inference repeats for 10 times.
      Total inference time = 52.648 ms (5.2648 ms/inf)
      Avg. FPS : 186.1

Successful execution confirms that the model layers are correctly mapped to
the NPU and the offline path is functional.

Supported Operations
====================

This section summarizes how to determine which neural network operations and
configurations are supported by Neuron SDK on a given platform.

.. note::

   Operation support is constrained by multiple factors, including:

   #. Operation type.
   #. Operation parameters (for example, kernel size, stride, padding
      configuration).
   #. Tensor dimensions for both inputs and outputs.
   #. SoC platform and accelerator generation (for example, MDLA version).
   #. Numeric format, including data type and quantization scheme.

Each compute device (MDLA, VPU, or CPU fallback) has its own guidelines and
restrictions. To check the supported operations and platform-specific
constraints, refer to the **Hardware Specifications** and **Supported OPs**
documentation for the target reference board in the `NeuroPilot Online
Documentation`_.
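Beyond the documentation, a quick practical check is to attempt a compile for
the target architecture: ``ncc-tflite`` fails for models containing
operations the selected device cannot execute, and the error output typically
names the offending operation. A minimal smoke test, assuming the host setup
from the experiment above (the model name is a placeholder):

.. code-block:: bash

   # If an operation is unsupported on the target, compilation fails with a
   # diagnostic; a successful compile writes the DLA binary to /tmp.
   ./ncc-tflite --arch=mdla3.0 model_under_test.tflite -o /tmp/model_under_test.dla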