YOLOv5s Models
==============

Overview
--------

YOLOv5s is a variant of the YOLO (You Only Look Once) family of object detection models, designed to be a smaller and faster version suitable for real-time object detection tasks. YOLOv5 was developed by Ultralytics and offers improved speed and accuracy compared to previous YOLO versions.

Model Conversion Flow
---------------------

Precondition
^^^^^^^^^^^^

.. note::
   **Python 3.7** is recommended for these models because some of the required libraries and frameworks are more compatible with it.

Before you begin, ensure that the :doc:`NeuroPilot Converter Tool </sw/yocto/ml-guide/neuron-dev-flow/model_converter/neuropilot_converter_tool>` is installed. If you haven't installed it yet, please follow the instructions in the "Install and Verify NeuroPilot Converter Tool" section of the same guide.

1. Clone the repository:

   .. code-block:: bash

       git clone https://github.com/ultralytics/yolov5
       cd yolov5
       git reset --hard 485da42

2. Install Python packages and dependencies:

   .. code-block:: bash

       pip3 install -r requirements.txt
       pip3 install torch==1.9.0 torchvision==0.10.0

   .. note::

       ``mtk_converter.PyTorchConverter`` only supports PyTorch versions between 1.3.0 and 2.0.0.
       A newer installation (for example, 2.3.1+cu121) falls outside this range and causes a runtime error,
       so install compatible versions of PyTorch and ``torchvision`` as shown above.

3. Download the patch linked in the note below into the ``yolov5`` directory, then apply it:

   .. code-block:: bash

       git apply Fix_yolov5_mtk_tflite_issue.patch

   .. note::

       The `Fix_yolov5_mtk_tflite_issue.patch <https://mediatek-aiot.s3.ap-southeast-1.amazonaws.com/aiot/download/model-zoo/patches/Fix_yolov5_mtk_tflite_issue.patch>`_ adds support for MTK TensorFlow Lite (MTK TFLite) to the YOLOv5 model export script. It:

       - Adds ``mtk_tflite`` as a supported export format.
       - Modifies the ``Detect`` module's forward method to include only convolution operations.
       - Implements post-processing operations for MTK TFLite.
       - Extends the ``DetectMultiBackend`` class to handle MTK TFLite models.
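
The PyTorch version constraint from step 2 can be checked before running the converter. The following is a minimal sketch, not part of ``mtk_converter``: the helper names are hypothetical, and treating both ends of the 1.3.0 to 2.0.0 range as inclusive is an assumption.

.. code-block:: python

   import re

   # Bounds taken from the note in step 2; inclusive ends are an assumption.
   MIN_SUPPORTED = (1, 3, 0)
   MAX_SUPPORTED = (2, 0, 0)


   def parse_version(version):
       """Return (major, minor, patch) from a string such as '2.3.1+cu121'."""
       match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
       if match is None:
           raise ValueError(f"Unrecognized version string: {version!r}")
       return tuple(int(part or 0) for part in match.groups())


   def is_supported_torch_version(version):
       """Check a PyTorch version string against the assumed supported range."""
       return MIN_SUPPORTED <= parse_version(version) <= MAX_SUPPORTED


   # Compare the version installed in step 2 with the one from the error case.
   for candidate in ("1.9.0", "2.3.1+cu121"):
       print(candidate, is_supported_torch_version(candidate))

In practice, pass ``torch.__version__`` to ``is_supported_torch_version`` after installing the packages.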

Get Source Model
^^^^^^^^^^^^^^^^

**Export the PyTorch model to TorchScript format:**

   .. code-block:: bash

           python export.py --weights yolov5s.pt --img-size 640 640 --include torchscript

The export writes ``yolov5s.torchscript``, which the conversion scripts below load.


Converting Model for Deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Quant8 Conversion Process
*************************

1. **Prepare Calibration Data:**

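To prepare the calibration data, create a Python script named ``prepare_calibration_data.py`` in the root directory of the YOLOv5 project. The script saves the first 100 preprocessed training images of the COCO128 dataset as ``.npy`` files, which are used to calibrate the model during quantization. Run it with the command below; the script's contents follow.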

   .. code-block:: bash

           python prepare_calibration_data.py

   .. code-block:: python

           import os
           import numpy as np
           from utils.dataloaders import LoadImagesAndLabels
           from utils.general import check_dataset

           data = 'data/coco128.yaml'
           num_batches = 100
           calib_dir = 'calibration_dataset'
           # exist_ok avoids a FileExistsError when the script is run again
           os.makedirs(calib_dir, exist_ok=True)

           # Retrieve first 100 images from training set with batch_size = 1
           dataset = LoadImagesAndLabels(check_dataset(data)['train'], batch_size=1)

           for idx, (im, _target, _path, _shape) in enumerate(dataset):
               if idx >= num_batches:
                   break

               # Expand shape from (3, 640, 640) to (1, 3, 640, 640)
               im = np.expand_dims(im, axis=0).astype(np.float32)
               # 0 - 255 to 0.0 - 1.0
               im /= 255
               np.save(os.path.join(calib_dir, 'batch-{:05d}.npy'.format(idx)), im)
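
Before quantizing, it can help to confirm that the saved batches have the layout passed to the converter in the next step. The sketch below is an illustration only: ``read_npy_header`` and ``is_valid_calibration_batch`` are hypothetical helpers that read just the ``.npy`` file header with the standard library, so they check shape and dtype but not the 0.0 to 1.0 value range.

.. code-block:: python

   import ast
   import struct


   def read_npy_header(path):
       """Return (dtype descriptor, shape) from the header of a .npy file."""
       with open(path, "rb") as f:
           if f.read(6) != b"\x93NUMPY":
               raise ValueError(f"{path} is not a .npy file")
           major, _minor = f.read(2)
           # Format 1.0 stores the header length in 2 bytes; later formats use 4.
           if major == 1:
               (header_len,) = struct.unpack("<H", f.read(2))
           else:
               (header_len,) = struct.unpack("<I", f.read(4))
           # The header is a Python dict literal padded with spaces.
           header = ast.literal_eval(f.read(header_len).decode("latin1"))
       return header["descr"], tuple(header["shape"])


   def is_valid_calibration_batch(path, expected_shape=(1, 3, 640, 640)):
       """Check that a batch is float32 with the expected NCHW shape."""
       descr, shape = read_npy_header(path)
       # 'f4' is float32; the leading character encodes byte order.
       return descr.lstrip("<>=|") == "f4" and shape == expected_shape

For example, ``is_valid_calibration_batch('calibration_dataset/batch-00000.npy')`` should return ``True`` for a file written by the script above.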

2. **Convert to int8 TFLite:**

To convert the TorchScript model to int8 TFLite format, create a Python script named ``convert_to_quant_tflite.py`` in the root directory of the YOLOv5 project. The script feeds the calibration data generated in the previous step to the converter and writes the quantized model to ``yolov5s_int8_mtk.tflite``. Run it with the command below; the script's contents follow.

   .. code-block:: bash

           python convert_to_quant_tflite.py

   .. code-block:: python

           import os
           import numpy as np
           import mtk_converter

           calib_dir = 'calibration_dataset'

           converter = mtk_converter.PyTorchConverter.from_script_module_file(
               'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
           )

           def data_gen():
               """Return an iterator for the calibration dataset."""
               for fn in sorted(os.listdir(calib_dir)):
                   yield [np.load(os.path.join(calib_dir, fn))]

           converter.quantize = True
           converter.calibration_data_gen = data_gen
           converter.convert_to_tflite('yolov5s_int8_mtk.tflite')

3. **Convert the TFLite model to DLA format:**

  1. Download the NeuroPilot SDK All-In-One Bundle:

     Visit the download page: `NeuroPilot Downloads <https://neuropilot.mediatek.com/sphinx/neuropilot-6-basic-customer/html/l1_downloads/downloads_external.html#neuropilot-downloads>`_

  2. Extract the Bundle:

     .. code-block:: bash

         tar zxvf neuropilot-sdk-basic-<version>.tar.gz

  3. Set environment variables:

     .. code-block:: bash

         export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

  4. Convert the TFLite model to DLA format:

     .. code-block:: bash

         /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 yolov5s_int8_mtk.tflite

.. note::
   To ensure compatibility with your device, please download and use **NeuroPilot SDK version 6**. Other versions might not be fully supported.


FP32 Conversion Process
***********************

1. **Convert to FP32 TFLite:**

   To convert the model to FP32 TFLite format, create a Python script named ``convert_to_tflite.py`` in the root directory of the YOLOv5 project. The script generates a non-quantized, full-precision TFLite model, ``yolov5s_mtk.tflite``. Run it with the command below; the script's contents follow.

   .. code-block:: bash

       python convert_to_tflite.py

   .. code-block:: python

       import mtk_converter

       converter = mtk_converter.PyTorchConverter.from_script_module_file(
           'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
       )
       converter.convert_to_tflite('yolov5s_mtk.tflite')


2. **Convert the TFLite model to DLA format:**

  1. Set environment variables:

     .. code-block:: bash

         export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

  2. Convert to DLA format:

     .. code-block:: bash

         /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 yolov5s_mtk.tflite


Model Information
-----------------

.. note::
   The models and benchmark data mentioned below have been processed using the **mtk_converter**.

General Information
^^^^^^^^^^^^^^^^^^^

The information in the table below is sourced from the **Pretrained Checkpoints** section of the `YOLOv5 repository <https://github.com/ultralytics/yolov5>`_.

+-----------------------+--------------------------------------------------------------------------------------------------------------+
| Property              | Value                                                                                                        |
+=======================+==============================================================================================================+
| Category              | Detection                                                                                                    |
+-----------------------+--------------------------------------------------------------------------------------------------------------+
| Input Size            | 640x640                                                                                                      |
+-----------------------+--------------------------------------------------------------------------------------------------------------+
| FLOPs (B)             | 16.5                                                                                                         |
+-----------------------+--------------------------------------------------------------------------------------------------------------+
| #Params (M)           | 7.2                                                                                                          |
+-----------------------+--------------------------------------------------------------------------------------------------------------+
| Training Framework    | PyTorch                                                                                                      |
+-----------------------+--------------------------------------------------------------------------------------------------------------+
| Inference Framework   | TFLite                                                                                                       |
+-----------------------+--------------------------------------------------------------------------------------------------------------+

Pre-converted Model
^^^^^^^^^^^^^^^^^^^

Deployable Model
****************

+-----------------------+---------------------------------------------------------------------------------------------------------------------+------------------------------------------------+
| Model Type            | Download Link                                                                                                       | Supported Backend                              |
+=======================+=====================================================================================================================+================================================+
| Quant8  Model package | `Download Quant8 <https://mediatek-aiot.s3.ap-southeast-1.amazonaws.com/aiot/download/model-zoo/yolov5s-int8.zip>`_ | CPU,GPU,ArmNN,Neuron Stable Delegate,NeuronSDK |
+-----------------------+---------------------------------------------------------------------------------------------------------------------+------------------------------------------------+
| Float32 Model package | `Download Fp32 <https://mediatek-aiot.s3.ap-southeast-1.amazonaws.com/aiot/download/model-zoo/yolov5s-fp32.zip>`_   | CPU,GPU,ArmNN,Neuron Stable Delegate,NeuronSDK |
+-----------------------+---------------------------------------------------------------------------------------------------------------------+------------------------------------------------+


Model Properties
****************

- **YOLOv5s-quant8**

**Inputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+-----------------------+------------------------------------------+
| Name                  | x.1                                      |
+-----------------------+------------------------------------------+
| Tensor                | int8[1,3,640,640]                        |
+-----------------------+------------------------------------------+
| Identifier            | 67                                       |
+-----------------------+------------------------------------------+
| Quantization          | Linear                                   |
+-----------------------+------------------------------------------+
| Quantization Range    | 0.0000 ≤ 0.0039 * (q + 128) ≤ 0.9993     |
+-----------------------+------------------------------------------+

**Outputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+-----------------------+------------------------------------------+
| Name                  | 77                                       |
+-----------------------+------------------------------------------+
| Tensor                | int8[1,255,80,80]                        |
+-----------------------+------------------------------------------+
| Identifier            | 315                                      |
+-----------------------+------------------------------------------+
| Quantization          | Linear                                   |
+-----------------------+------------------------------------------+
| Quantization Range    | -19.3298 ≤ 0.0966 * (q - 72) ≤ 5.3157    |
+-----------------------+------------------------------------------+
| Name                  | 78                                       |
+-----------------------+------------------------------------------+
| Tensor                | int8[1,255,40,40]                        |
+-----------------------+------------------------------------------+
| Identifier            | 279                                      |
+-----------------------+------------------------------------------+
| Quantization          | Linear                                   |
+-----------------------+------------------------------------------+
| Quantization Range    | -15.8150 ≤ 0.0841 * (q - 60) ≤ 5.6362    |
+-----------------------+------------------------------------------+
| Name                  | 79                                       |
+-----------------------+------------------------------------------+
| Tensor                | int8[1,255,20,20]                        |
+-----------------------+------------------------------------------+
| Identifier            | 15                                       |
+-----------------------+------------------------------------------+
| Quantization          | Linear                                   |
+-----------------------+------------------------------------------+
| Quantization Range    | -15.7213 ≤ 0.0845 * (q - 58) ≤ 5.8321    |
+-----------------------+------------------------------------------+
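
Each **Quantization Range** row above follows the linear mapping ``real = scale * (q - zero_point)``, evaluated at the int8 limits -128 and 127. The following is a rough worked example for output ``77``. The function names are illustrative, and the scale is the rounded value shown in the table, so the results differ slightly from the listed bounds.

.. code-block:: python

   INT8_MIN, INT8_MAX = -128, 127


   def dequantize(q, scale, zero_point):
       """Map a quantized int8 value back to a real value."""
       return scale * (q - zero_point)


   def representable_range(scale, zero_point):
       """Return the smallest and largest real values an int8 tensor can hold."""
       return (
           dequantize(INT8_MIN, scale, zero_point),
           dequantize(INT8_MAX, scale, zero_point),
       )


   # Output 77 in the table: scale 0.0966 (rounded), zero point 72.
   low, high = representable_range(scale=0.0966, zero_point=72)
   print(f"{low:.2f} <= real <= {high:.2f}")

With the rounded scale this gives about -19.32 and 5.31, close to the -19.3298 and 5.3157 listed for output ``77``. For the input tensor, the ``(q + 128)`` term corresponds to a zero point of -128.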

- **YOLOv5s-fp32**

**Inputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+-----------------------+------------------------------------------+
| Name                  | x.1                                      |
+-----------------------+------------------------------------------+
| Tensor                | float32[1,3,640,640]                     |
+-----------------------+------------------------------------------+
| Identifier            | 315                                      |
+-----------------------+------------------------------------------+


**Outputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+-----------------------+------------------------------------------+
| Name                  | 77                                       |
+-----------------------+------------------------------------------+
| Tensor                | float32[1,255,80,80]                     |
+-----------------------+------------------------------------------+
| Identifier            | 304                                      |
+-----------------------+------------------------------------------+
| Name                  | 78                                       |
+-----------------------+------------------------------------------+
| Tensor                | float32[1,255,40,40]                     |
+-----------------------+------------------------------------------+
| Identifier            | 272                                      |
+-----------------------+------------------------------------------+
| Name                  | 79                                       |
+-----------------------+------------------------------------------+
| Tensor                | float32[1,255,20,20]                     |
+-----------------------+------------------------------------------+
| Identifier            | 230                                      |
+-----------------------+------------------------------------------+
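
The three output shapes follow from the standard YOLOv5 detection head. As a sketch, assuming the default COCO configuration (80 classes, 3 anchors per scale, and strides of 8, 16, and 32, none of which are stated in the tables above), each output has ``3 * (80 + 5) = 255`` channels and a grid of ``640 / stride`` cells per side. The function name is illustrative.

.. code-block:: python

   def detection_head_shapes(img_size=640, num_classes=80,
                             anchors_per_scale=3, strides=(8, 16, 32)):
       """Return the NCHW shape of each raw detection output."""
       # Per anchor: 4 box coordinates + 1 objectness score + class scores.
       channels = anchors_per_scale * (num_classes + 5)
       return [(1, channels, img_size // s, img_size // s) for s in strides]


   for shape in detection_head_shapes():
       print(shape)

Under these assumptions the function returns ``(1, 255, 80, 80)``, ``(1, 255, 40, 40)``, and ``(1, 255, 20, 20)``, matching outputs ``77``, ``78``, and ``79``.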


Benchmark Results
^^^^^^^^^^^^^^^^^

.. note::
   The benchmark results below were measured with performance mode enabled. They are for reference only; actual performance varies with the hardware and platform.

   Limitations:

   1. The G350 hardware does not yet support the Neuron Stable Delegate (APU) or APU (MDLA) backends.
   2. On the G350, inference with ArmNN may crash because the model is too large for the platform.

- **YOLOv5s-quant8**

.. csv-table::
   :file: /_asset/tables/Model_Benchmark_tables/int8_YOLOv5s.csv
   :width: 100%
   :widths: 10 15 15 15 15 15 15

- **YOLOv5s-fp32**

.. csv-table::
   :file: /_asset/tables/Model_Benchmark_tables/fp32_YOLOv5s.csv
   :width: 100%
   :widths: 10 15 15 15 15 15 15

Run Benchmark Tools
^^^^^^^^^^^^^^^^^^^

This section shows how to run the benchmark tool with different delegates and hardware configurations.

1. Push your TFLite model to the target device:

   .. code-block:: bash

      adb push <your_tflite_model> /usr/share/label_image/

   Replace ``<your_tflite_model>`` with the path to your TFLite model.

2. Open an ADB shell on the target device:

   .. code-block:: bash

      adb shell

   You can then run the following commands directly from the shell.

Execute on CPU (8 threads)
**************************

To execute the benchmark on the CPU using 8 threads, run the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=8 --num_runs=10

Execute on GPU, with GPU delegate
*********************************

To execute the benchmark on the GPU using the TensorFlow Lite GPU delegate, run the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --use_gpu=1 --gpu_precision_loss_allowed=1 --num_runs=10

Execute on GPU, with Arm NN delegate
************************************

To execute the benchmark on the GPU using the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10

Execute on CPU, with Arm NN delegate
************************************

To run the benchmark on the CPU using the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10

Execute on APU, with Neuron Delegate
************************************

For executing on the APU using the Neuron delegate, run the following command:

.. code-block:: bash

   benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<your_tflite_model>

.. note::
   If you are using the G350 platform, make the following adjustments:

   - For CPU-based benchmarks, set ``--num_threads`` to 4:

     .. code-block:: bash

        benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=4 --use_xnnpack=0 --num_runs=10

   - For all benchmarks (CPU, GPU, Arm NN), add ``--use_xnnpack=0`` to disable the XNNPACK delegate.
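
When benchmarking several backends, the command lines above can be assembled by a small host-side script. The sketch below is a convenience illustration only: the backend keys and the ``build_benchmark_command`` helper are hypothetical names, while the flags are copied from the commands in this section. The Neuron delegate command is omitted because it uses a different set of flags.

.. code-block:: python

   import shlex

   MODEL_DIR = "/usr/share/label_image"
   ARMNN_DELEGATE = "/usr/lib/libarmnnDelegate.so.29"

   # Flags copied from the commands in this section; the keys are illustrative.
   BACKEND_FLAGS = {
       "cpu": ["--num_threads=8"],
       "gpu": ["--use_gpu=1", "--gpu_precision_loss_allowed=1"],
       "armnn_gpu": [
           f"--external_delegate_path={ARMNN_DELEGATE}",
           "--external_delegate_options=backends:GpuAcc",
       ],
       "armnn_cpu": [
           f"--external_delegate_path={ARMNN_DELEGATE}",
           "--external_delegate_options=backends:CpuAcc",
       ],
   }


   def build_benchmark_command(model_name, backend, g350=False, num_runs=10):
       """Return a benchmark_model command line as a single shell string."""
       flags = list(BACKEND_FLAGS[backend])
       if g350:
           # G350 adjustments from the note above: 4 threads, XNNPACK disabled.
           flags = ["--num_threads=4" if f.startswith("--num_threads=") else f
                    for f in flags]
           flags.append("--use_xnnpack=0")
       argv = ["benchmark_model", f"--graph={MODEL_DIR}/{model_name}",
               *flags, f"--num_runs={num_runs}"]
       return " ".join(shlex.quote(arg) for arg in argv)


   print(build_benchmark_command("yolov5s_int8_mtk.tflite", "cpu", g350=True))

The printed string can be pasted into the ADB shell or passed to ``adb shell`` from the host.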

Neuron SDK
^^^^^^^^^^

Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:

1. **Transfer the Model to the Device:**

   Use `adb` to push your TFLite model to the device:

   .. code-block:: bash

       adb push <your_tflite_model> /usr/share/benchmark_dla/

2. **Access the Device Shell:**

   Connect to your device's shell:

   .. code-block:: bash

       adb shell

3. **Navigate to the Benchmark Directory:**

   Change to the directory where the model is stored:

   .. code-block:: bash

       cd /usr/share/benchmark_dla/

4. **Run the Benchmark:**

   Execute the benchmarking script with the following command:

   .. code-block:: bash

       python3 benchmark.py --file <your_tflite_model> --target mdla3.0 --profile --options='--relax-fp32'

**Description:**

- The ``benchmark.py`` script evaluates the performance of your model on MDLA 3.0.
- ``--file`` specifies the path to your TFLite model.
- ``--target mdla3.0`` sets the target hardware to MDLA 3.0.
- ``--profile`` enables profiling, which reports detailed performance metrics.
- ``--options='--relax-fp32'`` relaxes FP32 precision to improve compatibility with MDLA.