YOLOX_s Models
==============

Overview
--------

YOLOX is an anchor-free evolution of the YOLO family, designed to offer a streamlined architecture while delivering enhanced performance. It aims to bridge the gap between research advances and practical industrial applications.

Model Conversion Flow
---------------------

Precondition
^^^^^^^^^^^^

.. note::

   For better compatibility, it is recommended to use **Python 3.8** when working with these models, as it has higher compatibility with certain libraries and frameworks. Additionally, make sure to use the **pip** version associated with Python 3.8.

1. **Clone the YOLOX Repository**

   Start by cloning the YOLOX repository from GitHub:

   .. code-block:: bash

      git clone https://github.com/Megvii-BaseDetection/YOLOX.git
      cd YOLOX

2. **Install Dependencies**

   Install the required dependencies and set up the development environment:

   .. code-block:: bash

      pip install -r requirements.txt
      python3.8 setup.py develop

Get Source Model
^^^^^^^^^^^^^^^^

Follow these steps to download the pretrained YOLOX-S PyTorch model and export it to TorchScript.

1. **Download the YOLOX-S PyTorch Model:**

   Download the pretrained YOLOX-S weights with the following command:

   .. code-block:: bash

      wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth

2. **Export the YOLOX-S Model to TorchScript:**

   Run the export script to convert the YOLOX-S model to TorchScript format:

   .. code-block:: bash

      python3.8 tools/export_torchscript.py -n yolox_s -c yolox_s.pth
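Optionally, you can sanity-check the exported TorchScript module before converting it. The following is a minimal sketch (not part of the official flow) that loads the module and runs one dummy forward pass; with the default export settings the output shape should match the converted model's output, `[1, 8400, 85]`:

.. code-block:: python

   import torch

   # Load the exported module and run a dummy forward pass to confirm
   # that the export succeeded.
   model = torch.jit.load('yolox.torchscript.pt')
   model.eval()
   with torch.no_grad():
       out = model(torch.randn(1, 3, 640, 640))
   print(out.shape)  # expected: torch.Size([1, 8400, 85])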
Converting Model for Deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before you begin, ensure that the :doc:`NeuroPilot Converter Tool ` is installed. If you haven't installed it yet, please follow the instructions in the "Install and Verify NeuroPilot Converter Tool" section of the same guide.

Quant8 Conversion Process
*************************

1. **Convert to Quantized TFLite Format:**

   The following script converts the YOLOX model to a quantized TFLite format using the NeuroPilot Converter Tool:

   - **Data Generation**: A generator function creates random input data, which is used for calibration during the quantization process.
   - **Model Loading**: The YOLOX model is loaded from a TorchScript file.
   - **Quantization**: The model is set up for quantization, specifying input value ranges and using the generated calibration data.
   - **Conversion**: The quantized model is converted to TFLite format and saved as `yolox_s_quant.tflite`.

   .. code-block:: bash

      python3.8 convert_tflite_quantize.py

   .. code-block:: python

      import mtk_converter
      import numpy as np

      def data_gen():
          for i in range(100):
              yield [np.random.randn(1, 3, 640, 640).astype(np.float32)]

      converter = mtk_converter.PyTorchConverter.from_script_module_file(
          'yolox.torchscript.pt', [[1, 3, 640, 640]],
      )
      converter.quantize = True
      converter.input_value_ranges = [(-1.0, 1.0)]
      converter.calibration_data_gen = data_gen
      _ = converter.convert_to_tflite(output_file='yolox_s_quant.tflite')

2. **Convert to Quantized DLA Format:**

   .. warning::

      Converting this model to DLA format may encounter unsupported operations. Before converting, prune the model using the provided script to ensure compatibility with the DLA converter.

   Here is an example of how to prune the `yolox_s_quant.tflite` model using the `export_quant_tflite_support_op_6-303_v6.py` script:

   .. code-block:: python

      import mtk_converter

      editor = mtk_converter.TFLiteEditor("yolox_s_quant.tflite")
      output_file = "yolox_s_quant_tflite_6-303_sdkv6.tflite"
      input_names = ["input.32"]  # Specify the input tensor names
      output_names = ["1360"]     # Specify the output tensor names
      _ = editor.export(
          output_file=output_file,
          input_names=input_names,
          output_names=output_names,
      )

   .. note::

      The `input_names` and `output_names` in this script are based on the example model structure. Modify them to match the input and output tensor names of your own model. You can inspect the model structure with tools such as `Netron` or TensorFlow utilities to identify the correct tensor names; a script-based alternative is sketched at the end of this subsection.

   .. tip::

      For more detailed information and steps on handling unsupported operations in DLA conversion, please see :ref:`Unsupported Operations in DLA Conversion (Optional)`.

   - **NeuroPilot SDK Tools: Download and Convert to DLA**

     1. **Download the NeuroPilot SDK All-In-One Bundle:**

        Visit the following download page and download the necessary bundle: `NeuroPilot Downloads `_

     2. **Extract the Bundle:**

        After downloading, extract the bundle using the following command:

        .. code-block:: bash

           tar zxvf neuropilot-sdk-basic-<version>.tar.gz

     3. **Set the Environment Variables:**

        Set the environment variables to point to the SDK:

        .. code-block:: bash

           export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

     4. **Convert the INT8 TFLite Model to DLA Format:**

        Use the NeuroPilot Converter Tool to convert your TFLite model into the DLA format. The following example targets the `mdla3.0` architecture:

        .. code-block:: bash

           /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 yolox_s_quant_tflite_6-303_sdkv6.tflite

   .. note::

      To ensure compatibility with your device, please download and use **NeuroPilot SDK version 6**. Other versions might not be fully supported.
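As noted above, the tensor names passed to the pruning script must match your converted model. If you prefer a script to `Netron`, the following sketch (assuming the `tensorflow` pip package is installed) lists the tensors in the quantized TFLite file so you can pick the correct `input_names` and `output_names`:

.. code-block:: python

   import tensorflow as tf

   # Enumerate all tensors in the model; the printed names can be passed
   # to the pruning script as input_names / output_names.
   interpreter = tf.lite.Interpreter(model_path='yolox_s_quant.tflite')
   interpreter.allocate_tensors()
   for detail in interpreter.get_tensor_details():
       print(detail['index'], detail['name'], detail['shape'])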
FP32 Conversion Process
***********************

1. **Convert to FP32 TFLite Format:**

   The following script converts the YOLOX model to a non-quantized (FP32) TFLite format:

   - **Data Generation**: As in the quantization process, a generator function creates random input data for the conversion.
   - **Model Loading**: The YOLOX model is loaded from a TorchScript file.
   - **Conversion**: The model is converted to TFLite format without quantization, and the output is saved as `yolox_s.tflite`.

   .. code-block:: bash

      python3.8 convert_tflite.py

   .. code-block:: python

      import mtk_converter
      import numpy as np

      def data_gen():
          for i in range(100):
              yield [np.random.randn(1, 3, 640, 640).astype(np.float32)]

      converter = mtk_converter.PyTorchConverter.from_script_module_file(
          'yolox.torchscript.pt', [[1, 3, 640, 640]],
      )
      converter.input_value_ranges = [(-1.0, 1.0)]
      converter.calibration_data_gen = data_gen
      _ = converter.convert_to_tflite(output_file='yolox_s.tflite')

2. **Convert to FP32 DLA Format:**

   .. warning::

      Converting this model to DLA format may encounter unsupported operations. Before converting, prune the model using the provided script to ensure compatibility with the DLA converter.

   Here is an example of how to prune the `yolox_s.tflite` model using the `export_tflite_support_op_6-303_v6.py` script:

   .. code-block:: python

      import mtk_converter

      editor = mtk_converter.TFLiteEditor("yolox_s.tflite")
      output_file = "yolox_s_tflite_6-303_sdkv6.tflite"
      input_names = ["input.32"]
      output_names = ["1360"]
      _ = editor.export(
          output_file=output_file,
          input_names=input_names,
          output_names=output_names,
      )

   .. note::

      The `input_names` and `output_names` in this script are based on the example model structure. Modify them to match the input and output tensor names of your own model, as described in the Quant8 section above.

   - **Set the Environment and Convert to DLA**

     1. **Set the Environment Variables:**

        Set the environment variables to point to the SDK:

        .. code-block:: bash

           export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

     2. **Convert the FP32 TFLite Model to DLA Format:**

        Use the NeuroPilot Converter Tool to convert your FP32 TFLite model into the DLA format. The following example targets the `mdla3.0` architecture and enables relaxed FP32 operations:

        .. code-block:: bash

           /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 yolox_s_tflite_6-303_sdkv6.tflite
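After the TFLite conversion, it is worth confirming that the FP32 model loads and produces the expected output shape before deploying it. A minimal sketch, assuming the `tensorflow` pip package is installed:

.. code-block:: python

   import numpy as np
   import tensorflow as tf

   # Run one forward pass on random data; the output shape should be
   # (1, 8400, 85), matching the model properties listed below.
   interpreter = tf.lite.Interpreter(model_path='yolox_s.tflite')
   interpreter.allocate_tensors()

   inp = interpreter.get_input_details()[0]
   interpreter.set_tensor(inp['index'],
                          np.random.randn(1, 3, 640, 640).astype(np.float32))
   interpreter.invoke()

   out = interpreter.get_output_details()[0]
   print(interpreter.get_tensor(out['index']).shape)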
Model Information
-----------------

.. note::

   The models and benchmark data mentioned below have been processed using the **mtk_converter**.

General Information
^^^^^^^^^^^^^^^^^^^

The following table contains general information about the model. Details such as input size, FLOPs, and number of parameters are sourced from the documentation at: `YOLOX-s Model `_.

+----------------------+---------------------+
| Property             | Value               |
+======================+=====================+
| Category             | Object Detection    |
+----------------------+---------------------+
| Input Size           | 640x640             |
+----------------------+---------------------+
| FLOPs (G)            | 26.8                |
+----------------------+---------------------+
| #Params (M)          | 9.0                 |
+----------------------+---------------------+
| Training Framework   | PyTorch             |
+----------------------+---------------------+
| Inference Framework  | TFLite              |
+----------------------+---------------------+

Pre-converted Model
^^^^^^^^^^^^^^^^^^^

Deployable Model
****************

+-----------------------+---------------------+----------------------------------------------------+
| Model Type            | Download Link       | Supported Backend                                  |
+=======================+=====================+====================================================+
| Quant8 Model package  | `Download Quant8 `_ | CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK |
+-----------------------+---------------------+----------------------------------------------------+
| Float32 Model package | `Download Fp32 `_   | CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK |
+-----------------------+---------------------+----------------------------------------------------+

Model Properties
****************

- **YOLOX_s-quant8**

  **Inputs**

  +----------------------+------------------------------------------+
  | **Property**         | **Value**                                |
  +======================+==========================================+
  | Name                 | x.2                                      |
  +----------------------+------------------------------------------+
  | Tensor               | int8[1,3,640,640]                        |
  +----------------------+------------------------------------------+
  | Identifier           | 318                                      |
  +----------------------+------------------------------------------+
  | Quantization         | Linear                                   |
  +----------------------+------------------------------------------+
  | Quantization Range   | -1.0039 ≤ 0.0078 * q ≤ 0.9961            |
  +----------------------+------------------------------------------+

  **Outputs**

  +----------------------+------------------------------------------+
  | **Property**         | **Value**                                |
  +======================+==========================================+
  | Name                 | 1162                                     |
  +----------------------+------------------------------------------+
  | Tensor               | int8[1,8400,85]                          |
  +----------------------+------------------------------------------+
  | Identifier           | 98                                       |
  +----------------------+------------------------------------------+
  | Quantization         | Linear                                   |
  +----------------------+------------------------------------------+
  | Quantization Range   | -2.1799 ≤ 0.0227 * (q + 32) ≤ 3.6106     |
  +----------------------+------------------------------------------+

- **YOLOX_s-fp32**

  **Inputs**

  +----------------------+------------------------------------------+
  | **Property**         | **Value**                                |
  +======================+==========================================+
  | Name                 | x.2                                      |
  +----------------------+------------------------------------------+
  | Tensor               | float32[1,3,640,640]                     |
  +----------------------+------------------------------------------+
  | Identifier           | 111                                      |
  +----------------------+------------------------------------------+

  **Outputs**

  +----------------------+------------------------------------------+
  | **Property**         | **Value**                                |
  +======================+==========================================+
  | Name                 | 1360                                     |
  +----------------------+------------------------------------------+
  | Tensor               | float32[1,8400,85]                       |
  +----------------------+------------------------------------------+
  | Identifier           | 53                                       |
  +----------------------+------------------------------------------+
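The `[1, 8400, 85]` output above follows the usual YOLOX layout: 4 raw box terms, 1 objectness score, and 80 class scores per row, with the 8400 rows coming from the 80x80, 40x40, and 20x20 grids (strides 8, 16, and 32) of a 640x640 input. The sketch below shows one way to decode this tensor into pixel-space boxes. It assumes the TorchScript export left in-network decoding disabled (the default behavior of `tools/export_torchscript.py`), so treat it as a starting point rather than the official post-processing:

.. code-block:: python

   import numpy as np

   def decode(pred, img_size=640, strides=(8, 16, 32)):
       """Decode raw YOLOX output of shape [1, 8400, 85] into boxes."""
       grids, row_strides = [], []
       for s in strides:
           n = img_size // s
           ys, xs = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
           grids.append(np.stack((xs, ys), axis=-1).reshape(-1, 2))
           row_strides.append(np.full((n * n, 1), s, dtype=np.float32))
       grid = np.concatenate(grids)
       stride = np.concatenate(row_strides)

       # Box centers and sizes in pixels; the final detection score is
       # objectness * class probability.
       xy = (pred[0, :, 0:2] + grid) * stride
       wh = np.exp(pred[0, :, 2:4]) * stride
       scores = pred[0, :, 4:5] * pred[0, :, 5:]
       return xy, wh, scores

For the quant8 model, dequantize the raw int8 output first using the scale and offset from the table above, e.g. `pred = 0.0227 * (q.astype(np.float32) + 32)`.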
Benchmark Results
^^^^^^^^^^^^^^^^^

.. note::

   The benchmark results below were measured with performance mode enabled. These numbers are for reference only; actual performance may vary depending on the hardware and platform used. Please note the following limitations:

   1. The G350 does not support the Neuron Stable Delegate (APU) or APU (MDLA) benchmarks, because the hardware does not support these features.
   2. Running this model on the G350 with Arm NN inference may crash because the model is too large for the platform to handle.

- **YOLOX_s-quant8**

  .. csv-table::
     :file: /_asset/tables/Model_Benchmark_tables/int8_YOLOX_s.csv
     :width: 100%
     :widths: 10 15 15 15 15 15 15

- **YOLOX_s-fp32**

  .. csv-table::
     :file: /_asset/tables/Model_Benchmark_tables/fp32_YOLOX_s.csv
     :width: 100%
     :widths: 10 15 15 15 15 15 15

Run Benchmark Tools
^^^^^^^^^^^^^^^^^^^

This section explains how to run the benchmark tool with different delegates and hardware configurations.

1. First, push your TFLite model to the target device:

   .. code-block:: bash

      adb push <model>.tflite /usr/share/label_image/

   Make sure to replace `<model>.tflite` with the actual path of your TFLite model.

2. Next, open an ADB shell to the target device:

   .. code-block:: bash

      adb shell

   After this, you can execute the following commands directly from the shell.

Execute on CPU (8 threads)
**************************

To run the benchmark on the CPU using 8 threads, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<model>.tflite --num_threads=8 --num_runs=10

Execute on GPU, with GPU delegate
*********************************

To run the benchmark on the GPU using the TensorFlow Lite GPU delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<model>.tflite --use_gpu=1 --gpu_precision_loss_allowed=1 --num_runs=10

Execute on GPU, with Arm NN delegate
************************************

To run the benchmark on the GPU using the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<model>.tflite --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10

Execute on CPU, with Arm NN delegate
************************************

To run the benchmark on the CPU using the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<model>.tflite --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10

Execute on APU, with Neuron Delegate
************************************

To run the benchmark on the APU using the Neuron delegate, use the following command:

.. code-block:: bash

   benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<model>.tflite

.. note::

   If you are using the G350 platform, make the following adjustments:

   - For CPU-based benchmarks, change the `--num_threads` parameter to 4:

     .. code-block:: bash

        benchmark_model --graph=/usr/share/label_image/<model>.tflite --num_threads=4 --use_xnnpack=0 --num_runs=10

   - For all benchmarks (CPU, GPU, Arm NN), add the `--use_xnnpack=0` parameter to disable the XNNPACK delegate.
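If you benchmark several models or delegates repeatedly, it can be convenient to drive the commands above from the host. A minimal sketch, assuming `adb` is on your PATH and a device is connected (the model file name is illustrative):

.. code-block:: python

   import subprocess

   MODEL = 'yolox_s_quant.tflite'  # illustrative; use your model file
   REMOTE = '/usr/share/label_image/' + MODEL

   # Push the model, then run the CPU benchmark (8 threads, 10 runs).
   subprocess.run(['adb', 'push', MODEL, REMOTE], check=True)
   result = subprocess.run(
       ['adb', 'shell',
        'benchmark_model --graph={} --num_threads=8 --num_runs=10'.format(REMOTE)],
       capture_output=True, text=True, check=True)
   print(result.stdout)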
Neuron SDK
^^^^^^^^^^

Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:

1. **Transfer the Model to the Device:**

   Use `adb` to push your TFLite model to the device:

   .. code-block:: bash

      adb push <model>.tflite /user/share/benchmark_dla/

   .. note::

      Make sure to push the **pruned** model (produced by the pruning script) to the device to ensure compatibility with the DLA converter. The pruned model, not the original model, should be used for accurate benchmarking.

2. **Access the Device Shell:**

   Connect to your device's shell:

   .. code-block:: bash

      adb shell

3. **Navigate to the Benchmark Directory:**

   Change to the directory where the model is stored:

   .. code-block:: bash

      cd /user/share/benchmark_dla/

4. **Run the Benchmark:**

   Execute the benchmarking script with the following command:

   .. code-block:: bash

      python3 benchmark.py --file <model>.tflite --target mdla3.0 --profile --options='--relax-fp32'

   **Description:**

   - The `benchmark.py` script runs a performance evaluation of your model on MDLA 3.0.
   - The `--file` parameter specifies the path to your TFLite model.
   - The `--target mdla3.0` option sets the target hardware to MDLA 3.0.
   - The `--profile` flag enables profiling for detailed performance metrics.
   - The `--options='--relax-fp32'` option relaxes floating-point precision to improve compatibility with the MDLA.
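For example, to profile the pruned FP32 model produced earlier in this guide (the file name comes from the FP32 pruning step; adjust it to your own output):

.. code-block:: bash

   python3 benchmark.py --file yolox_s_tflite_6-303_sdkv6.tflite --target mdla3.0 --profile --options='--relax-fp32'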