YOLOv8s Models
==============

Overview
--------

YOLOv8s is the small variant of the YOLOv8 (You Only Look Once version 8) family of object detection models, known for its speed, accuracy, and ease of use. Developed by Ultralytics, YOLOv8 is one of the more recent iterations of the YOLO series. It builds on earlier versions such as YOLOv4 and YOLOv5, with a focus on modern deep learning practices and integration with popular frameworks.

Model Conversion Flow
---------------------

Precondition
^^^^^^^^^^^^

.. note::

   For better compatibility, use **Python 3.7** when working with these models, because some of the libraries and frameworks used in this flow are more compatible with it.

Before you begin, make sure that the :doc:`NeuroPilot Converter Tool ` is installed. If it is not installed yet, follow the instructions in the "Install and Verify NeuroPilot Converter Tool" section of the same guide.

1. **Clone the YOLOv5 repository:**

   The export script needed for conversion is available in the YOLOv5 repository. Clone it and pin it to commit `485da42`:

   .. code-block:: bash

      git clone https://github.com/ultralytics/yolov5.git
      cd yolov5
      git reset --hard 485da42

2. **Install Python packages and dependencies:**

   .. code-block:: bash

      pip3 install -r requirements.txt
      pip3 install torch==1.13.0 torchvision==0.14.0

   .. note::

      `mtk_converter.PyTorchConverter` only supports PyTorch versions from 1.3.0 to 2.0.0. With a newer version, such as v2.3.1+cu121, the converter fails with a runtime error. Install a supported PyTorch version together with its matching `torchvision` release (torchvision 0.14.0 pairs with PyTorch 1.13.0).

Get Source Model
^^^^^^^^^^^^^^^^

1. **Download the YOLOv8s model:**

   Use the following `wget` command to download the YOLOv8s model into the YOLOv5 source code directory:

   .. code-block:: bash

      wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s.pt

2. **Export the PyTorch model to TorchScript:**

   Use the following command to convert the model from PyTorch format to TorchScript:

   .. code-block:: bash

      python3 export.py --weights yolov8s.pt --img-size 640 640 --include torchscript

Converting Model for Deployment
-------------------------------

Quant8 Conversion Process
^^^^^^^^^^^^^^^^^^^^^^^^^

1. **Convert to TFLite format:**

   The following script converts the YOLOv8s model to a quantized TFLite model:

   - **Data Generation**: A generator function creates random input data for calibration.
   - **Model Loading**: The YOLOv8s model is loaded from a TorchScript file.
   - **Quantization**: The model is configured for quantization with the specified input value range.
   - **Conversion**: The quantized model is converted to TFLite format and saved.

   Save the script below as `convert_to_tflite_quantized.py` and run it:

   .. code-block:: bash

      python3 convert_to_tflite_quantized.py

   .. code-block:: python

      import mtk_converter
      import numpy as np


      def data_gen():
          # Yield 100 batches of random calibration data shaped like the model input (NCHW).
          # Representative images from the target domain usually give better quantization accuracy.
          for _ in range(100):
              yield [np.random.randn(1, 3, 640, 640).astype(np.float32)]


      # Load the TorchScript model produced by export.py, with a fixed 1x3x640x640 input shape.
      converter = mtk_converter.PyTorchConverter.from_script_module_file(
          'yolov8s.torchscript', [[1, 3, 640, 640]],
      )
      converter.quantize = True
      converter.input_value_ranges = [(-1.0, 1.0)]  # Expected (min, max) of the input tensor
      converter.calibration_data_gen = data_gen
      _ = converter.convert_to_tflite(output_file='yolov8s_quant.tflite')

   .. note::

      Use mtk_converter v7.16.0 for the best model compatibility with MDLA 3.0 when running on G700/G510.

2. **Convert to DLA format:**

   1. **Download the NeuroPilot SDK All-In-One Bundle:**

      Visit the download page: `NeuroPilot Downloads `_

   2. **Extract the bundle:**

      .. code-block:: bash

         tar zxvf neuropilot-sdk-basic-.tar.gz

   3. **Set environment variables:**

      .. code-block:: bash

         export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-/neuron_sdk/host/lib

   4. **Convert the TFLite model to DLA format:**

      Use the Neuron compiler (`ncc-tflite`) included in the NeuroPilot SDK to convert your TFLite model into the DLA format.
      The following example converts an INT8 TFLite model to DLA format for the `mdla3.0` architecture:

      .. code-block:: bash

         /path/to/neuropilot-sdk-basic-/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 yolov8s_quant.tflite

   .. note::

      To ensure compatibility with your device, download and use **NeuroPilot SDK version 6**. Other versions might not be fully supported.

FP32 Conversion Process
^^^^^^^^^^^^^^^^^^^^^^^

1. **Convert to TFLite format:**

   The following script converts the YOLOv8s model to a non-quantized (FP32) TFLite model:

   - **Data Generation**: As in the quantization process, a generator function creates random input data.
   - **Model Loading**: The YOLOv8s model is loaded from a TorchScript file.
   - **Conversion**: The model is converted to TFLite format without quantization and saved.

   Save the script below as `convert_to_tflite.py` and run it:

   .. code-block:: bash

      python3 convert_to_tflite.py

   .. code-block:: python

      import mtk_converter
      import numpy as np


      def data_gen():
          # Yield 100 batches of random data shaped like the model input (NCHW).
          for _ in range(100):
              yield [np.random.randn(1, 3, 640, 640).astype(np.float32)]


      # Load the TorchScript model produced by export.py, with a fixed 1x3x640x640 input shape.
      converter = mtk_converter.PyTorchConverter.from_script_module_file(
          'yolov8s.torchscript', [[1, 3, 640, 640]],
      )
      # `quantize` is not enabled here, so the output model stays in FP32.
      converter.input_value_ranges = [(-1.0, 1.0)]
      converter.calibration_data_gen = data_gen
      _ = converter.convert_to_tflite(output_file='yolov8s.tflite')

2. **Convert to DLA format:**

   1. **Set environment variables:**

      .. code-block:: bash

         export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-/neuron_sdk/host/lib

   2. **Convert the TFLite model to DLA format:**

      Use the Neuron compiler (`ncc-tflite`) included in the NeuroPilot SDK to convert your FP32 TFLite model into the DLA format.
      The following example converts an FP32 TFLite model to DLA format for the `mdla3.0` architecture, with relaxed FP32 operations enabled:

      .. code-block:: bash

         /path/to/neuropilot-sdk-basic-/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 yolov8s.tflite

Model Information
-----------------

.. note::

   The models and benchmark data below were produced with **mtk_converter**.

General Information
^^^^^^^^^^^^^^^^^^^

The information in the table below is taken from the **Detection** section of the Ultralytics repository, which can be found at `ultralytics repository `_.

+---------------------+------------+
| Property            | Value      |
+=====================+============+
| Category            | Detection  |
+---------------------+------------+
| Input Size          | 640x640    |
+---------------------+------------+
| FLOPs (B)           | 28.6       |
+---------------------+------------+
| #Params (M)         | 11.2       |
+---------------------+------------+
| Training Framework  | PyTorch    |
+---------------------+------------+
| Inference Framework | TFLite     |
+---------------------+------------+

Pre-converted Model
^^^^^^^^^^^^^^^^^^^

Deployable Model
****************

+-----------------------+---------------------+----------------------------------------------------+
| Model Type            | Download Link       | Supported Backend                                  |
+=======================+=====================+====================================================+
| Quant8 Model package  | `Download Quant8 `_ | CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK |
+-----------------------+---------------------+----------------------------------------------------+
| Float32 Model package | `Download Fp32 `_   | CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK |
+-----------------------+---------------------+----------------------------------------------------+

Model Properties
****************

- **YOLOv8s-quant8**

  **Inputs**

  +-----------------------+------------------------------------------+
  | **Property**          | **Value**                                |
  +-----------------------+------------------------------------------+
  | Name                  | input.49                                 |
  +-----------------------+------------------------------------------+
  | Tensor                | int8[1,3,640,640]                        |
  +-----------------------+------------------------------------------+
  | Identifier            | 35                                       |
  +-----------------------+------------------------------------------+
  | Quantization          | Linear                                   |
  +-----------------------+------------------------------------------+
  | Quantization Range    | -1.0039 ≤ 0.0078 * q ≤ 0.9961            |
  +-----------------------+------------------------------------------+

  **Outputs**

  +-----------------------+------------------------------------------+
  | **Property**          | **Value**                                |
  +-----------------------+------------------------------------------+
  | Name                  | 80                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | int8[1,84,8400]                          |
  +-----------------------+------------------------------------------+
  | Identifier            | 378                                      |
  +-----------------------+------------------------------------------+
  | Quantization          | Linear                                   |
  +-----------------------+------------------------------------------+
  | Quantization Range    | -10.1582 ≤ 2.5395 * (q + 124) ≤ 637.4246 |
  +-----------------------+------------------------------------------+
  | Name                  | 77                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | int8[1,144,80,80]                        |
  +-----------------------+------------------------------------------+
  | Identifier            | 37                                       |
  +-----------------------+------------------------------------------+
  | Quantization          | Linear                                   |
  +-----------------------+------------------------------------------+
  | Quantization Range    | -18.2789 ≤ 0.1115 * (q - 36) ≤ 10.1426   |
  +-----------------------+------------------------------------------+
  | Name                  | 78                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | int8[1,144,40,40]                        |
  +-----------------------+------------------------------------------+
  | Identifier            | 270                                      |
  +-----------------------+------------------------------------------+
  | Quantization          | Linear                                   |
  +-----------------------+------------------------------------------+
  | Quantization Range    | -17.3353 ≤ 0.1008 * (q - 44) ≤ 8.3653    |
  +-----------------------+------------------------------------------+
  | Name                  | 79                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | int8[1,144,20,20]                        |
  +-----------------------+------------------------------------------+
  | Identifier            | 155                                      |
  +-----------------------+------------------------------------------+
  | Quantization          | Linear                                   |
  +-----------------------+------------------------------------------+
  | Quantization Range    | -23.8304 ≤ 0.1288 * (q - 57) ≤ 9.0169    |
  +-----------------------+------------------------------------------+

- **YOLOv8s-fp32**

  **Inputs**

  +-----------------------+------------------------------------------+
  | **Property**          | **Value**                                |
  +-----------------------+------------------------------------------+
  | Name                  | input.49                                 |
  +-----------------------+------------------------------------------+
  | Tensor                | float32[1,3,640,640]                     |
  +-----------------------+------------------------------------------+
  | Identifier            | 145                                      |
  +-----------------------+------------------------------------------+

  **Outputs**

  +-----------------------+------------------------------------------+
  | **Property**          | **Value**                                |
  +-----------------------+------------------------------------------+
  | Name                  | 80                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | float32[1,84,8400]                       |
  +-----------------------+------------------------------------------+
  | Identifier            | 78                                       |
  +-----------------------+------------------------------------------+
  | Name                  | 77                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | float32[1,144,80,80]                     |
  +-----------------------+------------------------------------------+
  | Identifier            | 235                                      |
  +-----------------------+------------------------------------------+
  | Name                  | 78                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | float32[1,144,40,40]                     |
  +-----------------------+------------------------------------------+
  | Identifier            | 73                                       |
  +-----------------------+------------------------------------------+
  | Name                  | 79                                       |
  +-----------------------+------------------------------------------+
  | Tensor                | float32[1,144,20,20]                     |
  +-----------------------+------------------------------------------+
  | Identifier            | 343                                      |
  +-----------------------+------------------------------------------+

Benchmark Results
^^^^^^^^^^^^^^^^^

.. note::

   The benchmark results below were measured with performance mode enabled. They are for reference only; actual performance may vary depending on the hardware and platform.

   Note the following limitations:

   1. The G350 does not support the Neuron Stable Delegate (APU) or APU (MDLA), because the hardware does not yet support these features.
   2. Running the models on the G350 with Arm NN inference may crash because the model is too large for the platform to handle.

- **YOLOv8s-quant8**

  .. csv-table::
     :file: /_asset/tables/Model_Benchmark_tables/int8_YOLOv8s.csv
     :width: 100%
     :widths: 10 15 15 15 15 15 15

- **YOLOv8s-fp32**

  .. csv-table::
     :file: /_asset/tables/Model_Benchmark_tables/fp32_YOLOv8s.csv
     :width: 100%
     :widths: 10 15 15 15 15 15 15

Run Benchmark Tools
^^^^^^^^^^^^^^^^^^^

This section shows how to run the benchmark tool with different delegates and hardware configurations.

1. Push your TFLite model to the target device:

   .. code-block:: bash

      adb push /usr/share/label_image/

   Make sure to replace `` with the actual path of your TFLite model.

2. Open an ADB shell on the target device:

   .. code-block:: bash

      adb shell

   You can then run the following commands directly from the shell.

Execute on CPU (8 threads)
**************************

To run the benchmark on the CPU with 8 threads, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/ --num_threads=8 --num_runs=10

Execute on GPU, with GPU delegate
*********************************

To run the benchmark on the GPU with the TensorFlow Lite GPU delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/ --use_gpu=1 --gpu_precision_loss_allowed=1 --num_runs=10

Execute on GPU, with Arm NN delegate
************************************

To run the benchmark on the GPU with the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/ --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10

Execute on CPU, with Arm NN delegate
************************************

To run the benchmark on the CPU with the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/ --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10

Execute on APU, with Neuron Delegate
************************************

To run the benchmark on the APU with the Neuron delegate, use the following command:

.. code-block:: bash

   benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/

.. note::

   If you are using the G350 platform, make the following adjustments:

   - For CPU-based benchmarks, change the `--num_threads` parameter to 4:

     .. code-block:: bash

        benchmark_model --graph=/usr/share/label_image/ --num_threads=4 --use_xnnpack=0 --num_runs=10

   - For all benchmarks (CPU, GPU, Arm NN), add the parameter `--use_xnnpack=0` to disable the XNNPACK delegate.

Neuron SDK
^^^^^^^^^^

Follow these steps to benchmark your TensorFlow Lite model with the Neuron SDK on MDLA 3.0:

1. **Transfer the model to the device:**

   Use `adb` to push your TFLite model to the device:

   .. code-block:: bash

      adb push /usr/share/benchmark_dla/

2. **Access the device shell:**

   Connect to your device's shell:

   .. code-block:: bash

      adb shell

3. **Navigate to the benchmark directory:**

   Change to the directory where the model is stored:

   .. code-block:: bash

      cd /usr/share/benchmark_dla/

4. **Run the benchmark:**

   Run the benchmark script with the following command:

   .. code-block:: bash

      python3 benchmark.py --file --target mdla3.0 --profile --options='--relax-fp32'

   **Description:**

   - The `benchmark.py` script evaluates the performance of your model on MDLA 3.0.
   - The `--file` parameter specifies the path to your TFLite model.
   - The `--target mdla3.0` option sets the target hardware to MDLA 3.0.
   - The `--profile` flag enables profiling to report detailed performance metrics.
   - The `--options='--relax-fp32'` option allows FP32 operations to run at relaxed (reduced) floating-point precision, which improves compatibility with MDLA.
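
Interpreting Quantization Ranges
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each **Quantization Range** row in the YOLOv8s-quant8 Model Properties tables follows the affine int8 mapping `real = scale * (q - zero_point)`, evaluated at `q = -128` and `q = 127`. The Python sketch below recovers the scale and zero point from such a range and converts values in both directions. The helper names (`params_from_range`, `quantize`, `dequantize`) are illustrative and are not part of mtk_converter or the NeuroPilot SDK. Because the table values are rounded to four decimals, read the exact parameters from the model itself when precision matters (for example, the `quantization` field returned by the TFLite interpreter's `get_input_details()`).

.. code-block:: python

   import numpy as np

   INT8_MIN, INT8_MAX = -128, 127


   def params_from_range(real_min, real_max):
       """Recover (scale, zero_point) from a documented int8 range.

       Assumes real = scale * (q - zero_point), with q = -128 mapping to
       real_min and q = 127 mapping to real_max.
       """
       scale = (real_max - real_min) / (INT8_MAX - INT8_MIN)
       zero_point = int(round(INT8_MIN - real_min / scale))
       return scale, zero_point


   def quantize(x, scale, zero_point):
       """Map real values to int8, rounding to the nearest step and clamping."""
       q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
       return np.clip(q, INT8_MIN, INT8_MAX).astype(np.int8)


   def dequantize(q, scale, zero_point):
       """Map int8 values back to real values."""
       return scale * (np.asarray(q, dtype=np.float32) - zero_point)


   # Ranges copied from the YOLOv8s-quant8 tables above.
   input_scale, input_zp = params_from_range(-1.0039, 0.9961)     # input.49
   boxes_scale, boxes_zp = params_from_range(-10.1582, 637.4246)  # output 80
   print(input_zp, boxes_zp)  # 0 -124

The recovered zero points match the offsets in the tables (for example, `(q + 124)` for output `80`). Under these assumptions, an input already scaled to [-1.0, 1.0] would be passed through `quantize` with `input_scale` and `input_zp` before being fed to the int8 model.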
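
Decoding the Detection Output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The `[1,84,8400]` output (named `80`) listed for both models matches the usual YOLOv8 detection layout: 8400 candidate boxes, one per grid cell of the 80x80, 40x40, and 20x20 feature maps (6400 + 1600 + 400), each described by 84 values. The following NumPy sketch assumes that the first 4 values are the box center x, center y, width, and height in 640x640 input pixels, and that the remaining 80 are per-class scores in [0, 1]. It keeps boxes above a confidence threshold and then applies a simple class-agnostic non-maximum suppression (NMS). The function names and the 0.25/0.45 thresholds are illustrative choices, not values defined by this guide. For the quant8 model, the int8 output would first need to be dequantized with the scale and zero point from its Outputs table.

.. code-block:: python

   import numpy as np


   def iou(box, boxes):
       """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
       x1 = np.maximum(box[0], boxes[:, 0])
       y1 = np.maximum(box[1], boxes[:, 1])
       x2 = np.minimum(box[2], boxes[:, 2])
       y2 = np.minimum(box[3], boxes[:, 3])
       inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
       area = (box[2] - box[0]) * (box[3] - box[1])
       areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
       return inter / (area + areas - inter + 1e-9)


   def postprocess(output, conf_thres=0.25, iou_thres=0.45):
       """Decode a [1, 84, N] YOLOv8-style output into (boxes, scores, class_ids).

       Assumes rows 0-3 hold cx, cy, w, h in input pixels and rows 4-83 hold
       per-class scores in [0, 1].
       """
       preds = np.asarray(output, dtype=np.float32)[0].T  # (N, 84)
       class_scores = preds[:, 4:]
       class_ids = class_scores.argmax(axis=1)
       scores = class_scores.max(axis=1)

       # Drop candidates whose best class score is below the threshold.
       keep = scores > conf_thres
       preds, scores, class_ids = preds[keep], scores[keep], class_ids[keep]

       # Convert (cx, cy, w, h) to corner format (x1, y1, x2, y2).
       cx, cy, w, h = preds[:, 0], preds[:, 1], preds[:, 2], preds[:, 3]
       boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

       # Greedy class-agnostic NMS: keep the best box, drop boxes overlapping it.
       order = scores.argsort()[::-1]
       selected = []
       while order.size > 0:
           best = order[0]
           selected.append(best)
           rest = order[1:]
           order = rest[iou(boxes[best], boxes[rest]) <= iou_thres]
       selected = np.array(selected, dtype=np.int64)
       return boxes[selected], scores[selected], class_ids[selected]

Ultralytics' own post-processing runs NMS per class by default; the class-agnostic variant here is a simplification that keeps the sketch short.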