YOLOv5s Models

Overview

YOLOv5s is a variant of the YOLO (You Only Look Once) family of object detection models, designed to be a smaller and faster version suitable for real-time object detection tasks. YOLOv5 was developed by Ultralytics and offers improved speed and accuracy compared to previous YOLO versions.

Model Conversion Flow

Precondition

Note

For better compatibility, it is recommended to use Python 3.7 when working with these models, as it is the most compatible with the libraries and frameworks this workflow depends on.

Before you begin, ensure that the NeuroPilot Converter Tool is installed. If you haven’t installed it yet, please follow the instructions in the “Install and Verify NeuroPilot Converter Tool” section of the same guide.

  1. Clone the repository:

    git clone https://github.com/ultralytics/yolov5
    cd yolov5
    git reset --hard 485da42
    
  2. Install Python packages and dependencies:

    pip3 install -r requirements.txt
    pip3 install torch==1.9.0 torchvision==0.10.0
    

    Note

    The mtk_converter.PyTorchConverter only supports PyTorch versions from 1.3.0 up to 2.0.0. Newer releases (for example, v2.3.1+cu121) fall outside this range and cause a runtime error, so install the compatible torch and torchvision versions listed above.
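    As a quick sanity check, you can verify that the installed PyTorch version falls inside the supported range before attempting the conversion (a minimal sketch that mirrors the range stated above):

    import torch

    # mtk_converter.PyTorchConverter supports PyTorch >= 1.3.0 and < 2.0.0
    major, minor = (int(v) for v in torch.__version__.split('+')[0].split('.')[:2])
    assert (1, 3) <= (major, minor) < (2, 0), (
        'PyTorch {} is outside the supported range'.format(torch.__version__)
    )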

  3. Apply Patch:

    git apply Fix_yolov5_mtk_tflite_issue.patch
    

    Note

    The Fix_yolov5_mtk_tflite_issue.patch adds support for MTK TensorFlow Lite (MTK TFLite) in the YOLOv5 model export script. It includes:

    • Adding mtk_tflite as a supported export format.

    • Modifying the Detect module’s forward method to only include convolution operations.

    • Implementing post-processing operations for MTK TFLite.

    • Extending the DetectMultiBackend class to handle MTK TFLite models.
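    For reference, a conv-only Detect forward along the lines the patch describes might look roughly like this (a hypothetical sketch of the idea, not the literal patch contents; grid/anchor decoding moves into the MTK TFLite post-processing):

    def forward(self, x):
        # Hypothetical sketch: keep only the per-scale output convolutions and
        # return the raw [1, 255, S, S] maps for S = 80, 40, 20 (640x640 input)
        return [self.m[i](x[i]) for i in range(self.nl)]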

Get Source Model

Export the PyTorch model to TorchScript format:

python export.py --weights yolov5s.pt --img-size 640 640 --include torchscript
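As a quick check that the export succeeded, you can load the TorchScript file and run a dummy input through it (a minimal sketch; the exact output structure depends on the patched Detect module):

import torch

model = torch.jit.load('yolov5s.torchscript')
model.eval()
with torch.no_grad():
    out = model(torch.zeros(1, 3, 640, 640))  # dummy NCHW input at 640x640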

Converting Model for Deployment

Quant8 Conversion Process

  1. Prepare Calibration Data:

To prepare the calibration data, create a new Python script named prepare_calibration_data.py in the root directory of the YOLOv5 project. This script saves a set of preprocessed images that are used to calibrate the model quantization.

prepare_calibration_data.py:

import os
import numpy as np
from utils.dataloaders import LoadImagesAndLabels
from utils.general import check_dataset

data = 'data/coco128.yaml'
num_batches = 100
calib_dir = 'calibration_dataset'
os.makedirs(calib_dir, exist_ok=True)

# Retrieve the first 100 images from the training set with batch_size = 1
dataset = LoadImagesAndLabels(check_dataset(data)['train'], batch_size=1)

for idx, (im, _target, _path, _shape) in enumerate(dataset):
    if idx >= num_batches:
        break

    # Expand shape from (3, 640, 640) to (1, 3, 640, 640)
    im = np.expand_dims(im, axis=0).astype(np.float32)
    # Rescale pixel values from 0 - 255 to 0.0 - 1.0
    im /= 255
    np.save(os.path.join(calib_dir, 'batch-{:05d}.npy'.format(idx)), im)

Then generate the calibration data:

python prepare_calibration_data.py
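To confirm the generated batches look right before running calibration, you can spot-check one file (a quick sanity check, not part of the conversion flow):

import numpy as np

sample = np.load('calibration_dataset/batch-00000.npy')
# Expect shape (1, 3, 640, 640), dtype float32, and values within [0.0, 1.0]
print(sample.shape, sample.dtype, sample.min(), sample.max())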
  2. Convert to int8 TFLite:

To convert the PyTorch model to an int8 TFLite format, create a new Python script named convert_to_quant_tflite.py in the root directory of your YOLOv5 project. This script uses the pre-generated calibration data to quantize the model and produce the int8 TFLite file.

convert_to_quant_tflite.py:

import os
import numpy as np
import mtk_converter

calib_dir = 'calibration_dataset'

converter = mtk_converter.PyTorchConverter.from_script_module_file(
    'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
)

def data_gen():
    """Return an iterator over the calibration dataset."""
    for fn in sorted(os.listdir(calib_dir)):
        yield [np.load(os.path.join(calib_dir, fn))]

converter.quantize = True
converter.calibration_data_gen = data_gen
converter.convert_to_tflite('yolov5s_int8_mtk.tflite')

Then run the conversion:

python convert_to_quant_tflite.py
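If the tensorflow package is available, you can inspect the quantized model's tensor layout with the stock TensorFlow Lite interpreter and compare it against the Model Properties tables below (a minimal sketch; full inference of MTK TFLite models may still require the NeuroPilot runtime):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='yolov5s_int8_mtk.tflite')
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    # Print each tensor's name, shape, dtype, and (scale, zero_point) params
    print(detail['name'], detail['shape'], detail['dtype'], detail['quantization'])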
  3. Convert the TFLite model to DLA format:

  1. Download NeuroPilot SDK All-In-One Bundle:

    Visit the download page: NeuroPilot Downloads

  2. Extract the Bundle:

    tar zxvf neuropilot-sdk-basic-<version>.tar.gz
    
  3. Set Environment Variables:

    export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib
    
  4. Convert the TFLite model to DLA format:

    /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 yolov5s_int8_mtk.tflite
    

Note

To ensure compatibility with your device, please download and use NeuroPilot SDK version 6. Other versions might not be fully supported.

FP32 Conversion Process

  1. Convert to fp32 TFLite:

    To convert the PyTorch model to an fp32 TFLite format, create a new Python script named convert_to_tflite.py in the root directory of your YOLOv5 project. This script performs the conversion to a non-quantized, full-precision TFLite model.

    convert_to_tflite.py:

    import mtk_converter

    converter = mtk_converter.PyTorchConverter.from_script_module_file(
        'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
    )
    converter.convert_to_tflite('yolov5s_mtk.tflite')

    Then run the conversion:

    python convert_to_tflite.py
    
  2. Convert the TFLite model to DLA format:

    1. Set Environment Variables:

    export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

    2. Convert to DLA format:

    /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 yolov5s_mtk.tflite

Model Information

Note

The models and benchmark data mentioned below have been processed using the mtk_converter.

General Information

The information in the table below is sourced from the Pretrained Checkpoints section of the YOLOv5 repository.

Property             Value
-------------------  ---------
Category             Detection
Input Size           640x640
FLOPs (B)            16.5
#Params (M)          7.2
Training Framework   PyTorch
Inference Framework  TFLite

Pre-converted Model

Deployable Model

Model Type             Download Link    Supported Backend
---------------------  ---------------  --------------------------------------------------
Quant8 Model package   Download Quant8  CPU, GPU, ARMNN, Neuron Stable Delegate, NeuronSDK
Float32 Model package  Download Fp32    CPU, GPU, ARMNN, Neuron Stable Delegate, NeuronSDK

Model Properties

  • YOLOv5s-quant8

Inputs

Property            Value
------------------  --------------------------------------
Name                x.1
Tensor              int8[1,3,640,640]
Identifier          67
Quantization        Linear
Quantization Range  0.0039 * (q + 128) ≤ 0.9993

Outputs

Property            Value
------------------  --------------------------------------
Name                77
Tensor              int8[1,255,80,80]
Identifier          315
Quantization        Linear
Quantization Range  -19.3298 ≤ 0.0966 * (q - 72) ≤ 5.3157

Name                78
Tensor              int8[1,255,40,40]
Identifier          279
Quantization        Linear
Quantization Range  -15.8150 ≤ 0.0841 * (q - 60) ≤ 5.6362

Name                79
Tensor              int8[1,255,20,20]
Identifier          15
Quantization        Linear
Quantization Range  -15.7213 ≤ 0.0845 * (q - 58) ≤ 5.8321
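The quantization ranges above follow the usual affine dequantization rule, real_value = scale * (q - zero_point), evaluated at the int8 extremes q = -128 and q = 127. For example, checking the bounds of output 77:

scale, zero_point = 0.0966, 72      # from the output 77 rows above (scale is rounded)
print(scale * (-128 - zero_point))  # -19.32, matching the table's lower bound -19.3298
print(scale * (127 - zero_point))   # 5.313, matching the table's upper bound 5.3157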

  • YOLOv5s-fp32

Inputs

Property    Value
----------  ---------------------
Name        x.1
Tensor      float32[1,3,640,640]
Identifier  315

Outputs

Property    Value
----------  ---------------------
Name        77
Tensor      float32[1,255,80,80]
Identifier  304

Name        78
Tensor      float32[1,255,40,40]
Identifier  272

Name        79
Tensor      float32[1,255,20,20]
Identifier  230
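In every output, the 255-channel dimension comes from YOLOv5 predicting 3 anchors per grid cell, each carrying 4 box coordinates, 1 objectness score, and 80 COCO class scores (a quick consistency check):

num_anchors, num_classes = 3, 80
assert num_anchors * (4 + 1 + num_classes) == 255  # 4 box coords + 1 objectness + 80 classes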

Benchmark Results

Note

The benchmark results shown below were measured with performance mode enabled. These numbers are for reference only, as actual performance may vary depending on the hardware and platform used.

Please note the following limitations:

  1. The G350 does not support the Neuron Stable Delegate (APU) or the APU (MDLA), because the hardware does not yet support these features.

  2. Running models on the G350 with ARMNN inference may crash because the model is too large for the platform to handle.

  • YOLOv5s-quant8

Run model (.tflite) 10 times:

Platform  CPU (Thread:8)         GPU         ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate  NeuronSDK
--------  ---------------------  ----------  -------------  -------------  ----------------------  -------------
G350      669.998 ms (Thread:4)  984.989 ms  492.372 ms     456.609 ms     Not Supported           Not Supported
G510      336.39 ms              358.188 ms  161.230 ms     116.290 ms     17.894 ms               17.47 ms
G700      115.887 ms             225.351 ms  113.794 ms     104.801 ms     10.899 ms               10.04 ms
G1200     116.143 ms             150.983 ms  72.639 ms      58.181 ms      19.238 ms               19.05 ms

  • YOLOv5s-fp32

Run model (.tflite) 10 times:

Platform  CPU (Thread:8)         GPU         ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate  NeuronSDK
--------  ---------------------  ----------  -------------  -------------  ----------------------  -------------
G350      1379.79 ms (Thread:4)  935.716 ms  957.083 ms     Not Supported  Not Supported           Not Supported
G510      548.035 ms             304.006 ms  302.887 ms     326.755 ms     43.684 ms               46.41 ms
G700      299.257 ms             209.685 ms  207.253 ms     278.701 ms     31.853 ms               32.04 ms
G1200     272.845 ms             136.244 ms  133.026 ms     158.299 ms     36.771 ms               36.66 ms

Run Benchmark Tools

This section explains how to run the benchmark tool with different delegates and hardware configurations.

  1. First, push your TFLite model to the target device:

adb push <your_tflite_model> /usr/share/label_image/

Make sure to replace <your_tflite_model> with the actual path of your TFLite model.

  2. Next, open an ADB shell to the target device:

adb shell

After this, you can execute the following commands directly from the shell.

Execute on CPU (8 threads)

To execute the benchmark on the CPU using 8 threads, run the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=8 --num_runs=10

Execute on GPU, with GPU delegate

To execute the benchmark on the GPU using the TensorFlow Lite GPU delegate, run the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --num_runs=10

Execute on GPU, with Arm NN delegate

To execute the benchmark on the GPU using the Arm NN delegate, use the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10

Execute on CPU, with Arm NN delegate

To run the benchmark on the CPU using the Arm NN delegate, use the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10

Execute on APU, with Neuron Delegate

For executing on the APU using the Neuron delegate, run the following command:

benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<your_tflite_model>

Note

If you are using the G350 platform, please make the following adjustments:

  • For CPU-based benchmarks, change the num_threads parameter to 4:

    benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=4 --use_xnnpack=0 --num_runs=10
    
  • For all benchmarks (CPU, GPU, Arm NN), add the --use_xnnpack=0 parameter to disable the XNNPACK delegate.

Neuron SDK

Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:

  1. Transfer the Model to the Device:

    Use adb to push your TFLite model to the device:

    adb push <your_tflite_model> /usr/share/benchmark_dla/
    
  2. Access the Device Shell:

    Connect to your device’s shell:

    adb shell
    
  3. Navigate to the Benchmark Directory:

    Change to the directory where the model is stored:

    cd /usr/share/benchmark_dla/
    
  4. Run the Benchmark:

    Execute the benchmarking script with the following command:

    python3 benchmark.py --file <your_tflite_model> --target mdla3.0 --profile --options='--relax-fp32'
    

Description:

  • The benchmark.py script runs a performance evaluation on your model using MDLA 3.0.

  • The --file parameter specifies the path to your TFLite model.

  • The --target mdla3.0 option sets the target hardware to MDLA 3.0.

  • The --profile flag enables profiling to provide detailed performance metrics.

  • The --options='--relax-fp32' option allows relaxation of floating-point precision to improve compatibility with MDLA.