YOLOv5s Models
Overview
YOLOv5s is a variant of the YOLO (You Only Look Once) family of object detection models, designed to be a smaller and faster version suitable for real-time object detection tasks. YOLOv5 was developed by Ultralytics and offers improved speed and accuracy compared to previous YOLO versions.
Model Conversion Flow
Precondition
Note
For better compatibility, it is recommended to use Python 3.7 when working with these models, as it works more reliably with the libraries and frameworks required by this guide.
Before you begin, ensure that the NeuroPilot Converter Tool is installed. If you haven’t installed it yet, please follow the instructions in the “Install and Verify NeuroPilot Converter Tool” section of the same guide.
Clone the repository:
git clone http://github.com/ultralytics/yolov5
cd yolov5
git reset --hard 485da42
Install Python packages and dependencies:
pip3 install -r requirements.txt
pip3 install torch==1.9.0 torchvision==0.10.0
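Before continuing, it is worth confirming that the pinned versions are the ones actually in use (a quick check; the note below explains why the version matters):

import torch
import torchvision

# The converter requires PyTorch in the 1.3.0 - 2.0.0 range (see note below).
print(torch.__version__)        # expect 1.9.0
print(torchvision.__version__)  # expect 0.10.0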
Note
The mtk_converter.PyTorchConverter only supports PyTorch versions between 1.3.0 and 2.0.0. A newer release such as v2.3.1+cu121 falls outside this range and triggers a runtime error, so compatible versions of PyTorch and torchvision must be installed as shown above.
Apply Patch:
git apply Fix_yolov5_mtk_tflite_issue.patch
Note
The Fix_yolov5_mtk_tflite_issue.patch adds support for MTK TensorFlow Lite (MTK TFLite) in the YOLOv5 model export script. It includes:
Adding mtk_tflite as a supported export format.
Modifying the Detect module’s forward method to include only the convolution operations (see the sketch after this list).
Implementing post-processing operations for MTK TFLite.
Extending the DetectMultiBackend class to handle MTK TFLite models.
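For reference, here is a minimal sketch of what the patched detection head might look like. This is illustrative only; the exact patch contents may differ, and the channel counts (128, 256, 512) and layer count are assumptions based on the standard YOLOv5s configuration:

import torch.nn as nn

class Detect(nn.Module):
    # Hypothetical sketch of the patched Detect head (illustrative only).
    # nl is the number of detection layers, no the outputs per cell (255),
    # and ch the per-scale input channels assumed for YOLOv5s.
    def __init__(self, nl=3, no=255, ch=(128, 256, 512)):
        super().__init__()
        self.nl = nl
        # Per-scale 1x1 detection convolutions, as in YOLOv5.
        self.m = nn.ModuleList(nn.Conv2d(c, no, 1) for c in ch)

    def forward(self, x):
        # Keep only the convolutions so the exported graph contains ops
        # that MTK TFLite supports; grid/anchor decoding and sigmoid move
        # into the separate post-processing step added by the patch.
        return [self.m[i](x[i]) for i in range(self.nl)]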
Get Source Model
Export the PyTorch model to TorchScript format:
python export.py --weights yolov5s.pt --img-size 640 640 --include torchscript
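Optionally, you can sanity-check the exported module by loading it and running a dummy input (a minimal sketch; with the patch applied, the module is expected to return the three per-scale feature maps):

import torch

# Load the TorchScript module produced by export.py and run a zero tensor
# through it to confirm the export succeeded.
model = torch.jit.load('yolov5s.torchscript')
model.eval()
with torch.no_grad():
    outputs = model(torch.zeros(1, 3, 640, 640))
# With the MTK patch applied, expect three 255-channel feature maps of
# spatial size 80x80, 40x40, and 20x20.
for out in outputs:
    print(out.shape)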
Converting Model for Deployment
Quant8 Conversion Process
Prepare Calibration Data:
To prepare the calibration data, create a new Python script named prepare_calibration_data.py in the root directory of the YOLOv5 project. This script generates a set of images used for model quantization calibration.
import os

import numpy as np

from utils.dataloaders import LoadImagesAndLabels
from utils.general import check_dataset

data = 'data/coco128.yaml'
num_batches = 100
calib_dir = 'calibration_dataset'

os.makedirs(calib_dir, exist_ok=True)

# Retrieve first 100 images from training set with batch_size = 1
dataset = LoadImagesAndLabels(check_dataset(data)['train'], batch_size=1)
for idx, (im, _target, _path, _shape) in enumerate(dataset):
    if idx >= num_batches:
        break
    # Expand shape from (3, 640, 640) to (1, 3, 640, 640)
    im = np.expand_dims(im, axis=0).astype(np.float32)
    # 0 - 255 to 0.0 - 1.0
    im /= 255
    np.save(os.path.join(calib_dir, 'batch-{:05d}.npy'.format(idx)), im)

Run the script:

python prepare_calibration_data.py
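After it finishes, the calibration_dataset directory should contain 100 .npy files. A quick way to spot-check one of them (a minimal sketch):

import numpy as np

# Spot-check the first calibration batch: shape, dtype, and value range.
arr = np.load('calibration_dataset/batch-00000.npy')
print(arr.shape)             # (1, 3, 640, 640)
print(arr.dtype)             # float32
print(arr.min(), arr.max())  # values normalized to [0.0, 1.0]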
Convert to int8 TFLite:
To convert the PyTorch model to an int8 TFLite format, create a new Python script named convert_to_quant_tflite.py in the root directory of your YOLOv5 project. The script uses the pre-generated calibration data to produce a quantized TFLite model.
import os

import numpy as np

import mtk_converter

calib_dir = 'calibration_dataset'
converter = mtk_converter.PyTorchConverter.from_script_module_file(
    'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
)

def data_gen():
    """Return an iterator for the calibration dataset."""
    for fn in sorted(os.listdir(calib_dir)):
        yield [np.load(os.path.join(calib_dir, fn))]

converter.quantize = True
converter.calibration_data_gen = data_gen
converter.convert_to_tflite('yolov5s_int8_mtk.tflite')

Run the script:

python convert_to_quant_tflite.py
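If the tensorflow package is available on the host (it is not otherwise required by this guide), you can inspect the quantized model's tensor metadata without running it (a minimal sketch; the model itself is intended for on-device execution, and MTK-specific ops may not run on the stock runtime):

import tensorflow as tf

# Read input/output metadata of the quantized model; the quantization
# parameters (scale, zero point) should match the Model Properties below.
interpreter = tf.lite.Interpreter(model_path='yolov5s_int8_mtk.tflite')
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    print(detail['name'], detail['shape'], detail['dtype'], detail['quantization'])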
Convert the TFLite Model to DLA Format:
Download NeuroPilot SDK All-In-One Bundle:
Visit the download page: NeuroPilot Downloads
Extract the Bundle:
tar zxvf neuropilot-sdk-basic-<version>.tar.gz

Set Environment Variables:

export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

Convert the TFLite model to DLA format:
/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 yolov5s_int8_mtk.tflite
Note
To ensure compatibility with your device, please download and use NeuroPilot SDK version 6. Other versions might not be fully supported.
FP32 Conversion Process
Convert to FP32 TFLite:
To convert the PyTorch model to an FP32 TFLite format, create a new Python script named convert_to_tflite.py in the root directory of your YOLOv5 project. This script will handle the conversion process to generate a non-quantized, full-precision TFLite model.
import mtk_converter

converter = mtk_converter.PyTorchConverter.from_script_module_file(
    'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
)
converter.convert_to_tflite('yolov5s_mtk.tflite')

Run the script:

python convert_to_tflite.py
Convert the TFLite Model to DLA Format:
Set Environment Variables:
export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib
Convert to DLA format:
/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 yolov5s_mtk.tflite
Model Information
Note
The models and benchmark data mentioned below have been processed using the mtk_converter.
General Information
The information in the table below is sourced from the Pretrained Checkpoints section of the YOLOv5 repository.
Property | Value
---|---
Category | Detection
Input Size | 640x640
FLOPs (B) | 16.5
#Params (M) | 7.2
Training Framework | PyTorch
Inference Framework | TFLite
Pre-converted Model
Deployable Model
Model Type | Download Link | Supported Backend
---|---|---
Quant8 | Model package | CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK
Float32 | Model package | CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK
Model Properties
YOLOv5s-quant8
Inputs
Property | Value
---|---
Name | x.1
Tensor | int8[1,3,640,640]
Identifier | 67
Quantization | Linear
Quantization Range | 0.0039 * (q + 128) ≤ 0.9993
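The quantization ranges use the affine mapping real = scale * (q - zero_point). A worked check for the input tensor above (a minimal sketch; the scale shown in the table is rounded):

# Affine dequantization: real = scale * (q - zero_point).
# Input tensor: scale ≈ 0.0039, zero_point = -128, so int8 values in
# [-128, 127] map to roughly [0.0, 1.0].
scale, zero_point = 0.0039, -128
print(scale * (-128 - zero_point))  # 0.0 (lower bound)
print(scale * (127 - zero_point))   # ≈0.9945; the table's 0.9993 reflects the unrounded scale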
Outputs
Name | Tensor | Identifier | Quantization | Quantization Range
---|---|---|---|---
77 | int8[1,255,80,80] | 315 | Linear | -19.3298 ≤ 0.0966 * (q - 72) ≤ 5.3157
78 | int8[1,255,40,40] | 279 | Linear | -15.8150 ≤ 0.0841 * (q - 60) ≤ 5.6362
79 | int8[1,255,20,20] | 15 | Linear | -15.7213 ≤ 0.0845 * (q - 58) ≤ 5.8321
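Each output tensor is a raw detection-head feature map; decoding it into boxes is handled by the post-processing added in the patch. For reference, the 255 channels per grid cell decode as 3 anchors x (5 box/objectness values + 80 COCO class scores):

# Channel layout of each YOLOv5 output head (COCO, 80 classes):
# 3 anchors per scale, each carrying (x, y, w, h, objectness) + 80 scores.
num_anchors = 3
values_per_anchor = 5 + 80
print(num_anchors * values_per_anchor)  # 255, matching int8[1,255,S,S]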
YOLOv5s-fp32
Inputs
Property | Value
---|---
Name | x.1
Tensor | float32[1,3,640,640]
Identifier | 315
Outputs
Name | Tensor | Identifier
---|---|---
77 | float32[1,255,80,80] | 304
78 | float32[1,255,40,40] | 272
79 | float32[1,255,20,20] | 230
Benchmark Results
Note
The benchmark results shown below were measured with performance mode enabled. These numbers are for reference only, as actual performance may vary depending on the hardware and platform used.
Please note the following limitations:
The G350 does not support the Neuron Stable Delegate (APU) or the Neuron SDK (MDLA), because its hardware does not yet include these features.
Running models on the G350 with ArmNN inference may crash because the model is too large for the platform to handle.
YOLOv5s-quant8
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate | NeuronSDK
---|---|---|---|---|---|---
G350 | 669.998 ms (Thread:4) | 984.989 ms | 492.372 ms | 456.609 ms | Not Supported | Not Supported
G510 | 336.39 ms | 358.188 ms | 161.230 ms | 116.290 ms | 17.894 ms | 17.47 ms
G700 | 115.887 ms | 225.351 ms | 113.794 ms | 104.801 ms | 10.899 ms | 10.04 ms
G1200 | 116.143 ms | 150.983 ms | 72.639 ms | 58.181 ms | 19.238 ms | 19.05 ms
YOLOv5s-fp32
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate | NeuronSDK
---|---|---|---|---|---|---
G350 | 1379.79 ms (Thread:4) | 935.716 ms | 957.083 ms | Not Supported | Not Supported | Not Supported
G510 | 548.035 ms | 304.006 ms | 302.887 ms | 326.755 ms | 43.684 ms | 46.41 ms
G700 | 299.257 ms | 209.685 ms | 207.253 ms | 278.701 ms | 31.853 ms | 32.04 ms
G1200 | 272.845 ms | 136.244 ms | 133.026 ms | 158.299 ms | 36.771 ms | 36.66 ms
Run Benchmark Tools
This section explains how to run the benchmark tool with different delegates and hardware configurations.
First, push your TFLite model to the target device:
adb push <your_tflite_model> /usr/share/label_image/
Make sure to replace <your_tflite_model> with the actual path of your TFLite model.
Next, open an ADB shell to the target device:
adb shell
After this, you can execute the following commands directly from the shell.
Execute on CPU (8 threads)
To execute the benchmark on the CPU using 8 threads, run the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=8 --num_runs=10
Execute on GPU, with GPU delegate
To execute the benchmark on the GPU using the TensorFlow Lite GPU delegate, run the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --num_runs=10
Execute on GPU, with Arm NN delegate
To execute the benchmark on the GPU using the Arm NN delegate, use the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10
Execute on CPU, with Arm NN delegate
To run the benchmark on the CPU using the Arm NN delegate, use the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10
Execute on APU, with Neuron Delegate
For executing on the APU using the Neuron delegate, run the following command:
benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<your_tflite_model>
Note
If you are using the G350 platform, please make the following adjustments:
For CPU-based benchmarks, change the --num_threads parameter to 4:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=4 --use_xnnpack=0 --num_runs=10
For all benchmarks (CPU, GPU, Arm NN), add --use_xnnpack=0 to disable the XNNPACK delegate.
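For repeated measurements, the commands above can also be scripted from the host over adb. A minimal sketch (the model path and the set of configurations are just examples taken from this section):

import subprocess

# Run benchmark_model over adb for several delegate configurations and
# print the raw tool output for each run.
MODEL = '/usr/share/label_image/yolov5s_int8_mtk.tflite'  # example path
CONFIGS = {
    'cpu-8-threads': f'--graph={MODEL} --num_threads=8 --num_runs=10',
    'gpu-delegate': f'--graph={MODEL} --use_gpu=1 --allow_fp16=0 '
                    '--gpu_precision_loss_allowed=0 --num_runs=10',
    'armnn-gpu': f'--graph={MODEL} '
                 '--external_delegate_path=/usr/lib/libarmnnDelegate.so.29 '
                 '--external_delegate_options="backends:GpuAcc" --num_runs=10',
}

for name, args in CONFIGS.items():
    result = subprocess.run(['adb', 'shell', f'benchmark_model {args}'],
                            capture_output=True, text=True)
    print(f'=== {name} ===')
    print(result.stdout)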
Neuron SDK
Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:
Transfer the Model to the Device:
Use adb to push your TFLite model to the device:
adb push <your_tflite_model> /usr/share/benchmark_dla/
Access the Device Shell:
Connect to your device’s shell:
adb shell
Navigate to the Benchmark Directory:
Change to the directory where the model is stored:
cd /usr/share/benchmark_dla/
Run the Benchmark:
Execute the benchmarking script with the following command:
python3 benchmark.py --file <your_tflite_model> --target mdla3.0 --profile --options='--relax-fp32'
Description:
The benchmark.py script runs a performance evaluation on your model using MDLA 3.0.
The --file parameter specifies the path to your TFLite model.
The --target mdla3.0 option sets the target hardware to MDLA 3.0.
The --profile flag enables profiling to provide detailed performance metrics.
The --options='--relax-fp32' option allows relaxation of floating-point precision to improve compatibility with MDLA.