YOLOv5s Models

Overview

YOLOv5s is a variant of the YOLO (You Only Look Once) family of object detection models, designed to be a smaller and faster version suitable for real-time object detection tasks. YOLOv5 was developed by Ultralytics and offers improved speed and accuracy compared to previous YOLO versions.

Model Conversion Flow

Precondition

Note

For better compatibility, it is recommended to use Python 3.7 when working with these models, as it is the most compatible with the libraries and frameworks this workflow depends on.

Before you begin, ensure that the NeuroPilot Converter Tool is installed. If you haven’t installed it yet, please follow the instructions in the “Install and Verify NeuroPilot Converter Tool” section of the same guide.

  1. Clone the repository:

    git clone https://github.com/ultralytics/yolov5
    cd yolov5
    git reset --hard 485da42
    
  2. Install Python packages and dependencies:

    pip3 install -r requirements.txt
    pip3 install torch==1.9.0 torchvision==0.10.0
    

    Note

    The mtk_converter.PyTorchConverter only supports PyTorch versions from 1.3.0 up to 2.0.0. Newer releases (for example, v2.3.1+cu121) fall outside this range and cause a runtime error, so install the compatible torch and torchvision versions listed above.
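    As a quick sanity check, you can verify that the installed PyTorch version falls inside the supported range before attempting the conversion (a minimal sketch that mirrors the range stated above):

    import torch

    # mtk_converter.PyTorchConverter supports PyTorch >= 1.3.0 and < 2.0.0
    major, minor = (int(v) for v in torch.__version__.split('+')[0].split('.')[:2])
    assert (1, 3) <= (major, minor) < (2, 0), (
        'PyTorch {} is outside the supported range'.format(torch.__version__)
    )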

  3. Apply Patch:

    git apply Fix_yolov5_mtk_tflite_issue.patch
    

    Note

    The Fix_yolov5_mtk_tflite_issue.patch adds support for MTK TensorFlow Lite (MTK TFLite) in the YOLOv5 model export script. It includes:

    • Adding mtk_tflite as a supported export format.

    • Modifying the Detect module’s forward method to only include convolution operations.

    • Implementing post-processing operations for MTK TFLite.

    • Extending the DetectMultiBackend class to handle MTK TFLite models.
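    For reference, a conv-only Detect forward along the lines the patch describes might look roughly like this (a hypothetical sketch of the idea, not the literal patch contents; grid/anchor decoding moves into the MTK TFLite post-processing):

    def forward(self, x):
        # Hypothetical sketch: keep only the per-scale output convolutions and
        # return the raw [1, 255, S, S] maps for S = 80, 40, 20 (640x640 input)
        return [self.m[i](x[i]) for i in range(self.nl)]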

Get Source Model

Export the PyTorch model to TorchScript format:

python export.py --weights yolov5s.pt --img-size 640 640 --include torchscript
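As a quick check that the export succeeded, you can load the TorchScript file and run a dummy input through it (a minimal sketch; the exact output structure depends on the patched Detect module):

import torch

model = torch.jit.load('yolov5s.torchscript')
model.eval()
with torch.no_grad():
    out = model(torch.zeros(1, 3, 640, 640))  # dummy NCHW input at 640x640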

Converting Model for Deployment

Quant8 Conversion Process

  1. Prepare Calibration Data:

To prepare the calibration data, create a new Python script named prepare_calibration_data.py in the root directory of the YOLOv5 project. This script saves a set of preprocessed images that are used to calibrate the model quantization.

prepare_calibration_data.py:

import os
import numpy as np
from utils.dataloaders import LoadImagesAndLabels
from utils.general import check_dataset

data = 'data/coco128.yaml'
num_batches = 100
calib_dir = 'calibration_dataset'
os.makedirs(calib_dir, exist_ok=True)

# Retrieve the first 100 images from the training set with batch_size = 1
dataset = LoadImagesAndLabels(check_dataset(data)['train'], batch_size=1)

for idx, (im, _target, _path, _shape) in enumerate(dataset):
    if idx >= num_batches:
        break

    # Expand shape from (3, 640, 640) to (1, 3, 640, 640)
    im = np.expand_dims(im, axis=0).astype(np.float32)
    # Rescale pixel values from 0 - 255 to 0.0 - 1.0
    im /= 255
    np.save(os.path.join(calib_dir, 'batch-{:05d}.npy'.format(idx)), im)

Then generate the calibration data:

python prepare_calibration_data.py
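To confirm the generated batches look right before running calibration, you can spot-check one file (a quick sanity check, not part of the conversion flow):

import numpy as np

sample = np.load('calibration_dataset/batch-00000.npy')
# Expect shape (1, 3, 640, 640), dtype float32, and values within [0.0, 1.0]
print(sample.shape, sample.dtype, sample.min(), sample.max())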
  2. Convert to int8 TFLite:

To convert the PyTorch model to an int8 TFLite format, create a new Python script named convert_to_quant_tflite.py in the root directory of your YOLOv5 project. This script uses the pre-generated calibration data to quantize the model and produce the int8 TFLite file.

convert_to_quant_tflite.py:

import os
import numpy as np
import mtk_converter

calib_dir = 'calibration_dataset'

converter = mtk_converter.PyTorchConverter.from_script_module_file(
    'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
)

def data_gen():
    """Return an iterator over the calibration dataset."""
    for fn in sorted(os.listdir(calib_dir)):
        yield [np.load(os.path.join(calib_dir, fn))]

converter.quantize = True
converter.calibration_data_gen = data_gen
converter.convert_to_tflite('yolov5s_int8_mtk.tflite')

Then run the conversion:

python convert_to_quant_tflite.py
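If the tensorflow package is available, you can inspect the quantized model's tensor layout with the stock TensorFlow Lite interpreter and compare it against the Model Properties tables below (a minimal sketch; full inference of MTK TFLite models may still require the NeuroPilot runtime):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='yolov5s_int8_mtk.tflite')
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    # Print each tensor's name, shape, dtype, and (scale, zero_point) params
    print(detail['name'], detail['shape'], detail['dtype'], detail['quantization'])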
  3. Convert the TFLite model to DLA format:

  1. Download NeuroPilot SDK All-In-One Bundle:

    Visit the download page: NeuroPilot Downloads

  2. Extract the Bundle:

    tar zxvf neuropilot-sdk-basic-<version>.tar.gz
    
  3. Set Environment Variables:

    export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib
    
  4. Convert the TFLite model to DLA format:

    /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 yolov5s_int8_mtk.tflite
    

Note

To ensure compatibility with your device, please download and use NeuroPilot SDK version 6. Other versions might not be fully supported.

FP32 Conversion Process

  1. Convert to fp32 TFLite:

    To convert the PyTorch model to an fp32 TFLite format, create a new Python script named convert_to_tflite.py in the root directory of your YOLOv5 project. This script performs the conversion to a non-quantized, full-precision TFLite model.

    convert_to_tflite.py:

    import mtk_converter

    converter = mtk_converter.PyTorchConverter.from_script_module_file(
        'yolov5s.torchscript', input_shapes=[(1, 3, 640, 640)]
    )
    converter.convert_to_tflite('yolov5s_mtk.tflite')

    Then run the conversion:

    python convert_to_tflite.py
    
  2. Convert the TFLite model to DLA format:

    1. Set Environment Variables:

    export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

    2. Convert to DLA format:

    /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 yolov5s_mtk.tflite

Model Information

Note

The models and benchmark data mentioned below have been processed using the mtk_converter.

General Information

The information in the table below is sourced from the Pretrained Checkpoints section of the YOLOv5 repository.

Property             Value
-------------------  ---------
Category             Detection
Input Size           640x640
FLOPs (B)            16.5
#Params (M)          7.2
Training Framework   PyTorch
Inference Framework  TFLite

Pre-converted Model

Deployable Model

Model Type             Download Link    Supported Backend
---------------------  ---------------  --------------------------------------------------
Quant8 Model package   Download Quant8  CPU, GPU, ARMNN, Neuron Stable Delegate, NeuronSDK
Float32 Model package  Download Fp32    CPU, GPU, ARMNN, Neuron Stable Delegate, NeuronSDK

Model Properties

  • YOLOv5s-quant8

Inputs

Property            Value
------------------  --------------------------------------
Name                x.1
Tensor              int8[1,3,640,640]
Identifier          67
Quantization        Linear
Quantization Range  0.0039 * (q + 128) ≤ 0.9993

Outputs

Property            Value
------------------  --------------------------------------
Name                77
Tensor              int8[1,255,80,80]
Identifier          315
Quantization        Linear
Quantization Range  -19.3298 ≤ 0.0966 * (q - 72) ≤ 5.3157

Name                78
Tensor              int8[1,255,40,40]
Identifier          279
Quantization        Linear
Quantization Range  -15.8150 ≤ 0.0841 * (q - 60) ≤ 5.6362

Name                79
Tensor              int8[1,255,20,20]
Identifier          15
Quantization        Linear
Quantization Range  -15.7213 ≤ 0.0845 * (q - 58) ≤ 5.8321
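The quantization ranges above follow the usual affine dequantization rule, real_value = scale * (q - zero_point), evaluated at the int8 extremes q = -128 and q = 127. For example, checking the bounds of output 77:

scale, zero_point = 0.0966, 72      # from the output 77 rows above (scale is rounded)
print(scale * (-128 - zero_point))  # -19.32, matching the table's lower bound -19.3298
print(scale * (127 - zero_point))   # 5.313, matching the table's upper bound 5.3157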

  • YOLOv5s-fp32

Inputs

Property    Value
----------  ---------------------
Name        x.1
Tensor      float32[1,3,640,640]
Identifier  315

Outputs

Property    Value
----------  ---------------------
Name        77
Tensor      float32[1,255,80,80]
Identifier  304

Name        78
Tensor      float32[1,255,40,40]
Identifier  272

Name        79
Tensor      float32[1,255,20,20]
Identifier  230
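In every output, the 255-channel dimension comes from YOLOv5 predicting 3 anchors per grid cell, each carrying 4 box coordinates, 1 objectness score, and 80 COCO class scores (a quick consistency check):

num_anchors, num_classes = 3, 80
assert num_anchors * (4 + 1 + num_classes) == 255  # 4 box coords + 1 objectness + 80 classes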

Benchmark Results

Note

The benchmark results shown below were measured with performance mode enabled. These numbers are for reference only, as actual performance may vary depending on the hardware and platform used.

Please note the following limitations:

  1. The G350 does not support the Neuron Stable Delegate (APU) or the APU (MDLA), because the hardware does not yet support these features.

  2. Running models on the G350 with ARMNN inference may crash because the model is too large for the platform to handle.

  • YOLOv5s-quant8

Run model (.tflite) 10 times:

Platform  CPU (Thread:8)         GPU         ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate  NeuronSDK
--------  ---------------------  ----------  -------------  -------------  ----------------------  -------------
G350      669.998 ms (Thread:4)  984.989 ms  492.372 ms     456.609 ms     Not Supported           Not Supported
G510      336.39 ms              358.188 ms  161.230 ms     116.290 ms     17.894 ms               17.47 ms
G700      115.887 ms             225.351 ms  113.794 ms     104.801 ms     10.899 ms               10.04 ms
G1200     116.143 ms             150.983 ms  72.639 ms      58.181 ms      19.238 ms               19.05 ms

  • YOLOv5s-fp32

Run model (.tflite) 10 times:

Platform  CPU (Thread:8)         GPU         ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate  NeuronSDK
--------  ---------------------  ----------  -------------  -------------  ----------------------  -------------
G350      1379.79 ms (Thread:4)  935.716 ms  957.083 ms     Not Supported  Not Supported           Not Supported
G510      548.035 ms             304.006 ms  302.887 ms     326.755 ms     43.684 ms               46.41 ms
G700      299.257 ms             209.685 ms  207.253 ms     278.701 ms     31.853 ms               32.04 ms
G1200     272.845 ms             136.244 ms  133.026 ms     158.299 ms     36.771 ms               36.66 ms

Run Benchmark Tools

This section explains how to run the benchmark tool with different delegates and hardware configurations.

  1. First, push your TFLite model to the target device:

adb push <your_tflite_model> /usr/share/label_image/

Make sure to replace <your_tflite_model> with the actual path of your TFLite model.

  2. Next, open an ADB shell to the target device:

adb shell

After this, you can execute the following commands directly from the shell.

Execute on CPU (8 threads)

To execute the benchmark on the CPU using 8 threads, run the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=8 --num_runs=10

Execute on GPU, with GPU delegate

To execute the benchmark on the GPU using the TensorFlow Lite GPU delegate, run the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --num_runs=10

Execute on GPU, with Arm NN delegate

To execute the benchmark on the GPU using the Arm NN delegate, use the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10

Execute on CPU, with Arm NN delegate

To run the benchmark on the CPU using the Arm NN delegate, use the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10

Execute on APU, with Neuron Delegate

For executing on the APU using the Neuron delegate, run the following command:

benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<your_tflite_model>

Note

If you are using the G350 platform, please make the following adjustments:

  • For CPU-based benchmarks, change the num_threads parameter to 4:

    benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=4 --use_xnnpack=0 --num_runs=10
    
  • For all benchmarks (CPU, GPU, Arm NN), add the --use_xnnpack=0 parameter to disable the XNNPACK delegate.

Neuron SDK

Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:

  1. Transfer the Model to the Device:

    Use adb to push your TFLite model to the device:

    adb push <your_tflite_model> /usr/share/benchmark_dla/
    
  2. Access the Device Shell:

    Connect to your device’s shell:

    adb shell
    
  3. Navigate to the Benchmark Directory:

    Change to the directory where the model is stored:

    cd /usr/share/benchmark_dla/
    
  4. Run the Benchmark:

    Execute the benchmarking script with the following command:

    python3 benchmark.py --file <your_tflite_model> --target mdla3.0 --profile --options='--relax-fp32'
    

Description:

  • The benchmark.py script runs a performance evaluation on your model using MDLA 3.0.

  • The --file parameter specifies the path to your TFLite model.

  • The --target mdla3.0 option sets the target hardware to MDLA 3.0.

  • The --profile flag enables profiling to provide detailed performance metrics.

  • The --options='--relax-fp32' option allows relaxation of floating-point precision to improve compatibility with MDLA.