ShuffleNetV2 Models

Overview

ShuffleNet V2 is a convolutional neural network architecture designed for real inference speed on the target device, rather than for an indirect metric such as computational complexity (FLOPs) alone. Its design follows practical guidelines that account for memory access cost and platform characteristics, which gives it a strong balance between speed and accuracy and makes it well suited to resource-constrained environments.

Getting Started

Follow these steps to use and convert ShuffleNetV2 models using PyTorch and TorchVision.

Install Required Libraries:

Ensure you have the necessary libraries installed:
```
pip install torch torchvision
```

Load and Convert ShuffleNetV2 Model:

Load a pretrained ShuffleNetV2 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

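```
import torch
import torchvision

# Load the pretrained ShuffleNetV2 x2.0 model.
# Note: torchvision 0.13 and later prefer the `weights` argument over `pretrained`.
model = torchvision.models.shufflenet_v2_x2_0(pretrained=True)

# Dummy input used only to record the traced operations (NCHW layout).
trace_data = torch.randn(1, 3, 224, 224)

# Trace in eval mode on the CPU to produce a TorchScript module, then save it.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'shufflenet_v2_x2_0.pt')
```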

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

Generate Calibration Data:

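The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process. Random data only exercises the conversion pipeline; for meaningful quantization accuracy, calibrate with representative, preprocessed input samples instead.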
```
import os
import numpy as np

os.makedirs('data', exist_ok=True)  # does not fail if the directory already exists
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```

Convert to Quantized TFLite Format:

Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=shufflenet_v2_x2_0.pt  \
    --output_file=shufflenet_v2_x2_0_ptq_quant.tflite \
    --input_shapes=1,3,224,224                        \
    --quantize=True                                   \
    --input_value_ranges=-1,1                         \
    --calibration_data_dir=data/                      \
    --calibration_data_regexp='batch_.*\.npy'         \
    --allow_incompatible_paddings_for_tflite_pooling=True
```
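
To preview which files a pattern such as `batch_.*\.npy` would select, the following sketch filters file names with Python's `re` module. How the converter applies the expression (full match versus search) is not stated here, so the full-match behavior, the helper name `select_calibration_files`, and the sample file listing are assumptions for illustration.

```python
import re


def select_calibration_files(names, pattern):
    """Return the names that fully match the regular expression (assumed semantics)."""
    regex = re.compile(pattern)
    return sorted(name for name in names if regex.fullmatch(name))


# Hypothetical directory listing: three calibration batches plus unrelated files.
names = ['batch_0.npy', 'batch_1.npy', 'batch_99.npy', 'batch_0.npz', 'labels.txt']
selected = select_calibration_files(names, r'batch_.*\.npy')
print(selected)
```

When passing the pattern on a shell command line, quoting it keeps the shell from consuming the backslash or expanding the `*` before the converter sees it.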

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=shufflenet_v2_x2_0.pt  \
    --output_file=shufflenet_v2_x2_0.tflite           \
    --input_shapes=1,3,224,224                        \
    --allow_incompatible_paddings_for_tflite_pooling=True
```

Model Details

General Information

Property	Value
Category	Classification
Input Size	224x224
GFLOPS	0.58
#Params (M)	7.39
Training Framework	PyTorch
Inference Framework	TFLite
Quant8 Model package	Download
Float32 Model package	Download

Model Properties

Quant8

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.3
Tensor	int8[1,3,224,224]
Identifier	242
Quantization	Linear
Quantization Range	-1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property	Value
Name	1166
Tensor	int8[1,1000]
Identifier	138
Quantization	Linear
Quantization Range	-1.9862 ≤ 0.0296 * (q + 61) ≤ 5.5732
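
The quantization ranges listed for the Quant8 model follow the usual affine mapping real = scale * (q - zero_point), with int8 values q in [-128, 127]. The sketch below reproduces the listed ranges; the unrounded scales (2/255 for the input and 7.5594/255 for the output) are inferred from the rounded figures in the tables and are therefore an assumption.

```python
def dequantize(q, scale, zero_point):
    """Map an int8 value to its real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


INT8_MIN, INT8_MAX = -128, 127

# Input tensor: 0.0078 * q, i.e. zero point 0; scale inferred as 2 / 255.
input_scale, input_zero_point = 2 / 255, 0
input_range = (
    dequantize(INT8_MIN, input_scale, input_zero_point),
    dequantize(INT8_MAX, input_scale, input_zero_point),
)

# Output tensor: 0.0296 * (q + 61), i.e. zero point -61;
# scale inferred as (5.5732 + 1.9862) / 255.
output_scale, output_zero_point = (5.5732 + 1.9862) / 255, -61
output_range = (
    dequantize(INT8_MIN, output_scale, output_zero_point),
    dequantize(INT8_MAX, output_scale, output_zero_point),
)

print([round(v, 4) for v in input_range])
print([round(v, 4) for v in output_range])
```

The same function converts a raw int8 output score back to a floating-point logit before any further post-processing.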

FP32

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.3
Tensor	float32[1,3,224,224]
Identifier	48

Outputs

Property	Value
Name	1166
Tensor	float32[1,1000]
Identifier	99

Performance Benchmarks

ShuffleNetV2-quant8

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	N/A	N/A	N/A	N/A	N/A	N/A	N/A
G510	N/A	N/A	N/A	N/A	N/A	N/A	N/A
G700	N/A	N/A	N/A	N/A	N/A	N/A	N/A
G1200	N/A	N/A	N/A	N/A	N/A	N/A	N/A

ShuffleNetV2-fp32

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	111.892 ms (Thread:4)	254.189 ms	161.390 ms	118.324 ms	N/A	N/A	587.344 ms
G510	152.124 ms	54.533 ms	58.467 ms	39.693 ms	16.329 ms	N/A	N/A
G700	20.311 ms	48.278 ms	42.629 ms	35.921 ms	12.475 ms	N/A	N/A
G1200	18.917 ms	47.996 ms	32.253 ms	25.790 ms	23.315 ms	N/A	N/A

Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
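
The FP32 latency table can be summarized by picking the fastest available backend on each platform. The sketch below copies the measured values from the table and skips the N/A entries; the dictionary layout, the shortened `CPU` label, and the helper name `fastest_backend` are illustrative assumptions.

```python
# FP32 latencies in milliseconds, copied from the table (N/A entries omitted).
# Note: the G350 CPU figure was measured with 4 threads, the others with 8.
fp32_latency_ms = {
    'G350': {'CPU': 111.892, 'GPU': 254.189, 'ARMNN(GpuAcc)': 161.390,
             'ARMNN(CpuAcc)': 118.324, 'APU(VPU)': 587.344},
    'G510': {'CPU': 152.124, 'GPU': 54.533, 'ARMNN(GpuAcc)': 58.467,
             'ARMNN(CpuAcc)': 39.693, 'Neuron Stable Delegate(APU)': 16.329},
    'G700': {'CPU': 20.311, 'GPU': 48.278, 'ARMNN(GpuAcc)': 42.629,
             'ARMNN(CpuAcc)': 35.921, 'Neuron Stable Delegate(APU)': 12.475},
    'G1200': {'CPU': 18.917, 'GPU': 47.996, 'ARMNN(GpuAcc)': 32.253,
              'ARMNN(CpuAcc)': 25.790, 'Neuron Stable Delegate(APU)': 23.315},
}


def fastest_backend(latencies):
    """Return the (backend, latency_ms) pair with the lowest latency."""
    return min(latencies.items(), key=lambda item: item[1])


fastest = {platform: fastest_backend(lat) for platform, lat in fp32_latency_ms.items()}
for platform, (backend, ms) in fastest.items():
    print(f'{platform}: {backend} ({ms:.3f} ms)')
```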

Resources

GitHub