ShuffleNetV2 Models
Overview
ShuffleNet V2 is a convolutional neural network architecture designed for actual inference speed rather than for low computational complexity (FLOPs) alone. Its design follows practical guidelines that account for factors such as memory access cost and platform characteristics. The result is a state-of-the-art trade-off between speed and accuracy, which makes it well suited to resource-constrained environments.
Model Conversion Flow
Precondition
Note
For better compatibility with certain libraries and frameworks, Python 3.7 is recommended when working with these models.
Install Required Libraries:
Ensure you have the necessary libraries installed:
pip install torch torchvision
Get Source Model
Follow these steps to use and convert ShuffleNetV2 models using PyTorch and TorchVision.
Load and Convert ShuffleNetV2 Model:
Load a pretrained ShuffleNetV2 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.
Run the following Python script:
python generate_shufflenetv2_float.py
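import torch
import torchvision

# Load the pretrained ShuffleNetV2 x2.0 model from TorchVision.
model = torchvision.models.shufflenet_v2_x2_0(pretrained=True)
# Dummy input tensor, used only for tracing.
trace_data = torch.randn(1, 3, 224, 224)
# Trace the model on CPU in eval mode and save it as TorchScript.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'shufflenet_v2_x2_0.pt')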
Converting Model for Deployment
Before you begin, ensure that the NeuroPilot Converter Tool is installed. If you haven’t installed it yet, please follow the instructions in the “Install and Verify NeuroPilot Converter Tool” section of the same guide.
Quant8 Conversion Process
Generate Calibration Data:
The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.
python generate_data_batches.py
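import os
import numpy as np

# Create the calibration directory (no error if it already exists).
os.makedirs('data', exist_ok=True)
# Save 100 random float32 batches that match the model input shape.
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)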
Convert to Quantized TFLite Format:
Use the following command to convert the model to a quantized TFLite format using the generated calibration data:
mtk_pytorch_converter \
--input_script_module_file=shufflenet_v2_x2_0.pt \
--output_file=shufflenet_v2_x2_0_ptq_quant.tflite \
--input_shapes=1,3,224,224 \
--quantize=True \
--input_value_ranges=-1,1 \
--calibration_data_dir=data/ \
--calibration_data_regexp=batch_.*\.npy \
--allow_incompatible_paddings_for_tflite_pooling=True
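As a quick sanity check before running the converter, the following sketch tests which file names match the pattern passed to --calibration_data_regexp. It assumes the converter applies the pattern to file names inside --calibration_data_dir with full-match semantics. That behavior is an assumption rather than something stated in this guide, and the helper name matching_files is hypothetical.

```python
import re

# Pattern passed to --calibration_data_regexp in the converter command.
CALIBRATION_REGEXP = r"batch_.*\.npy"


def matching_files(file_names, pattern=CALIBRATION_REGEXP):
    """Return the names that fully match the pattern, in sorted order.

    Assumption: the converter matches whole file names. Verify this
    against the converter version you are using.
    """
    compiled = re.compile(pattern)
    return sorted(name for name in file_names if compiled.fullmatch(name))


if __name__ == "__main__":
    # Names produced by generate_data_batches.py, plus two unrelated files.
    names = ["batch_{}.npy".format(i) for i in range(100)]
    names += ["labels.txt", "batch_0.npz"]
    selected = matching_files(names)
    print(len(selected), "calibration files selected")
```

To check a real directory, pass os.listdir('data') as file_names.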
Convert to Quantized DLA Format
Warning
The process of converting this model to DLA format may encounter unsupported operations. Before converting, you need to prune the model using the provided script to ensure compatibility with the DLA converter.
Here is an example of how to prune the shufflenet_v2_x2_0_ptq_quant.tflite model using the export_quant_0-15.py script:
import mtk_converter
editor = mtk_converter.TFLiteEditor("shufflenet_v2_x2_0_ptq_quant.tflite")
output_file = "shufflenet_v2_x2_0_ptq_quant_0-15.tflite"
input_names = ["x.3"]
output_names = ["x0.3"]
_ = editor.export(output_file=output_file, input_names=input_names, output_names=output_names)
Note
The input_names and output_names specified in this script are based on the example model structure. You need to modify these names according to the input and output tensor names specific to your model. You can inspect the model structure using tools like Netron or TensorFlow utilities to identify the correct tensor names for your model.
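One way to look up tensor names with TensorFlow utilities is sketched below. It relies on tf.lite.Interpreter and its get_tensor_details() method. The helper names format_tensor and list_tensors are hypothetical, and it is an assumption that the interpreter can open your converted model: a model containing custom operators may fail to load, in which case Netron is the safer option.

```python
def format_tensor(detail):
    """Format one entry of Interpreter.get_tensor_details() as a text row."""
    dtype = detail["dtype"]
    # dtype is normally a NumPy type class; fall back to str() otherwise.
    dtype_name = getattr(dtype, "__name__", str(dtype))
    shape = [int(dim) for dim in detail["shape"]]
    return "{:>5}  {:<40} {:<10} {}".format(
        detail["index"], detail["name"], dtype_name, shape
    )


def list_tensors(model_path):
    """Return one formatted row per tensor in a TFLite model file."""
    # Imported lazily so format_tensor stays usable without TensorFlow.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    return [format_tensor(d) for d in interpreter.get_tensor_details()]
```

For example, printing each row of list_tensors("shufflenet_v2_x2_0_ptq_quant.tflite") lists every tensor with its index, name, type, and shape, from which candidates for input_names and output_names can be picked.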
Tip
For more detailed information and steps on handling unsupported operations in DLA conversion, please see Unsupported Operations in DLA Conversion (Optional).
Download NeuroPilot SDK Tools and Convert to DLA:
Download the NeuroPilot SDK All-In-One Bundle:
Visit the following download page and download the necessary bundle: NeuroPilot Downloads
Extract the Bundle:
After downloading, extract the bundle using the following command:
tar zxvf neuropilot-sdk-basic-<version>.tar.gz
Set the Environment Variables:
Set the environment variables to point to the SDK:
export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib
Convert INT8 TFLite Model to DLA Format:
Use the ncc-tflite tool from the NeuroPilot SDK to convert your TFLite model into the DLA format. The following example converts the pruned INT8 TFLite model to DLA format for the specified architecture (mdla3.0).
/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 shufflenet_v2_x2_0_ptq_quant_0-15.tflite
Note
To ensure compatibility with your device, please download and use NeuroPilot SDK version 6. Other versions might not be fully supported.
FP32 Conversion Process
Convert to FP32 TFLite Format:
To convert the model to a non-quantized (FP32) TFLite format, use the following command:
mtk_pytorch_converter \
--input_script_module_file=shufflenet_v2_x2_0.pt \
--output_file=shufflenet_v2_x2_0.tflite \
--input_shapes=1,3,224,224 \
--allow_incompatible_paddings_for_tflite_pooling=True
Convert to FP32 DLA Format
Warning
The process of converting this model to DLA format may encounter unsupported operations. Before converting, you need to prune the model using the provided script to ensure compatibility with the DLA converter.
Here is an example of how to prune the shufflenet_v2_x2_0.tflite model using the export_0-15.py script:
import mtk_converter
editor = mtk_converter.TFLiteEditor("shufflenet_v2_x2_0.tflite")
output_file = "shufflenet_v2_x2_0_0-15.tflite"
input_names = ["x.3"]
output_names = ["x0.3"]
_ = editor.export(output_file=output_file, input_names=input_names, output_names=output_names)
Note
The input_names and output_names specified in this script are based on the example model structure. You need to modify these names according to the input and output tensor names specific to your model. You can inspect the model structure using tools like Netron or TensorFlow utilities to identify the correct tensor names for your model.
Set the Environment and Convert to DLA:
Set the Environment Variables:
Set the environment variables to point to the SDK:
export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib
Convert FP32 TFLite Model to DLA Format:
Use the ncc-tflite tool from the NeuroPilot SDK to convert your FP32 TFLite model into the DLA format. The following example converts the pruned FP32 TFLite model to DLA format for the specified architecture (mdla3.0) with relaxed FP32 operations enabled:
/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 shufflenet_v2_x2_0_0-15.tflite
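The two DLA conversions differ only in the --relax-fp32 flag and the input file, so they can be scripted together. The sketch below only assembles the command line and environment shown in this guide. The function name build_ncc_command and the sdk_root argument are hypothetical, and sdk_root must point to your own extracted neuropilot-sdk-basic-<version> directory.

```python
import os
import posixpath


def build_ncc_command(sdk_root, model_file, arch="mdla3.0", relax_fp32=False):
    """Return (argv, env) for one ncc-tflite invocation.

    Uses only the options shown in this guide: --arch and --relax-fp32.
    """
    host_dir = posixpath.join(sdk_root, "neuron_sdk", "host")
    argv = [posixpath.join(host_dir, "bin", "ncc-tflite"), "--arch={}".format(arch)]
    if relax_fp32:
        argv.append("--relax-fp32")
    argv.append(model_file)

    # Mirror the export above: LD_LIBRARY_PATH points at the SDK host libraries.
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = posixpath.join(host_dir, "lib")
    return argv, env


if __name__ == "__main__":
    # Placeholder path: replace with the real extracted SDK directory.
    sdk = "/path/to/neuropilot-sdk-basic-<version>"
    for model, relax in [
        ("shufflenet_v2_x2_0_ptq_quant_0-15.tflite", False),
        ("shufflenet_v2_x2_0_0-15.tflite", True),
    ]:
        argv, _env = build_ncc_command(sdk, model, relax_fp32=relax)
        print(" ".join(argv))
```

To run a conversion rather than print it, pass the returned values to subprocess.run(argv, env=env, check=True).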
Model Information
Note
The models and benchmark data mentioned below have been processed using the mtk_converter.
General Information
The following table contains general information about the model. The details, such as input size, GFLOPS, and number of parameters, are sourced from the official PyTorch documentation at: ShuffleNet_V2_X2_0.
Property | Value |
---|---|
Category | Classification |
Input Size | 224x224 |
GFLOPS | 0.58 |
#Params (M) | 7.39 |
Training Framework | PyTorch |
Inference Framework | TFLite |
Pre-converted Model
Deployable Model
Model Type | Download Link | Supported Backend |
---|---|---|
Quant8 | Model package | NeuronSDK |
Float32 | Model package | CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK |
Model Properties
ShuffleNetV2-quant8
Inputs
Property | Value |
---|---|
Name | x.3 |
Tensor | int8[1,3,224,224] |
Identifier | 242 |
Quantization | Linear |
Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |
Outputs
Property | Value |
---|---|
Name | 1166 |
Tensor | int8[1,1000] |
Identifier | 138 |
Quantization | Linear |
Quantization Range | -1.9862 ≤ 0.0296 * (q + 61) ≤ 5.5732 |
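The Quantization Range rows follow the usual linear (affine) int8 relation real = scale * (q - zero_point), with q in [-128, 127]. The scales printed in the tables are rounded to four decimals, so the sketch below recovers the scale and zero point from the printed range endpoints instead. The function names quant_params and dequantize are hypothetical helpers for illustration only.

```python
def quant_params(real_min, real_max, qmin=-128, qmax=127):
    """Recover (scale, zero_point) from the real-valued range of an int8 tensor."""
    scale = (real_max - real_min) / (qmax - qmin)
    # real_min corresponds to qmin, so zero_point = qmin - real_min / scale.
    zero_point = int(round(qmin - real_min / scale))
    return scale, zero_point


def dequantize(q, scale, zero_point):
    """Map an int8 value back to its real value."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    ranges = [
        ("input x.3", -1.0039, 0.9961),
        ("output 1166", -1.9862, 5.5732),
    ]
    for name, real_min, real_max in ranges:
        scale, zero_point = quant_params(real_min, real_max)
        print("{}: scale={:.4f}, zero_point={}".format(name, scale, zero_point))
```

The recovered values agree with the tables: a scale of about 0.0078 with zero point 0 for the input, and a scale of about 0.0296 with zero point -61 for the output, which is the (q + 61) term.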
ShuffleNetV2-fp32
Inputs
Property | Value |
---|---|
Name | x.3 |
Tensor | float32[1,3,224,224] |
Identifier | 48 |
Outputs
Property | Value |
---|---|
Name | 1166 |
Tensor | float32[1,1000] |
Identifier | 99 |
Benchmark Results
Note
The benchmark results shown below were measured with performance mode enabled. These numbers are for reference only, as actual performance may vary depending on the hardware and platform used.
Please note the following limitations:
The G350 does not support Neuron Stable Delegate and NeuronSDK because the hardware does not yet support these features.
The model may not run on certain backends due to custom operators generated by the MTK converter. These custom operators are not recognized or supported by the TensorFlow Lite interpreter, which may lead to incompatibility issues during inference.
ShuffleNetV2-quant8
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate | NeuronSDK |
---|---|---|---|---|---|---|
G350 | N/A | N/A | N/A | N/A | N/A | N/A |
G510 | N/A | N/A | N/A | N/A | N/A | N/A |
G700 | N/A | N/A | N/A | N/A | N/A | N/A |
G1200 | N/A | N/A | N/A | N/A | N/A | N/A |
ShuffleNetV2-fp32
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate | NeuronSDK |
---|---|---|---|---|---|---|
G350 | 111.892 ms (Thread:4) | 254.189 ms | 161.390 ms | 118.324 ms | N/A | N/A |
G510 | 152.124 ms | 54.533 ms | 58.467 ms | 39.693 ms | 16.329 ms | N/A |
G700 | 20.311 ms | 48.278 ms | 42.629 ms | 35.921 ms | 12.475 ms | N/A |
G1200 | 18.917 ms | 47.996 ms | 32.253 ms | 25.790 ms | 23.315 ms | N/A |
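To compare backends per platform, the FP32 latencies above can be put into a small lookup. The sketch below copies the table values as-is (N/A becomes None) and picks the lowest latency for each platform. The names BACKENDS, FP32_LATENCY_MS, and fastest_backend are hypothetical, and the G350 CPU figure was measured with 4 threads rather than 8.

```python
# Column order of the ShuffleNetV2-fp32 benchmark table.
BACKENDS = [
    "CPU", "GPU", "ARMNN(GpuAcc)", "ARMNN(CpuAcc)",
    "Neuron Stable Delegate", "NeuronSDK",
]

# Latencies in milliseconds copied from the table; None marks N/A.
FP32_LATENCY_MS = {
    "G350": [111.892, 254.189, 161.390, 118.324, None, None],
    "G510": [152.124, 54.533, 58.467, 39.693, 16.329, None],
    "G700": [20.311, 48.278, 42.629, 35.921, 12.475, None],
    "G1200": [18.917, 47.996, 32.253, 25.790, 23.315, None],
}


def fastest_backend(platform):
    """Return (backend_name, latency_ms) with the lowest reported latency."""
    measured = [
        (ms, name)
        for name, ms in zip(BACKENDS, FP32_LATENCY_MS[platform])
        if ms is not None
    ]
    ms, name = min(measured)
    return name, ms


if __name__ == "__main__":
    for platform in FP32_LATENCY_MS:
        name, ms = fastest_backend(platform)
        print("{}: {} ({} ms)".format(platform, name, ms))
```

As the note above states, these numbers are for reference only, so the ranking may differ on your hardware and software versions.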
Run Benchmark Tools
This section shows how to run the benchmark tool with different delegates and hardware configurations.
First, push your TFLite model to the target device:
adb push <your_tflite_model> /usr/share/label_image/
Make sure to replace <your_tflite_model> with the actual path of your TFLite model.
Next, open an ADB shell to the target device:
adb shell
After this, you can execute the following commands directly from the shell.
Execute on CPU (8 threads)
To execute the benchmark on the CPU using 8 threads, run the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=8 --num_runs=10
Execute on GPU, with GPU delegate
To execute the benchmark on the GPU using the TensorFlow Lite GPU delegate, run the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --num_runs=10
Execute on GPU, with Arm NN delegate
To execute the benchmark on the GPU using the Arm NN delegate, use the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10
Execute on CPU, with Arm NN delegate
To run the benchmark on the CPU using the Arm NN delegate, use the following command:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10
Execute on APU, with Neuron Delegate
For executing on the APU using the Neuron delegate, run the following command:
benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<your_tflite_model>
Note
If you are using the G350 platform, please make the following adjustments:
For CPU-based benchmarks, change the --num_threads parameter to 4:
benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=4 --use_xnnpack=0 --num_runs=10
For all benchmarks (CPU, GPU, Arm NN), add the parameter --use_xnnpack=0 to disable the XNNPACK delegate.
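The per-backend commands and the G350 adjustments can be summarized in one helper that builds the command string to run in the ADB shell. This is a sketch that uses only the benchmark_model options listed in this section. The function name benchmark_command and the backend keys (cpu, gpu, armnn_gpu, armnn_cpu) are hypothetical labels, not part of the tool.

```python
GRAPH_DIR = "/usr/share/label_image/"
ARMNN_DELEGATE = "/usr/lib/libarmnnDelegate.so.29"


def benchmark_command(model_name, backend, platform="G700"):
    """Build the benchmark_model command line for one backend and platform."""
    is_g350 = platform.upper() == "G350"
    args = ["benchmark_model", "--graph={}{}".format(GRAPH_DIR, model_name)]

    if backend == "cpu":
        # The G350 runs CPU benchmarks with 4 threads instead of 8.
        args.append("--num_threads={}".format(4 if is_g350 else 8))
    elif backend == "gpu":
        args += ["--use_gpu=1", "--allow_fp16=0", "--gpu_precision_loss_allowed=0"]
    elif backend in ("armnn_gpu", "armnn_cpu"):
        accel = "GpuAcc" if backend == "armnn_gpu" else "CpuAcc"
        args += [
            "--external_delegate_path={}".format(ARMNN_DELEGATE),
            '--external_delegate_options="backends:{}"'.format(accel),
        ]
    else:
        raise ValueError("unknown backend: {}".format(backend))

    if is_g350:
        # On the G350, disable the XNNPACK delegate for all of these benchmarks.
        args.append("--use_xnnpack=0")
    args.append("--num_runs=10")
    return " ".join(args)


if __name__ == "__main__":
    for backend in ("cpu", "gpu", "armnn_gpu", "armnn_cpu"):
        print(benchmark_command("<your_tflite_model>", backend, platform="G350"))
```

The Neuron delegate command is left out on purpose, because the G350 adjustments do not apply to it.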
Neuron SDK
Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:
Transfer the Model to the Device:
Use adb to push your TFLite model to the device:
adb push <your_tflite_model> /usr/share/benchmark_dla/
Note
Make sure to push the pruned model (after using the pruning script) to the device to ensure compatibility with the DLA converter. The pruned model should be used instead of the original model for accurate benchmarking.
Access the Device Shell:
Connect to your device’s shell:
adb shell
Navigate to the Benchmark Directory:
Change to the directory where the model is stored:
cd /usr/share/benchmark_dla/
Run the Benchmark:
Execute the benchmarking script with the following command:
python3 benchmark.py --file <your_tflite_model> --target mdla3.0 --profile --options='--relax-fp32'
Description:
The benchmark.py script runs a performance evaluation on your model using MDLA 3.0.
The --file parameter specifies the path to your TFLite model.
The --target mdla3.0 option sets the target hardware to MDLA 3.0.
The --profile flag enables profiling to provide detailed performance metrics.
The --options='--relax-fp32' option allows relaxation of floating-point precision to improve compatibility with MDLA.