SqueezeNet Models
Overview
SqueezeNet is a deep neural network (DNN) architecture designed to achieve high accuracy with a significantly smaller model size. It offers several advantages: reduced communication during distributed training, lower bandwidth requirements for model deployment, and greater feasibility of deployment on memory-constrained hardware such as FPGAs. Furthermore, with model compression techniques, SqueezeNet can be reduced to less than 0.5 MB. This compact design makes SqueezeNet an ideal choice for applications where efficiency and memory are critical.
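To make the size claim concrete, the following sketch (assuming torchvision is installed and can download pretrained weights) compares SqueezeNet's parameter count with AlexNet, the reference model used in the original SqueezeNet paper:

```python
import torchvision

def count_params(model):
    # Total number of parameters in the model
    return sum(p.numel() for p in model.parameters())

# Pretrained weights are downloaded on first use
squeezenet = torchvision.models.squeezenet1_0(pretrained=True)
alexnet = torchvision.models.alexnet(pretrained=True)

print('SqueezeNet: %.2fM params' % (count_params(squeezenet) / 1e6))  # ~1.24M
print('AlexNet:    %.2fM params' % (count_params(alexnet) / 1e6))     # ~61M
```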
Getting Started
Follow these steps to use and convert SqueezeNet models using PyTorch and TorchVision.
Install Required Libraries:
Ensure you have the necessary libraries installed:
```bash
pip install torch torchvision
```
Load and Convert SqueezeNet Model:
Load a pretrained SqueezeNet model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.
```python
import torch
import torchvision

# Load the pretrained SqueezeNet 1.0 model
model = torchvision.models.squeezenet1_0(pretrained=True)

# Create a dummy input tensor for tracing (one 3x224x224 image)
trace_data = torch.randn(1, 3, 224, 224)

# Trace the model on CPU in evaluation mode to convert it to TorchScript
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)

# Save the traced model
torch.jit.save(trace_model, 'squeezenet_float.pt')
```
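As an optional sanity check, you can reload the traced module and confirm that it produces the expected 1000-class output:

```python
import torch

# Reload the TorchScript module and run a dummy forward pass
loaded = torch.jit.load('squeezenet_float.pt')
with torch.no_grad():
    out = loaded(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```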
How It Works
Before you begin, ensure that the NeuroPilot Converter Tool is installed.
Quant8 Conversion Process
Generate Calibration Data:
The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.
```python
import os
import numpy as np

# Create the calibration data directory (no error if it already exists)
os.makedirs('data', exist_ok=True)

# Generate 100 batches of random input data for calibration
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```
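Random inputs are enough to drive the converter, but calibration statistics are generally more representative when they come from real images. A minimal sketch of that variant follows (the images/ directory is an assumption; note that the normalization maps pixels into [-1, 1], matching the --input_value_ranges=-1,1 option used below):

```python
import os
import numpy as np
from PIL import Image
from torchvision import transforms

# Standard resize/crop, then scale pixels from [0, 1] to [-1, 1]
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

os.makedirs('data', exist_ok=True)
for i, name in enumerate(sorted(os.listdir('images'))):  # 'images/' is hypothetical
    img = Image.open(os.path.join('images', name)).convert('RGB')
    batch = preprocess(img).unsqueeze(0).numpy()  # shape (1, 3, 224, 224), float32
    np.save('data/batch_{}.npy'.format(i), batch)
```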
Convert to Quantized TFLite Format:
Use the following command to convert the model to a quantized TFLite format using the generated calibration data:
```bash
mtk_pytorch_converter \
    --input_script_module_file=squeezenet_float.pt \
    --output_file=squeezenet_ptq_quant.tflite \
    --input_shapes=1,3,224,224 \
    --quantize=True \
    --input_value_ranges=-1,1 \
    --calibration_data_dir=data/ \
    --calibration_data_regexp=batch_.*\.npy
```
FP32 Conversion Process
To convert the model to a non-quantized (FP32) TFLite format, use the following command:
```bash
mtk_pytorch_converter \
    --input_script_module_file=squeezenet_float.pt \
    --output_file=squeezenet_float.tflite \
    --input_shapes=1,3,224,224
```
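To verify that a converted file matches the properties listed in the next section, you can inspect it with the TensorFlow Lite interpreter (assuming the tensorflow package is installed); the same check works for squeezenet_ptq_quant.tflite:

```python
import tensorflow as tf

# Load a converted model and print its input/output tensor details
interpreter = tf.lite.Interpreter(model_path='squeezenet_float.tflite')
interpreter.allocate_tensors()
print(interpreter.get_input_details())   # expect shape [1, 3, 224, 224]
print(interpreter.get_output_details())  # expect shape [1, 1000]
```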
Model Details
General Information
| Property | Value |
|---|---|
| Category | Classification |
| Input Size | 224x224 |
| GFLOPs | 0.82 |
| #Params (M) | 1.24 |
| Training Framework | PyTorch |
| Inference Framework | TFLite |
| Quant8 Model package | |
| Float32 Model package | |
Model Properties
Quant8
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release
Inputs
| Property | Value |
|---|---|
| Name | x.2 |
| Tensor | int8[1,3,224,224] |
| Identifier | 80 |
| Quantization | Linear |
| Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |
Outputs
| Property | Value |
|---|---|
| Name | 8 |
| Tensor | int8[1,1000] |
| Identifier | 72 |
| Quantization | Linear |
| Quantization Range | 0 ≤ 0.1473 * (q + 128) ≤ 37.5656 |
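These ranges follow the usual linear quantization rule real = scale * (q - zero_point). As a worked example using the (rounded) values from the tables above, inputs use scale 0.0078 with zero point 0, and outputs use scale 0.1473 with zero point -128:

```python
import numpy as np

# Quantize a float input in [-1, 1] to int8: q = round(real / scale)
real_in = 0.5
q_in = int(np.clip(round(real_in / 0.0078), -128, 127))  # -> 64

# Dequantize an int8 output logit: real = 0.1473 * (q + 128)
q_out = 27
real_out = 0.1473 * (q_out + 128)  # -> ~22.83
print(q_in, real_out)
```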
Fp32
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v2.9.0
Inputs
| Property | Value |
|---|---|
| Name | x.2 |
| Tensor | float32[1,3,224,224] |
| Identifier | 67 |
Outputs
| Property | Value |
|---|---|
| Name | 8 |
| Tensor | float32[1,1000] |
| Identifier | 82 |
Performance Benchmarks
SqueezeNet-quant8
| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
|---|---|---|---|---|---|---|---|
| G350 | 56.194 ms (Thread:4) | 112.385 ms | 59.594 ms | 47.425 ms | N/A | N/A | 7985.68 ms |
| G510 | 64.899 ms | 35.930 ms | 21.246 ms | 10.818 ms | 1.471 ms | 1.52 ms | N/A |
| G700 | 7.033 ms | 24.855 ms | 14.835 ms | 9.754 ms | 1.128 ms | 1.04 ms | N/A |
| G1200 | 6.389 ms | 18.027 ms | 10.375 ms | 5.619 ms | 1.780 ms | 1.05 ms | N/A |
SqueezeNet-fp32
| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
|---|---|---|---|---|---|---|---|
| G350 | 101.764 ms (Thread:4) | 110.384 ms | 80.185 ms | 70.106 ms | N/A | N/A | 380.042 ms |
| G510 | 87.865 ms | 35.743 ms | 31.189 ms | 27.790 ms | 4.527 ms | 5.01 ms | N/A |
| G700 | 20.332 ms | 24.693 ms | 21.731 ms | 23.491 ms | 3.423 ms | 3.04 ms | N/A |
| G1200 | 17.974 ms | 17.816 ms | 15.701 ms | 14.193 ms | 3.745 ms | 3.05 ms | N/A |
- Widespread: CPU only, light workload.
- Performance: CPU and GPU, medium workload.
- Ultimate: CPU, GPU, and APUs, heavy workload.
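For reference, the tables above report latencies from running each .tflite model 10 times on the listed devices. A host-side equivalent of the CPU measurement can be sketched with the TensorFlow Lite interpreter (CPU only; absolute numbers will differ from the on-device figures):

```python
import time
import numpy as np
import tensorflow as tf

# Time 10 CPU inference runs, mirroring the benchmark setup above
interpreter = tf.lite.Interpreter(model_path='squeezenet_float.tflite', num_threads=8)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.random.randn(*inp['shape']).astype(np.float32))

times = []
for _ in range(10):
    start = time.perf_counter()
    interpreter.invoke()
    times.append((time.perf_counter() - start) * 1000.0)
print('mean latency: %.3f ms' % np.mean(times))
```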