SqueezeNet Models
Overview
SqueezeNet is a deep neural network (DNN) architecture designed to achieve high accuracy with a significantly smaller model size. It offers several advantages: reduced communication during distributed training, lower bandwidth requirements for model deployment, and greater feasibility of deployment on memory-constrained hardware such as FPGAs. Furthermore, with model compression techniques, SqueezeNet can be reduced to less than 0.5 MB. This compact design makes SqueezeNet an ideal choice for applications where efficiency and memory are critical.
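To make the size claim concrete, the following sketch (assuming torchvision is installed and can download pretrained weights) compares SqueezeNet's parameter count with AlexNet, the reference model used in the original SqueezeNet paper:

```python
import torchvision

def count_params(model):
    # Total number of parameters in the model
    return sum(p.numel() for p in model.parameters())

# Pretrained weights are downloaded on first use
squeezenet = torchvision.models.squeezenet1_0(pretrained=True)
alexnet = torchvision.models.alexnet(pretrained=True)

print('SqueezeNet: %.2fM params' % (count_params(squeezenet) / 1e6))  # ~1.24M
print('AlexNet:    %.2fM params' % (count_params(alexnet) / 1e6))     # ~61M
```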
Getting Started
Follow these steps to use and convert SqueezeNet models using PyTorch and TorchVision.
Install Required Libraries:
Ensure you have the necessary libraries installed:
```bash
pip install torch torchvision
```
Load and Convert SqueezeNet Model:
Load a pretrained SqueezeNet model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.
```python
import torch
import torchvision

# Load the pretrained SqueezeNet 1.0 model
model = torchvision.models.squeezenet1_0(pretrained=True)

# Create a dummy input tensor for tracing (one 3x224x224 image)
trace_data = torch.randn(1, 3, 224, 224)

# Trace the model on CPU in evaluation mode to convert it to TorchScript
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)

# Save the traced model
torch.jit.save(trace_model, 'squeezenet_float.pt')
```
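As an optional sanity check, you can reload the traced module and confirm that it produces the expected 1000-class output:

```python
import torch

# Reload the TorchScript module and run a dummy forward pass
loaded = torch.jit.load('squeezenet_float.pt')
with torch.no_grad():
    out = loaded(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```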
How It Works
Before you begin, ensure that the NeuroPilot Converter Tool is installed.
Quant8 Conversion Process
Generate Calibration Data:
The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.
```python
import os
import numpy as np

# Create the calibration data directory (no error if it already exists)
os.makedirs('data', exist_ok=True)

# Generate 100 batches of random input data for calibration
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```
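Random inputs are enough to drive the converter, but calibration statistics are generally more representative when they come from real images. A minimal sketch of that variant follows (the images/ directory is an assumption; note that the normalization maps pixels into [-1, 1], matching the --input_value_ranges=-1,1 option used below):

```python
import os
import numpy as np
from PIL import Image
from torchvision import transforms

# Standard resize/crop, then scale pixels from [0, 1] to [-1, 1]
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

os.makedirs('data', exist_ok=True)
for i, name in enumerate(sorted(os.listdir('images'))):  # 'images/' is hypothetical
    img = Image.open(os.path.join('images', name)).convert('RGB')
    batch = preprocess(img).unsqueeze(0).numpy()  # shape (1, 3, 224, 224), float32
    np.save('data/batch_{}.npy'.format(i), batch)
```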
Convert to Quantized TFLite Format:
Use the following command to convert the model to a quantized TFLite format using the generated calibration data:
```bash
mtk_pytorch_converter \
    --input_script_module_file=squeezenet_float.pt \
    --output_file=squeezenet_ptq_quant.tflite \
    --input_shapes=1,3,224,224 \
    --quantize=True \
    --input_value_ranges=-1,1 \
    --calibration_data_dir=data/ \
    --calibration_data_regexp=batch_.*\.npy
```
FP32 Conversion Process
To convert the model to a non-quantized (FP32) TFLite format, use the following command:
```bash
mtk_pytorch_converter \
    --input_script_module_file=squeezenet_float.pt \
    --output_file=squeezenet_float.tflite \
    --input_shapes=1,3,224,224
```
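To verify that a converted file matches the properties listed in the next section, you can inspect it with the TensorFlow Lite interpreter (assuming the tensorflow package is installed); the same check works for squeezenet_ptq_quant.tflite:

```python
import tensorflow as tf

# Load a converted model and print its input/output tensor details
interpreter = tf.lite.Interpreter(model_path='squeezenet_float.tflite')
interpreter.allocate_tensors()
print(interpreter.get_input_details())   # expect shape [1, 3, 224, 224]
print(interpreter.get_output_details())  # expect shape [1, 1000]
```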
Model Details
General Information
| Property | Value |
|---|---|
| Category | Classification |
| Input Size | 224x224 |
| GFLOPs | 0.82 |
| #Params (M) | 1.24 |
| Training Framework | PyTorch |
| Inference Framework | TFLite |
| Quant8 Model package | |
| Float32 Model package | |
Model Properties
Quant8
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release
Inputs
| Property | Value |
|---|---|
| Name | x.2 |
| Tensor | int8[1,3,224,224] |
| Identifier | 80 |
| Quantization | Linear |
| Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |
Outputs
| Property | Value |
|---|---|
| Name | 8 |
| Tensor | int8[1,1000] |
| Identifier | 72 |
| Quantization | Linear |
| Quantization Range | 0 ≤ 0.1473 * (q + 128) ≤ 37.5656 |
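These ranges follow the usual linear quantization rule real = scale * (q - zero_point). As a worked example using the (rounded) values from the tables above, inputs use scale 0.0078 with zero point 0, and outputs use scale 0.1473 with zero point -128:

```python
import numpy as np

# Quantize a float input in [-1, 1] to int8: q = round(real / scale)
real_in = 0.5
q_in = int(np.clip(round(real_in / 0.0078), -128, 127))  # -> 64

# Dequantize an int8 output logit: real = 0.1473 * (q + 128)
q_out = 27
real_out = 0.1473 * (q_out + 128)  # -> ~22.83
print(q_in, real_out)
```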
Fp32
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v2.9.0
Inputs
| Property | Value |
|---|---|
| Name | x.2 |
| Tensor | float32[1,3,224,224] |
| Identifier | 67 |
Outputs
| Property | Value |
|---|---|
| Name | 8 |
| Tensor | float32[1,1000] |
| Identifier | 82 |
Performance Benchmarks
SqueezeNet-quant8
| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
|---|---|---|---|---|---|---|---|
| G350 | 56.194 ms (Thread:4) | 112.385 ms | 59.594 ms | 47.425 ms | N/A | N/A | 7985.68 ms |
| G510 | 64.899 ms | 35.930 ms | 21.246 ms | 10.818 ms | 1.471 ms | 1.52 ms | N/A |
| G700 | 7.033 ms | 24.855 ms | 14.835 ms | 9.754 ms | 1.128 ms | 1.04 ms | N/A |
| G1200 | 6.389 ms | 18.027 ms | 10.375 ms | 5.619 ms | 1.780 ms | 1.05 ms | N/A |
SqueezeNet-fp32
| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
|---|---|---|---|---|---|---|---|
| G350 | 101.764 ms (Thread:4) | 110.384 ms | 80.185 ms | 70.106 ms | N/A | N/A | 380.042 ms |
| G510 | 87.865 ms | 35.743 ms | 31.189 ms | 27.790 ms | 4.527 ms | 5.01 ms | N/A |
| G700 | 20.332 ms | 24.693 ms | 21.731 ms | 23.491 ms | 3.423 ms | 3.04 ms | N/A |
| G1200 | 17.974 ms | 17.816 ms | 15.701 ms | 14.193 ms | 3.745 ms | 3.05 ms | N/A |
- Widespread: CPU only, light workload.
- Performance: CPU and GPU, medium workload.
- Ultimate: CPU, GPU, and APUs, heavy workload.
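For reference, the tables above report latencies from running each .tflite model 10 times on the listed devices. A host-side equivalent of the CPU measurement can be sketched with the TensorFlow Lite interpreter (CPU only; absolute numbers will differ from the on-device figures):

```python
import time
import numpy as np
import tensorflow as tf

# Time 10 CPU inference runs, mirroring the benchmark setup above
interpreter = tf.lite.Interpreter(model_path='squeezenet_float.tflite', num_threads=8)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.random.randn(*inp['shape']).astype(np.float32))

times = []
for _ in range(10):
    start = time.perf_counter()
    interpreter.invoke()
    times.append((time.perf_counter() - start) * 1000.0)
print('mean latency: %.3f ms' % np.mean(times))
```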