SqueezeNet Models

Overview

SqueezeNet is a deep neural network (DNN) architecture designed to reach high accuracy with a much smaller model size. A smaller model needs less communication during distributed training, less bandwidth to deploy, and fits more easily on hardware with limited memory, such as FPGAs. With model compression techniques, SqueezeNet can be reduced to less than 0.5 MB. This compact design makes it a good fit for applications where efficiency and memory constraints are critical.

Getting Started

Follow these steps to use and convert SqueezeNet models using PyTorch and TorchVision.

  1. Install Required Libraries:

    Ensure you have the necessary libraries installed:

    pip install torch torchvision
    
  2. Load and Convert SqueezeNet Model:

    Load a pretrained SqueezeNet model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

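    import torch
    import torchvision
    
    # Load SqueezeNet 1.0 with ImageNet weights. Newer TorchVision releases
    # (0.13+) deprecate `pretrained=True` in favor of the `weights=` argument.
    model = torchvision.models.squeezenet1_0(pretrained=True)
    # Dummy input in NCHW layout, used only to record the traced graph.
    trace_data = torch.randn(1, 3, 224, 224)
    # Trace on CPU in eval mode (dropout disabled), then save as TorchScript.
    trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
    torch.jit.save(trace_model, 'squeezenet_float.pt')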
    

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

  1. Generate Calibration Data:

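    The following script creates a directory named data and writes 100 batches of random input data, each saved as a .npy file. The converter uses these files as calibration data during quantization. Random data only exercises the conversion flow; for meaningful accuracy, calibrate with representative, preprocessed input images.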

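    import os
    import numpy as np
    
    # Create the output directory; do not fail if it already exists.
    os.makedirs('data', exist_ok=True)
    for i in range(100):
        # One random NCHW float32 batch per file.
        data = np.random.randn(1, 3, 224, 224).astype(np.float32)
        np.save('data/batch_{}.npy'.format(i), data)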
    
  2. Convert to Quantized TFLite Format:

    Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

    mtk_pytorch_converter                                 \
        --input_script_module_file=squeezenet_float.pt    \
        --output_file=squeezenet_ptq_quant.tflite         \
        --input_shapes=1,3,224,224                        \
        --quantize=True                                   \
        --input_value_ranges=-1,1                         \
        --calibration_data_dir=data/                      \
        --calibration_data_regexp=batch_.*\.npy
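
To check which files in the calibration directory the pattern would pick up, the selection can be sketched in Python. This is only an illustration: it assumes the converter treats `--calibration_data_regexp` as a Python-style regular expression matched against the whole file name, which the text above does not state. The helper name `select_calibration_files` is hypothetical.

```python
import re

# Pattern copied from --calibration_data_regexp in the command above.
CALIBRATION_PATTERN = re.compile(r"batch_.*\.npy")

def select_calibration_files(filenames):
    """Return the names that the pattern matches in full (assumed semantics)."""
    return [name for name in filenames if CALIBRATION_PATTERN.fullmatch(name)]

# Example directory listing: only the batch_*.npy files should be selected.
names = ["batch_0.npy", "batch_99.npy", "batch_0.npz", "labels.txt", "batch_1.npy.bak"]
selected = select_calibration_files(names)
print(selected)
```

If the converter uses a search rather than a full match, names such as `batch_1.npy.bak` would also be selected, so keeping the calibration directory free of stray files is the safer choice.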
    

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

    mtk_pytorch_converter                                 \
        --input_script_module_file=squeezenet_float.pt    \
        --output_file=squeezenet_float.tflite             \
        --input_shapes=1,3,224,224
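
When both conversions are scripted, the two command lines can be assembled from one helper. The sketch below uses only the flags that appear in this document; the function name `build_converter_command` and its parameters are hypothetical, and the script only prints the commands rather than running the converter.

```python
import shlex

def build_converter_command(input_pt, output_tflite, input_shapes="1,3,224,224",
                            quantize=False, input_value_ranges=None,
                            calibration_data_dir=None,
                            calibration_data_regexp=None):
    """Assemble an mtk_pytorch_converter argument list (hypothetical helper)."""
    cmd = [
        "mtk_pytorch_converter",
        "--input_script_module_file={}".format(input_pt),
        "--output_file={}".format(output_tflite),
        "--input_shapes={}".format(input_shapes),
    ]
    if quantize:
        # Quantization-only flags, as used in the Quant8 command.
        cmd.append("--quantize=True")
        if input_value_ranges is not None:
            cmd.append("--input_value_ranges={}".format(input_value_ranges))
        if calibration_data_dir is not None:
            cmd.append("--calibration_data_dir={}".format(calibration_data_dir))
        if calibration_data_regexp is not None:
            cmd.append("--calibration_data_regexp={}".format(calibration_data_regexp))
    return cmd

fp32_cmd = build_converter_command("squeezenet_float.pt", "squeezenet_float.tflite")
quant_cmd = build_converter_command(
    "squeezenet_float.pt", "squeezenet_ptq_quant.tflite",
    quantize=True, input_value_ranges="-1,1",
    calibration_data_dir="data/", calibration_data_regexp=r"batch_.*\.npy",
)

# Print shell-quoted commands for inspection.
print(shlex.join(fp32_cmd))
print(shlex.join(quant_cmd))
# To run a conversion (requires the converter on PATH):
#   import subprocess; subprocess.run(fp32_cmd, check=True)
```

Passing the argument list to `subprocess.run` avoids shell quoting issues with the regular expression.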

Model Details

General Information

| Property | Value |
|---|---|
| Category | Classification |
| Input Size | 224x224 |
| GFLOPS | 0.82 |
| #Params (M) | 1.24 |
| Training Framework | PyTorch |
| Inference Framework | TFLite |
| Quant8 Model package | Download |
| Float32 Model package | Download |
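
As a rough cross-check of the parameter count listed above, the weight storage for each format can be estimated from bytes per parameter. This is a back-of-the-envelope sketch: it assumes 4 bytes per weight for FP32 and 1 byte per weight for Quant8, and ignores biases, quantization parameters, and file metadata, so the actual .tflite files will differ somewhat.

```python
# Rough weight-storage estimate from the "#Params (M)" value in this section.
PARAMS_MILLIONS = 1.24

def weight_size_mb(params_millions, bytes_per_param):
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return params_millions * 1e6 * bytes_per_param / 1e6

fp32_mb = weight_size_mb(PARAMS_MILLIONS, 4)  # float32: 4 bytes per weight
int8_mb = weight_size_mb(PARAMS_MILLIONS, 1)  # int8: 1 byte per weight

print("FP32 weights:   ~{:.2f} MB".format(fp32_mb))
print("Quant8 weights: ~{:.2f} MB".format(int8_mb))
```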

Model Properties

Quant8

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

| Property | Value |
|---|---|
| Name | x.2 |
| Tensor | int8[1,3,224,224] |
| Identifier | 80 |
| Quantization | Linear |
| Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |

Outputs

| Property | Value |
|---|---|
| Name | 8 |
| Tensor | int8[1,1000] |
| Identifier | 72 |
| Quantization | Linear |
| Quantization Range | 0.1473 * (q + 128) ≤ 37.5656 |
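
The quantization ranges listed for the Quant8 input and output follow the usual linear (affine) mapping, real = scale * (q - zero_point), with q an int8 value in [-128, 127]. The sketch below recovers the unrounded scales from the listed bounds. It assumes a zero point of 0 for the input and -128 for the output, as implied by the expressions `0.0078 * q` and `0.1473 * (q + 128)`; the function names are illustrative only.

```python
def dequantize(q, scale, zero_point):
    """Affine dequantization: real = scale * (q - zero_point)."""
    return scale * (q - zero_point)

def scale_from_range(real_min, real_max, qmin=-128, qmax=127):
    """Scale that maps the int8 range [qmin, qmax] onto [real_min, real_max]."""
    return (real_max - real_min) / (qmax - qmin)

# Input tensor: -1.0039 <= 0.0078 * q <= 0.9961 (zero point 0).
in_scale = scale_from_range(-1.0039, 0.9961)
in_min = dequantize(-128, in_scale, 0)
in_max = dequantize(127, in_scale, 0)

# Output tensor: 0.1473 * (q + 128) <= 37.5656 (zero point -128),
# so q = -128 maps to 0 and q = 127 maps to the upper bound.
out_scale = scale_from_range(0.0, 37.5656)
out_min = dequantize(-128, out_scale, -128)
out_max = dequantize(127, out_scale, -128)

print("input scale  ~{:.6f}, range [{:.4f}, {:.4f}]".format(in_scale, in_min, in_max))
print("output scale ~{:.6f}, range [{:.4f}, {:.4f}]".format(out_scale, out_min, out_max))
```

The scales shown in the tables (0.0078 and 0.1473) are rounded for display, which is why multiplying them by the int8 limits does not reproduce the listed bounds exactly.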

FP32

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v2.9.0

Inputs

| Property | Value |
|---|---|
| Name | x.2 |
| Tensor | float32[1,3,224,224] |
| Identifier | 67 |

Outputs

| Property | Value |
|---|---|
| Name | 8 |
| Tensor | float32[1,1000] |
| Identifier | 82 |

Performance Benchmarks

SqueezeNet-quant8

| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
|---|---|---|---|---|---|---|---|
| G350 | 56.194 ms (Thread:4) | 112.385 ms | 59.594 ms | 47.425 ms | N/A | N/A | 7985.68 ms |
| G510 | 64.899 ms | 35.930 ms | 21.246 ms | 10.818 ms | 1.471 ms | 1.52 ms | N/A |
| G700 | 7.033 ms | 24.855 ms | 14.835 ms | 9.754 ms | 1.128 ms | 1.04 ms | N/A |
| G1200 | 6.389 ms | 18.027 ms | 10.375 ms | 5.619 ms | 1.780 ms | 1.05 ms | N/A |

SqueezeNet-fp32

| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
|---|---|---|---|---|---|---|---|
| G350 | 101.764 ms (Thread:4) | 110.384 ms | 80.185 ms | 70.106 ms | N/A | N/A | 380.042 ms |
| G510 | 87.865 ms | 35.743 ms | 31.189 ms | 27.790 ms | 4.527 ms | 5.01 ms | N/A |
| G700 | 20.332 ms | 24.693 ms | 21.731 ms | 23.491 ms | 3.423 ms | 3.04 ms | N/A |
| G1200 | 17.974 ms | 17.816 ms | 15.701 ms | 14.193 ms | 3.745 ms | 3.05 ms | N/A |

  • Widespread: CPU only, light workload.

  • Performance: CPU and GPU, medium workload.

  • Ultimate: CPU, GPU, and APUs, heavy workload.
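
To compare the two model variants, the benchmark latencies can be turned into speedup and throughput figures. The sketch below uses the APU(MDLA) values reported above for the platforms where both variants were measured. The throughput figure assumes back-to-back inferences with no pre- or post-processing overhead, which is an idealization.

```python
# APU(MDLA) latencies in milliseconds, taken from the benchmark results above.
MDLA_QUANT8_MS = {"G510": 1.52, "G700": 1.04, "G1200": 1.05}
MDLA_FP32_MS = {"G510": 5.01, "G700": 3.04, "G1200": 3.05}

def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

def inferences_per_second(latency_ms):
    """Idealized throughput for back-to-back single inferences."""
    return 1000.0 / latency_ms

speedups = {
    platform: speedup(MDLA_FP32_MS[platform], MDLA_QUANT8_MS[platform])
    for platform in MDLA_QUANT8_MS
}

for platform, factor in speedups.items():
    rate = inferences_per_second(MDLA_QUANT8_MS[platform])
    print("{}: quant8 is {:.2f}x faster than fp32 on MDLA, ~{:.0f} inferences/s".format(
        platform, factor, rate))
```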

Resources

GitHub