VGG Models

Overview

VGG16 builds on the earlier AlexNet model. It replaces AlexNet’s large convolution filters with stacks of smaller 3x3 filters, using padding to preserve the spatial size of each feature map, and downsamples with 2x2 MaxPooling layers. This uniform design needs fewer weights than a single large filter covering the same receptive field, and it contributed to the model's widespread adoption in image recognition tasks.
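
The effect of padding and pooling on feature-map size, and the weight savings of stacked 3x3 filters, can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the comparison against a single 7x7 filter assumes equal input and output channel counts and ignores biases.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (no biases) for stacked convs with equal in/out channels."""
    return num_layers * kernel * kernel * channels * channels


# A 3x3 convolution with padding 1 keeps a 224x224 input at 224x224.
after_conv = conv_output_size(224, kernel=3, padding=1)
# A 2x2 max pool with stride 2 then halves it to 112x112.
after_pool = conv_output_size(after_conv, kernel=2, stride=2)

# Three stacked 3x3 convs see a 7x7 receptive field with fewer weights
# than one 7x7 conv (64 channels assumed here for illustration).
stacked = conv_weights(kernel=3, channels=64, num_layers=3)
single = conv_weights(kernel=7, channels=64)

print(after_conv, after_pool, stacked, single)
```

With these assumptions, the stacked 3x3 layers use 27 weights per channel pair where a single 7x7 filter uses 49.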

Getting Started

Follow these steps to load a VGG model with PyTorch and TorchVision and convert it to TorchScript.

Install Required Libraries:

Ensure you have the necessary libraries installed:
```
pip install torch torchvision
```

Load and Convert VGG Model:

Load a pretrained VGG model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

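```
import torch
import torchvision

# Load VGG16 with pretrained weights. Newer TorchVision releases deprecate
# `pretrained=True` in favor of `weights=torchvision.models.VGG16_Weights.DEFAULT`.
model = torchvision.models.vgg16(pretrained=True)
# Dummy input (batch, channels, height, width) used only for tracing.
trace_data = torch.randn(1, 3, 224, 224)
# Trace in eval mode on the CPU, then save the TorchScript module.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'vgg_float.pt')
```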

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

Generate Calibration Data:

The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.
```
import os
import numpy as np

os.makedirs('data', exist_ok=True)  # do not fail if the directory already exists
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```

Convert to Quantized TFLite Format:

Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=vgg_float.pt           \
    --output_file=vgg_ptq_quant.tflite                \
    --input_shapes=1,3,224,224                        \
    --quantize=True                                   \
    --input_value_ranges=-1,1                         \
    --calibration_data_dir=data/                      \
    --calibration_data_regexp=batch_.*\.npy
```
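
To see which files a pattern such as `batch_.*\.npy` would pick up, the sketch below applies it to file names in a temporary directory. This only illustrates the regular expression itself: whether the converter matches the full file name or searches within it is an assumption here (full match is used), and `select_calibration_files` is a hypothetical helper, not part of the converter.

```python
import os
import re
import tempfile

# Pattern copied from the --calibration_data_regexp option.
pattern = re.compile(r"batch_.*\.npy")


def select_calibration_files(directory, regexp):
    """Return sorted file names in `directory` that fully match `regexp`."""
    # Assumption: a full match on the file name; the converter may differ.
    return sorted(name for name in os.listdir(directory) if regexp.fullmatch(name))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; only the names matter for this check.
    for name in ("batch_0.npy", "batch_99.npy", "batch_1.txt", "labels.npy"):
        open(os.path.join(tmp, name), "w").close()
    selected = select_calibration_files(tmp, pattern)

print(selected)
```

Under that assumption, only the `batch_*.npy` files are selected, so unrelated files can live in the same directory without being used for calibration.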

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=vgg_float.pt           \
    --output_file=vgg_float.tflite                    \
    --input_shapes=1,3,224,224
```

Model Details

General Information

Property	Value
Category	Classification
Input Size	224x224
GFLOPs	15.47
#Params (M)	138.35
Training Framework	PyTorch
Inference Framework	TFLite
Quant8 Model package	Download
Float32 Model package	Download
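
The parameter count in the table can be reproduced by summing weights and biases layer by layer. The sketch below assumes the standard VGG16 layout (thirteen 3x3 convolutions, five 2x2 max pools, three fully connected layers) with a 1000-class output, which is the default for `torchvision.models.vgg16`; `count_vgg16_params` is a hypothetical helper written for this illustration.

```python
# Numbers are conv output channels; "M" marks a 2x2 max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def count_vgg16_params(num_classes=1000, in_channels=3, input_size=224):
    """Count weights and biases of VGG16 under the assumptions above."""
    total = 0
    channels, size = in_channels, input_size
    for item in VGG16_CFG:
        if item == "M":
            size //= 2  # each max pool halves the spatial size
        else:
            total += 3 * 3 * channels * item + item  # 3x3 weights + biases
            channels = item
    features = channels * size * size  # flattened 512 x 7 x 7 feature map
    for out_features in (4096, 4096, num_classes):
        total += features * out_features + out_features
        features = out_features
    return total


params = count_vgg16_params()
print(f"{params / 1e6:.2f} M parameters")
```

A model with a different number of output classes changes only the last fully connected layer, so passing another `num_classes` shows how the total shifts.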

Model Properties

Quant8

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.1
Tensor	int8[1,3,224,224]
Identifier	10
Quantization	Linear
Quantization Range	-1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property	Value
Name	238
Tensor	int8[1,2622]
Identifier	52
Quantization	Linear
Quantization Range	-0.0163 ≤ 0.0002 * (q + 30) ≤ 0.0261
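
The quantization ranges listed for the inputs and outputs follow the affine mapping `real = scale * (q - zero_point)` used by TFLite, with `q` spanning the int8 range -128 to 127. As a rough check, the sketch below recovers the scale and zero point from the printed real-valued ranges; the helper names are hypothetical, and the results are only as precise as the rounded values in the tables.

```python
def affine_params(real_min, real_max, qmin=-128, qmax=127):
    """Derive (scale, zero_point) of an int8 affine quantization from its real range."""
    scale = (real_max - real_min) / (qmax - qmin)
    zero_point = round(qmin - real_min / scale)
    return scale, zero_point


def dequantize(q, scale, zero_point):
    """Map an int8 value back to a real value."""
    return scale * (q - zero_point)


# Input tensor: -1.0039 <= 0.0078 * q <= 0.9961
in_scale, in_zero_point = affine_params(-1.0039, 0.9961)
# Output tensor: -0.0163 <= 0.0002 * (q + 30) <= 0.0261
out_scale, out_zero_point = affine_params(-0.0163, 0.0261)

print(in_scale, in_zero_point)
print(out_scale, out_zero_point)
print(dequantize(127, in_scale, in_zero_point))
```

The output zero point comes out negative, which matches the `(q + 30)` form shown in the outputs table.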

FP32

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.1
Tensor	float32[1,3,224,224]
Identifier	16

Outputs

Property	Value
Name	238
Tensor	float32[1,2622]
Identifier	46

Performance Benchmarks

VGG-quant8

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	N/A	N/A	N/A	N/A	N/A	N/A	N/A
G510	N/A	N/A	N/A	N/A	N/A	24.85 ms	N/A
G700	N/A	N/A	N/A	N/A	N/A	16.06 ms	N/A
G1200	N/A	N/A	N/A	N/A	N/A	23.05 ms	N/A

VGG-fp32

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	2183.11 ms (Thread:4)	845.673 ms	710.335 ms	N/A	N/A	N/A	6386.6 ms
G510	665.157 ms	291.764 ms	221.333 ms	231.162 ms	80.132 ms	80.3 ms	N/A
G700	454.178 ms	227.987 ms	166.924 ms	191.927 ms	55.961 ms	56.04 ms	N/A
G1200	383.546 ms	131.883 ms	97.662 ms	111.192 ms	50.183 ms	50.04 ms	N/A

Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
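
As a worked example of reading the benchmark tables, the sketch below compares the APU(MDLA) latencies of the quant8 and fp32 models on the platforms where both are reported. The numbers are copied from the tables above; the variable names are chosen for this illustration only.

```python
# Latencies in ms from the APU(MDLA) column of the two benchmark tables.
mdla_fp32_ms = {"G510": 80.3, "G700": 56.04, "G1200": 50.04}
mdla_quant8_ms = {"G510": 24.85, "G700": 16.06, "G1200": 23.05}

# Ratio of fp32 latency to quant8 latency per platform.
speedup = {p: mdla_fp32_ms[p] / mdla_quant8_ms[p] for p in mdla_quant8_ms}

for platform, ratio in speedup.items():
    print(f"{platform}: quant8 is {ratio:.2f}x faster than fp32 on MDLA")
```

The ratios differ by platform, so a speedup measured on one device should not be assumed for another.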

Resources

GitHub