VGG Models

Overview

VGG16 builds on the earlier AlexNet model. It replaces AlexNet's large convolution filters with stacks of smaller 3x3 filters and uses padding to preserve the spatial size of each feature map, leaving downsampling to 2x2 MaxPooling layers. A stack of small filters covers the same receptive field as one large filter with fewer weights, and this simple, uniform design contributed to the model's widespread adoption in image recognition tasks.
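The effect of the 3x3 design can be checked with a little arithmetic. The sketch below is illustrative only: the helper names and the channel count of 64 are assumptions chosen for the example, and biases are ignored.

```python
def stacked_conv_weights(kernel, layers, channels):
    """Weights (biases ignored) in `layers` stacked kernel x kernel convolutions
    that each map `channels` inputs to `channels` outputs."""
    return layers * kernel * kernel * channels * channels


def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


def output_size(size, kernel, padding, stride):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


channels = 64  # assumed channel count, for illustration only

# Three stacked 3x3 layers see the same 7x7 window as a single 7x7 layer.
rf_small = receptive_field(3, 3)
rf_large = receptive_field(7, 1)
small = stacked_conv_weights(3, 3, channels)  # 27 * C^2
large = stacked_conv_weights(7, 1, channels)  # 49 * C^2
print(f"receptive fields: {rf_small} vs {rf_large}")
print(f"3 x (3x3): {small} weights, 1 x (7x7): {large} weights")

# A 3x3 convolution with padding 1 keeps 224x224; 2x2 max-pooling halves it.
after_conv = output_size(224, kernel=3, padding=1, stride=1)
after_pool = output_size(after_conv, kernel=2, padding=0, stride=2)
print(after_conv, after_pool)
```

Under these assumptions the stacked 3x3 layers need 27 weights per channel pair against 49 for the single 7x7 layer, while the padded convolution leaves the 224x224 input size unchanged until pooling.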

Getting Started

Follow these steps to use and convert VGG models using PyTorch and TorchVision.

  1. Install Required Libraries:

    Ensure you have the necessary libraries installed:

    pip install torch torchvision
    
  2. Load and Convert VGG Model:

    Load a pretrained VGG model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

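    import torch
    import torchvision
    
    # Load VGG16 with pretrained ImageNet weights. Newer TorchVision releases
    # deprecate pretrained=True in favor of the weights= argument.
    model = torchvision.models.vgg16(pretrained=True)
    
    # Dummy input used only to record the operations during tracing.
    trace_data = torch.randn(1, 3, 224, 224)
    
    # Trace on CPU in eval mode so dropout is disabled in the exported graph.
    trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
    torch.jit.save(trace_model, 'vgg_float.pt')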
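
Before converting, it can be worth confirming that a saved TorchScript file reloads and reproduces the original outputs. The following is a minimal sketch of that round trip, not part of the documented workflow: it uses a small stand-in network (an assumption, chosen so the example runs quickly without downloading weights) and a hypothetical file name, but the same trace, save, and load calls apply to `vgg_float.pt`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for VGG: a padded 3x3 conv, a 2x2 max-pool, a classifier.
tiny = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
).eval()

example = torch.randn(1, 3, 32, 32)
traced = torch.jit.trace(tiny, example)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny_float.pt")  # hypothetical file name
    torch.jit.save(traced, path)
    reloaded = torch.jit.load(path)

    # Compare the eager model with the reloaded TorchScript module.
    with torch.no_grad():
        expected = tiny(example)
        actual = reloaded(example)

roundtrip_ok = torch.allclose(expected, actual, atol=1e-6)
print("round trip matches:", roundtrip_ok)
```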
    

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

  1. Generate Calibration Data:

    The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.

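    import os
    import numpy as np
    
    # Create the output directory; do not fail if it already exists.
    os.makedirs('data', exist_ok=True)
    for i in range(100):
        # One float32 batch matching the model input shape (N, C, H, W).
        data = np.random.randn(1, 3, 224, 224).astype(np.float32)
        np.save('data/batch_{}.npy'.format(i), data)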
    
  2. Convert to Quantized TFLite Format:

    Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

    mtk_pytorch_converter                                 \
        --input_script_module_file=vgg_float.pt           \
        --output_file=vgg_ptq_quant.tflite                \
        --input_shapes=1,3,224,224                        \
        --quantize=True                                   \
        --input_value_ranges=-1,1                         \
        --calibration_data_dir=data/                      \
        --calibration_data_regexp='batch_.*\.npy'
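
Since the converter selects calibration files by directory and regular expression, a quick check of the generated files can catch shape or dtype mismatches before conversion. The sketch below is an assumption-based illustration: `check_calibration_dir` is a hypothetical helper, and it assumes the pattern must match the whole file name, which may differ from how the converter applies it. The demo writes to a temporary directory, not to `data/`.

```python
import os
import re
import tempfile

import numpy as np

EXPECTED_SHAPE = (1, 3, 224, 224)       # same as --input_shapes
PATTERN = re.compile(r"batch_.*\.npy")  # same as --calibration_data_regexp


def check_calibration_dir(data_dir):
    """Return the sorted names of calibration files that pass the checks."""
    valid = []
    for name in sorted(os.listdir(data_dir)):
        if not PATTERN.fullmatch(name):
            continue  # assumed to be ignored by the pattern
        batch = np.load(os.path.join(data_dir, name))
        if batch.shape != EXPECTED_SHAPE or batch.dtype != np.float32:
            raise ValueError(f"{name}: got {batch.dtype}{batch.shape}")
        valid.append(name)
    return valid


# Demo on a throwaway directory with three batches and one unrelated file.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(3):
        batch = np.random.randn(*EXPECTED_SHAPE).astype(np.float32)
        np.save(os.path.join(demo_dir, f"batch_{i}.npy"), batch)
    open(os.path.join(demo_dir, "notes.txt"), "w").close()
    found = check_calibration_dir(demo_dir)

print(found)
```

To check the real calibration set, call `check_calibration_dir('data')` after running the generation script.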
    

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

mtk_pytorch_converter                                 \
    --input_script_module_file=vgg_float.pt           \
    --output_file=vgg_float.tflite                    \
    --input_shapes=1,3,224,224
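
When both conversions are scripted, the two commands can be assembled from one place. The sketch below is only an illustration: `build_converter_command` is a hypothetical helper, it uses only the flags shown in this section, and it prints the commands without running the converter.

```python
import shlex


def build_converter_command(script_module, output_file, input_shape,
                            calibration_dir=None, value_range=(-1, 1)):
    """Hypothetical helper: return mtk_pytorch_converter arguments as a list.

    Quantization flags are added only when a calibration directory is given.
    """
    cmd = [
        "mtk_pytorch_converter",
        f"--input_script_module_file={script_module}",
        f"--output_file={output_file}",
        "--input_shapes=" + ",".join(str(dim) for dim in input_shape),
    ]
    if calibration_dir is not None:
        low, high = value_range
        cmd += [
            "--quantize=True",
            f"--input_value_ranges={low},{high}",
            f"--calibration_data_dir={calibration_dir}",
            r"--calibration_data_regexp=batch_.*\.npy",
        ]
    return cmd


shape = (1, 3, 224, 224)
fp32_cmd = build_converter_command("vgg_float.pt", "vgg_float.tflite", shape)
quant_cmd = build_converter_command("vgg_float.pt", "vgg_ptq_quant.tflite",
                                    shape, calibration_dir="data/")

# shlex.join quotes arguments so the printed line is safe to paste into a shell.
print(shlex.join(fp32_cmd))
print(shlex.join(quant_cmd))
```

Passing either list to `subprocess.run(cmd, check=True)` would invoke the tool without a shell, which avoids any quoting concerns around the regular expression.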

Model Details

General Information

Property               Value
---------------------  --------------
Category               Classification
Input Size             224x224
GFLOPS                 15.47
#Params (M)            138.35
Training Framework     PyTorch
Inference Framework    TFLite
Quant8 Model package   Download
Float32 Model package  Download
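
The #Params and GFLOPS figures can be reproduced from the layer layout. The sketch below assumes the standard VGG16 configuration as implemented in TorchVision (thirteen 3x3 convolutions, five 2x2 max-pools, three fully connected layers, 1000 output classes). `count_vgg16` is a hypothetical helper, and counting multiply-accumulate operations is an assumption about how the GFLOPS figure was obtained.

```python
# VGG16 layout: integers are conv output channels, "M" is a 2x2 max-pool.
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def count_vgg16(num_classes=1000, input_size=224):
    """Return (parameters, multiply-accumulates) for one 3-channel input image."""
    params = 0
    macs = 0
    channels, size = 3, input_size
    for item in CFG:
        if item == "M":
            size //= 2  # pooling halves height and width, adds no parameters
            continue
        weights = channels * item * 3 * 3
        params += weights + item        # weights plus one bias per output channel
        macs += weights * size * size   # padded conv keeps the spatial size
        channels = item
    features = channels * size * size   # flattened feature map fed to the classifier
    for out_features in (4096, 4096, num_classes):
        params += features * out_features + out_features
        macs += features * out_features
        features = out_features
    return params, macs


params, macs = count_vgg16()
print(f"parameters: {params / 1e6:.2f} M")
print(f"multiply-accumulates: {macs / 1e9:.2f} G")
```

Under these assumptions the count comes to 138,357,544 parameters and roughly 15.47 billion multiply-accumulates, close to the values listed in the table.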

Model Properties

Quant8

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property            Value
------------------  -----------------------------
Name                x.1
Tensor              int8[1,3,224,224]
Identifier          10
Quantization        Linear
Quantization Range  -1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property            Value
------------------  ------------------------------------
Name                238
Tensor              int8[1,2622]
Identifier          52
Quantization        Linear
Quantization Range  -0.0163 ≤ 0.0002 * (q + 30) ≤ 0.0261
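
The Quantization Range entries follow the usual linear (affine) int8 mapping, real = scale * (q - zero_point). The sketch below illustrates that mapping under stated assumptions: the scales in the tables appear rounded to four decimals, so the example recovers an unrounded scale and zero point from the printed ranges. The helper names are hypothetical.

```python
def affine_params(real_min, real_max, qmin=-128, qmax=127):
    """Recover (scale, zero_point) of an int8 affine mapping from its real range."""
    scale = (real_max - real_min) / (qmax - qmin)
    zero_point = round(qmin - real_min / scale)
    return scale, zero_point


def dequantize(q, scale, zero_point):
    """Map an int8 code back to a real value."""
    return scale * (q - zero_point)


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to the nearest int8 code, clamping to the int8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


# Ranges copied from the Quant8 input and output entries.
in_scale, in_zero = affine_params(-1.0039, 0.9961)
out_scale, out_zero = affine_params(-0.0163, 0.0261)

print(f"input:  scale={in_scale:.6f}, zero_point={in_zero}")
print(f"output: scale={out_scale:.6f}, zero_point={out_zero}")

# Example values: an input normalized to 0.5 and an output code of 100.
print(quantize(0.5, in_scale, in_zero))
print(dequantize(100, out_scale, out_zero))
```

The recovered zero points are 0 for the input and -30 for the output, which agrees with the `q` and `(q + 30)` terms in the ranges.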

FP32

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property    Value
----------  --------------------
Name        x.1
Tensor      float32[1,3,224,224]
Identifier  16

Outputs

Property    Value
----------  ---------------
Name        238
Tensor      float32[1,2622]
Identifier  46

Performance Benchmarks

VGG-quant8

Run model (.tflite) 10 times  CPU (Thread:8)  GPU  ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate(APU)  APU(MDLA)  APU(VPU)
----------------------------  --------------  ---  -------------  -------------  ---------------------------  ---------  --------
G350                          N/A             N/A  N/A            N/A            N/A                          N/A        N/A
G510                          N/A             N/A  N/A            N/A            N/A                          24.85 ms   N/A
G700                          N/A             N/A  N/A            N/A            N/A                          16.06 ms   N/A
G1200                         N/A             N/A  N/A            N/A            N/A                          23.05 ms   N/A

VGG-fp32

Run model (.tflite) 10 times  CPU (Thread:8)         GPU         ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate(APU)  APU(MDLA)  APU(VPU)
----------------------------  ---------------------  ----------  -------------  -------------  ---------------------------  ---------  ---------
G350                          2183.11 ms (Thread:4)  845.673 ms  710.335 ms     N/A            N/A                          N/A        6386.6 ms
G510                          665.157 ms             291.764 ms  221.333 ms     231.162 ms     80.132 ms                    80.3 ms    N/A
G700                          454.178 ms             227.987 ms  166.924 ms     191.927 ms     55.961 ms                    56.04 ms   N/A
G1200                         383.546 ms             131.883 ms  97.662 ms      111.192 ms     50.183 ms                    50.04 ms   N/A
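
To put the two benchmark tables side by side, the APU(MDLA) latencies can be turned into a quant8-over-fp32 speedup and an approximate throughput. The sketch below only does arithmetic on the figures listed above; the helper names are hypothetical, and the throughput assumes back-to-back inference with no other overhead.

```python
# APU(MDLA) latencies in milliseconds, copied from the VGG-quant8 and VGG-fp32 results.
QUANT8_MS = {"G510": 24.85, "G700": 16.06, "G1200": 23.05}
FP32_MS = {"G510": 80.3, "G700": 56.04, "G1200": 50.04}


def speedup(fp32_ms, quant8_ms):
    """How many times faster the quant8 model runs than the fp32 model."""
    return fp32_ms / quant8_ms


def throughput(latency_ms):
    """Inferences per second if runs are issued back to back."""
    return 1000.0 / latency_ms


summary = {name: round(speedup(FP32_MS[name], QUANT8_MS[name]), 2)
           for name in QUANT8_MS}

for name, ratio in summary.items():
    rate = throughput(QUANT8_MS[name])
    print(f"{name}: quant8 is {ratio}x faster on MDLA, about {rate:.1f} inferences/s")
```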

  • Widespread: CPU only, light workload.

  • Performance: CPU and GPU, medium workload.

  • Ultimate: CPU, GPU, and APUs, heavy workload.

Resources

GitHub