InceptionV3 Models

Overview

Inception V3 is a convolutional neural network (CNN) from Google’s Inception family, designed to be a deep network with fewer than 25 million parameters. It is used for image analysis and object detection, with applications ranging from general computer vision to life sciences such as leukemia research. It is often used pre-trained on ImageNet.

Getting Started

Follow these steps to use and convert Inception v3 models using PyTorch and TorchVision.

Install Required Libraries:

Ensure you have the necessary libraries installed:
```
pip install torch torchvision
```

Load and Convert Inception v3 Model:

Load a pretrained Inception v3 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

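```
import torch
import torchvision

# Load the ImageNet-pretrained model (newer TorchVision releases prefer the `weights=` argument).
model = torchvision.models.inception_v3(pretrained=True)
# Dummy input used only to trace the graph; its shape matches the converter's --input_shapes.
trace_data = torch.randn(1, 3, 224, 224)
# Trace in eval mode on the CPU so the exported graph uses inference behavior.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
# Save under the file name that the converter commands below expect.
torch.jit.save(trace_model, 'Inception_v3.pt')
```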

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

Generate Calibration Data:

The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.
```
import os
import numpy as np

# Create the output directory; do not fail if it already exists.
os.makedirs('data', exist_ok=True)
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```

Convert to Quantized TFLite Format:

Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=Inception_v3.pt        \
    --output_file=Inception_v3_ptq_quant.tflite       \
    --input_shapes=1,3,224,224                        \
    --quantize=True                                   \
    --input_value_ranges=-1,1                         \
    --calibration_data_dir=data/                      \
    --calibration_data_regexp=batch_.*\.npy
```
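
To check which files the `--calibration_data_regexp` value in the command above would select, the pattern can be tried out in plain Python. This is only a sketch: it assumes the converter treats the value as a regular expression that must match the whole file name, which this page does not confirm. The helper `select_calibration_files` and the sample file names are hypothetical.

```python
import re

# Pattern copied from the converter command above.
CALIBRATION_PATTERN = re.compile(r"batch_.*\.npy")


def select_calibration_files(file_names):
    """Return the names that fully match the calibration pattern, sorted.

    Assumption: the converter requires a full-name match. If it only matches
    a prefix, names such as 'batch_7.npy.bak' would also be picked up.
    """
    return sorted(
        name for name in file_names if CALIBRATION_PATTERN.fullmatch(name)
    )


# Hypothetical directory listing used only for illustration.
candidates = [
    "batch_0.npy",
    "batch_99.npy",
    "batch_1.txt",
    "labels.npy",
    "batch_7.npy.bak",
]
selected = select_calibration_files(candidates)
print(selected)
```

Only the `batch_<n>.npy` names written by the calibration script pass this filter. Keeping unrelated `.npy` files out of `data/` avoids depending on how strictly the tool applies the pattern.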

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=Inception_v3.pt        \
    --output_file=Inception_v3.tflite                 \
    --input_shapes=1,3,224,224
```

Model Details

General Information

| Property | Value |
| --- | --- |
| Category | Classification |
| Input Size | 224x224 |
| GFLOPS | 1.50 |
| #Params (M) | 6.62 |
| Training Framework | PyTorch |
| Inference Framework | TFLite |
| Quant8 Model package | Download |
| Float32 Model package | Download |

Model Properties

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Quant8

Inputs

| Property | Value |
| --- | --- |
| Name | x.2 |
| Tensor | int8[1,3,224,224] |
| Identifier | 154 |
| Quantization | Linear |
| Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |

Outputs

| Property | Value |
| --- | --- |
| Name | 1707 |
| Tensor | int8[1,1000] |
| Identifier | 73 |
| Quantization | Linear |
| Quantization Range | -2.0561 ≤ 0.0196 * (q + 23) ≤ 2.9372 |
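
The quantization ranges listed for the Quant8 input and output follow the usual linear (affine) int8 scheme, `real = scale * (q - zero_point)`, where `q` is the stored int8 code. The sketch below plugs in the rounded scales shown above (0.0078 with zero point 0 for the input, 0.0196 with zero point -23 for the output). Because those scales are rounded to four decimals, the computed end points differ slightly from the listed ranges. The function names are illustrative and are not part of any NeuroPilot API.

```python
def dequantize(q, scale, zero_point):
    """Map an int8 code to its approximate real value."""
    return scale * (q - zero_point)


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to the nearest int8 code, clamped to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


# Rounded parameters read from the Quant8 input and output properties.
INPUT_SCALE, INPUT_ZERO_POINT = 0.0078, 0
OUTPUT_SCALE, OUTPUT_ZERO_POINT = 0.0196, -23  # (q + 23) == (q - (-23))

# Real-valued range spanned by the int8 input codes.
input_low = dequantize(-128, INPUT_SCALE, INPUT_ZERO_POINT)
input_high = dequantize(127, INPUT_SCALE, INPUT_ZERO_POINT)
print("input range:", input_low, input_high)

# Real-valued range spanned by the int8 output codes.
output_low = dequantize(-128, OUTPUT_SCALE, OUTPUT_ZERO_POINT)
output_high = dequantize(127, OUTPUT_SCALE, OUTPUT_ZERO_POINT)
print("output range:", output_low, output_high)

# A real output value of 0.0 is stored as the zero-point code.
print("code for 0.0:", quantize(0.0, OUTPUT_SCALE, OUTPUT_ZERO_POINT))
```

When feeding the Quant8 model directly, float inputs in roughly [-1, 1] would be converted with `quantize` using the input parameters, and the int8 output scores converted back with `dequantize` using the output parameters.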

FP32

Inputs

| Property | Value |
| --- | --- |
| Name | x.2 |
| Tensor | float32[1,3,224,224] |
| Identifier | 157 |

Outputs

| Property | Value |
| --- | --- |
| Name | 1707 |
| Tensor | float32[1,1000] |
| Identifier | 8 |

Performance Benchmarks

InceptionV3-quant8

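| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| G350 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| G510 | N/A | N/A | N/A | N/A | N/A | 5.59 ms | N/A |
| G700 | N/A | N/A | N/A | N/A | N/A | 3.04 ms | N/A |
| G1200 | N/A | N/A | N/A | N/A | N/A | 5.04 ms | N/A |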

InceptionV3-fp32

| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| G350 | 405.697 ms (Thread:4) | 325.574 ms | 275.707 ms | N/A | N/A | N/A | 1399.8 ms |
| G510 | 319.684 ms | 112.367 ms | 98.235 ms | 80.029 ms | 17.199 ms | 17.68 ms | N/A |
| G700 | 76.034 ms | 83.642 ms | 70.822 ms | 68.439 ms | 12.263 ms | 12.04 ms | N/A |
| G1200 | 70.697 ms | 68.656 ms | 49.117 ms | 41.429 ms | 11.757 ms | 11.04 ms | N/A |

Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
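
The latency figures above are easier to compare as speedups and frame rates. The short sketch below does that arithmetic for the FP32 CPU and Quant8 APU(MDLA) columns. It assumes each figure is a per-inference latency, which the tables do not state explicitly, and the resulting frame rates ignore pre-processing and post-processing time. The dictionary and function names are illustrative.

```python
# Latencies in milliseconds copied from the benchmark tables above.
FP32_CPU_MS = {"G510": 319.684, "G700": 76.034, "G1200": 70.697}
QUANT8_MDLA_MS = {"G510": 5.59, "G700": 3.04, "G1200": 5.04}


def frames_per_second(latency_ms):
    """Upper bound on throughput if each inference takes latency_ms."""
    return 1000.0 / latency_ms


def speedup(baseline_ms, accelerated_ms):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


for platform, cpu_ms in FP32_CPU_MS.items():
    mdla_ms = QUANT8_MDLA_MS[platform]
    print(
        f"{platform}: Quant8 on MDLA is {speedup(cpu_ms, mdla_ms):.1f}x faster "
        f"than FP32 on CPU, about {frames_per_second(mdla_ms):.0f} inferences/s"
    )
```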

Resources

GitHub