InceptionV3 Models

Overview

Inception V3 is a convolutional neural network (CNN) from Google’s Inception family, designed to support deep architectures while keeping the parameter count under 25 million. It is used for image analysis and object detection, with applications ranging from general computer vision to life sciences such as leukemia research. It is often used pre-trained on ImageNet.

Getting Started

Follow these steps to load an Inception v3 model with PyTorch and TorchVision and export it for conversion.

  1. Install Required Libraries:

    Ensure you have the necessary libraries installed:

    pip install torch torchvision
    
  2. Load and Convert Inception v3 Model:

    Load a pretrained Inception v3 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

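    import torch
    import torchvision
    
    # Load Inception v3 with ImageNet weights (newer TorchVision releases
    # replace pretrained=True with the weights= argument).
    model = torchvision.models.inception_v3(pretrained=True)
    # Dummy input used only to record the traced graph.
    trace_data = torch.randn(1, 3, 224, 224)
    # Trace in eval mode on the CPU, then save as TorchScript.
    trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
    # File name matches the one passed to the converter commands.
    torch.jit.save(trace_model, 'Inception_v3.pt')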
    

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

  1. Generate Calibration Data:

    The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.

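    import os
    import numpy as np
    
    # Create the output directory; do not fail if it already exists.
    os.makedirs('data', exist_ok=True)
    for i in range(100):
        # One random float32 batch in NCHW layout, matching the model input shape.
        data = np.random.randn(1, 3, 224, 224).astype(np.float32)
        np.save('data/batch_{}.npy'.format(i), data)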
    
  2. Convert to Quantized TFLite Format:

    Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

    mtk_pytorch_converter                                 \
        --input_script_module_file=Inception_v3.pt        \
        --output_file=Inception_v3_ptq_quant.tflite       \
        --input_shapes=1,3,224,224                        \
        --quantize=True                                   \
        --input_value_ranges=-1,1                         \
        --calibration_data_dir=data/                      \
        --calibration_data_regexp='batch_.*\.npy'
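
Before running the converter, it can help to confirm which files in the calibration directory the pattern would select. The sketch below is a minimal check under the assumption that the regular expression is applied to each bare file name as a whole; the converter's exact matching rule is not stated here, and `matching_calibration_files` is a hypothetical helper name, not part of NeuroPilot.

```python
import os
import re


def matching_calibration_files(data_dir, pattern=r"batch_.*\.npy"):
    """Return the sorted file names in data_dir that the pattern fully matches.

    Assumption: the pattern is tested against bare file names, not full paths.
    """
    regex = re.compile(pattern)
    return sorted(
        name for name in os.listdir(data_dir) if regex.fullmatch(name)
    )


# Report the match count only when the calibration directory is present.
if __name__ == "__main__" and os.path.isdir("data"):
    files = matching_calibration_files("data")
    print("{} calibration files matched".format(len(files)))
```

If the count is lower than the number of batches generated in step 1, the file naming and the pattern are out of sync.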
    

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

mtk_pytorch_converter                                 \
    --input_script_module_file=Inception_v3.pt        \
    --output_file=Inception_v3.tflite                 \
    --input_shapes=1,3,224,224
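
When both conversions are scripted, the two commands above can be assembled from one place. The following is a minimal sketch, not an official wrapper: `build_converter_command` is a hypothetical helper, and it only uses flags that already appear in the commands above. Passing the arguments as a list avoids shell quoting issues with the regular expression.

```python
import shlex


def build_converter_command(script_module, output_file,
                            input_shape=(1, 3, 224, 224),
                            calibration_dir=None,
                            calibration_regexp=r"batch_.*\.npy",
                            value_range=(-1, 1)):
    """Assemble the mtk_pytorch_converter argument list.

    Quantization flags are added only when a calibration directory is given.
    """
    cmd = [
        "mtk_pytorch_converter",
        "--input_script_module_file={}".format(script_module),
        "--output_file={}".format(output_file),
        "--input_shapes={}".format(",".join(str(d) for d in input_shape)),
    ]
    if calibration_dir is not None:
        low, high = value_range
        cmd += [
            "--quantize=True",
            "--input_value_ranges={},{}".format(low, high),
            "--calibration_data_dir={}".format(calibration_dir),
            "--calibration_data_regexp={}".format(calibration_regexp),
        ]
    return cmd


fp32_cmd = build_converter_command("Inception_v3.pt", "Inception_v3.tflite")
quant8_cmd = build_converter_command(
    "Inception_v3.pt", "Inception_v3_ptq_quant.tflite", calibration_dir="data/"
)
print(shlex.join(fp32_cmd))
# To actually run a conversion (requires the converter to be installed):
# import subprocess; subprocess.run(fp32_cmd, check=True)
```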

Model Details

General Information

Property               Value
---------------------  --------------
Category               Classification
Input Size             224x224
GFLOPS                 1.50
#Params (M)            6.62
Training Framework     PyTorch
Inference Framework    TFLite
Quant8 Model package   Download
Float32 Model package  Download

Model Properties

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v7.14.1+release

Quant8

Inputs

Property            Value
------------------  -----------------------------
Name                x.2
Tensor              int8[1,3,224,224]
Identifier          154
Quantization        Linear
Quantization Range  -1.0039 ≤ 0.0078 * q ≤ 0.9961
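
The range above follows the linear mapping between int8 codes and real values, real = scale * (q - zero_point), here with a zero point of 0. The sketch below assumes the scale is exactly 2/255, which rounds to the listed 0.0078 and reproduces the listed bounds; the function names are illustrative only.

```python
INPUT_SCALE = 2.0 / 255.0  # assumption: exact value behind the rounded 0.0078
INPUT_ZERO_POINT = 0       # the listed formula has no offset on q
INT8_MIN, INT8_MAX = -128, 127


def quantize_input(x):
    """Map a real value to the nearest int8 code, clamped to the int8 range."""
    q = round(x / INPUT_SCALE) + INPUT_ZERO_POINT
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize_input(q):
    """Map an int8 code back to the real value it represents."""
    return INPUT_SCALE * (q - INPUT_ZERO_POINT)


# Real values at the two ends of the int8 range.
print(dequantize_input(INT8_MIN), dequantize_input(INT8_MAX))
# An in-range value and an out-of-range value that gets clamped.
print(quantize_input(0.5), quantize_input(5.0))
```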

Outputs

Property            Value
------------------  ------------------------------------
Name                1707
Tensor              int8[1,1000]
Identifier          73
Quantization        Linear
Quantization Range  -2.0561 ≤ 0.0196 * (q + 23) ≤ 2.9372
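
The output range above corresponds to real = 0.0196 * (q - (-23)), that is, a zero point of -23. A minimal sketch of turning int8 output codes back into real-valued scores and picking the top class follows. It uses the rounded scale 0.0196 from the table, so results match the listed bounds only approximately, and the sample codes are made up for illustration.

```python
OUTPUT_SCALE = 0.0196      # rounded value taken from the table
OUTPUT_ZERO_POINT = -23    # from the (q + 23) term


def dequantize_scores(q_values):
    """Convert int8 output codes to real-valued scores."""
    return [OUTPUT_SCALE * (q - OUTPUT_ZERO_POINT) for q in q_values]


def top1(q_values):
    """Return (class_index, real_score) of the largest output."""
    scores = dequantize_scores(q_values)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


sample_codes = [-128, -23, 10, 127]  # hypothetical int8 outputs, not model data
print(dequantize_scores(sample_codes))
print(top1(sample_codes))
```

Because the scale is positive, the ordering of the int8 codes is the same as the ordering of the real scores, so the top class can be found on the raw int8 output; dequantization only matters when the real values themselves are needed.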

FP32

Inputs

Property    Value
----------  --------------------
Name        x.2
Tensor      float32[1,3,224,224]
Identifier  157

Outputs

Property    Value
----------  ---------------
Name        1707
Tensor      float32[1,1000]
Identifier  8

Performance Benchmarks

InceptionV3-quant8

Run model (.tflite) 10 times  CPU (Thread:8)  GPU  ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate(APU)  APU(MDLA)  APU(VPU)
----------------------------  --------------  ---  -------------  -------------  ---------------------------  ---------  --------
G350                          N/A             N/A  N/A            N/A            N/A                          N/A        N/A
G510                          N/A             N/A  N/A            N/A            N/A                          5.59 ms    N/A
G700                          N/A             N/A  N/A            N/A            N/A                          3.04 ms    N/A
G1200                         N/A             N/A  N/A            N/A            N/A                          5.04 ms    N/A

InceptionV3-fp32

Run model (.tflite) 10 times  CPU (Thread:8)         GPU         ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate(APU)  APU(MDLA)  APU(VPU)
----------------------------  ---------------------  ----------  -------------  -------------  ---------------------------  ---------  ---------
G350                          405.697 ms (Thread:4)  325.574 ms  275.707 ms     N/A            N/A                          N/A        1399.8 ms
G510                          319.684 ms             112.367 ms  98.235 ms      80.029 ms      17.199 ms                    17.68 ms   N/A
G700                          76.034 ms              83.642 ms   70.822 ms      68.439 ms      12.263 ms                    12.04 ms   N/A
G1200                         70.697 ms              68.656 ms   49.117 ms      41.429 ms      11.757 ms                    11.04 ms   N/A

  • Widespread: CPU only, light workload.

  • Performance: CPU and GPU, medium workload.

  • Ultimate: CPU, GPU, and APUs, heavy workload.
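
To put the FP32 and Quant8 tables side by side, the following sketch computes two ratios from the figures listed above: the FP32 speedup of APU(MDLA) over CPU, and the speedup of Quant8 over FP32 on APU(MDLA). The latencies are copied from the tables; G350 is left out because it has no APU(MDLA) result. The dictionary layout and names are just for illustration.

```python
# Latencies in milliseconds, copied from the benchmark tables above.
FP32_CPU_MS = {"G510": 319.684, "G700": 76.034, "G1200": 70.697}
FP32_MDLA_MS = {"G510": 17.68, "G700": 12.04, "G1200": 11.04}
QUANT8_MDLA_MS = {"G510": 5.59, "G700": 3.04, "G1200": 5.04}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


mdla_vs_cpu = {
    p: speedup(FP32_CPU_MS[p], FP32_MDLA_MS[p]) for p in FP32_CPU_MS
}
quant8_vs_fp32 = {
    p: speedup(FP32_MDLA_MS[p], QUANT8_MDLA_MS[p]) for p in FP32_MDLA_MS
}

for platform in FP32_CPU_MS:
    print("{}: MDLA vs CPU (FP32) {:.1f}x, Quant8 vs FP32 (MDLA) {:.1f}x".format(
        platform, mdla_vs_cpu[platform], quant8_vs_fp32[platform]))
```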

Resources

GitHub