MobileNetV2 Models

Overview

MobileNetV2 is a neural network architecture designed for mobile and embedded devices. It is built from inverted residual blocks: unlike traditional residual blocks, whose shortcut connections join wide layers, its shortcuts connect narrow bottleneck layers, and each block expands the features internally before projecting them back down. The expanded features are filtered with lightweight depthwise convolutions, which keeps the computational cost low while maintaining strong accuracy on tasks such as image classification.
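
To make the efficiency claim concrete, the sketch below counts multiply-accumulate operations (MACs) for one inverted residual block and compares the total with a single dense 3x3 convolution at the expanded width. The layer sizes (a 56x56 feature map, 24 channels, expansion factor 6) are illustrative assumptions rather than values taken from this page, and biases, batch normalization, and activations are ignored.

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_macs(h, w, channels, k):
    """MACs of a depthwise k x k convolution (one filter per channel)."""
    return h * w * channels * k * k


def inverted_residual_macs(h, w, c_in, c_out, expansion, k=3):
    """MACs of expand (1x1) -> depthwise (k x k) -> project (1x1), stride 1."""
    hidden = c_in * expansion
    expand = conv_macs(h, w, c_in, hidden, 1)
    depthwise = depthwise_macs(h, w, hidden, k)
    project = conv_macs(h, w, hidden, c_out, 1)
    return expand + depthwise + project


# Illustrative sizes (assumptions, not taken from this page).
h = w = 56
channels = 24
expansion = 6

block = inverted_residual_macs(h, w, channels, channels, expansion)
dense = conv_macs(h, w, channels * expansion, channels * expansion, 3)

print(f"inverted residual block: {block:,} MACs")
print(f"dense 3x3 conv at expanded width: {dense:,} MACs")
print(f"ratio: {dense / block:.1f}x")
```

Under these assumptions, the depthwise step accounts for only a small share of the block's cost, which is why filtering in the wide, expanded space stays affordable.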

Getting Started

Follow these steps to load a pretrained MobileNetV2 model with PyTorch and TorchVision and export it to TorchScript for conversion.

Install Required Libraries:

Ensure you have the necessary libraries installed:
```
pip install torch torchvision
```

Load and Convert MobileNetV2 Model:

Load a pretrained MobileNetV2 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

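```
import torch
import torchvision

# Load the ImageNet-pretrained model. Recent TorchVision releases deprecate
# `pretrained=True` in favor of the `weights=` argument.
model = torchvision.models.mobilenet_v2(pretrained=True)

# Dummy input used only to record the operations during tracing.
trace_data = torch.randn(1, 3, 224, 224)

# Trace on CPU in eval mode so dropout and batch norm behave as in inference.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'mobilenet_v2_float.pt')
```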

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

Generate Calibration Data:

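The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process. Random values serve only as a placeholder here; calibrating with representative, preprocessed input images generally yields more accurate quantization ranges.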
```
import os
import numpy as np

os.makedirs('data', exist_ok=True)  # does not fail if 'data' already exists
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```

Convert to Quantized TFLite Format:

Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=mobilenet_v2_float.pt  \
    --output_file=mobilenet_v2_ptq_quant.tflite       \
    --input_shapes=1,3,224,224                        \
    --quantize=True                                   \
    --input_value_ranges=-1,1                         \
    --calibration_data_dir=data/                      \
    --calibration_data_regexp=batch_.*\.npy
```
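
Before running the converter, it can help to confirm that the calibration pattern selects the intended files. The sketch below checks the pattern from the command above against the file names produced by the calibration script. It assumes the converter compares the pattern with the whole file name; the exact matching semantics are not stated on this page, and the stray file names are hypothetical.

```python
import re

# Pattern passed to --calibration_data_regexp in the command above.
pattern = re.compile(r"batch_.*\.npy")

# File names produced by the calibration script (batch_0.npy ... batch_99.npy).
generated = ["batch_{}.npy".format(i) for i in range(100)]

# Hypothetical stray files that should not be picked up.
strays = ["readme.txt", "batch_0.npz", "labels.npy"]

# Assumption: the pattern must match the entire file name.
matched = [name for name in generated + strays if pattern.fullmatch(name)]

print(len(matched), "of", len(generated), "generated files match")
print("stray files matched:", [name for name in matched if name in strays])
```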

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=mobilenet_v2_float.pt  \
    --output_file=mobilenet_v2_float.tflite           \
    --input_shapes=1,3,224,224
```

Model Details

General Information

Property	Value
Category	Classification
Input Size	224x224
GFLOPS	0.30
#Params (M)	3.50
Training Framework	PyTorch
Inference Framework	TFLite
Quant8 Model package	Download
Float32 Model package	Download
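
As a rough guide to what the parameter count means for storage, the sketch below estimates weight size for the two model variants. It assumes every parameter is stored as a 4-byte float32 or a 1-byte int8 value and ignores file-format overhead and quantization metadata, so the actual package sizes will differ.

```python
# "#Params (M)" from the table above.
params_millions = 3.50

# Assumed storage per weight for each variant.
BYTES_PER_WEIGHT = {"float32": 4, "int8": 1}


def weight_size_mb(params_millions, dtype):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    total_bytes = params_millions * 1e6 * BYTES_PER_WEIGHT[dtype]
    return total_bytes / 1e6


for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype}: about {weight_size_mb(params_millions, dtype):.1f} MB of weights")
```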

Model Properties

Quant8

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.1
Tensor	int8[1,3,224,224]
Identifier	16
Quantization	Linear
Quantization Range	-1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property	Value
Name	860
Tensor	int8[1,1000]
Identifier	162
Quantization	Linear
Quantization Range	-7.2336 ≤ 0.0539 * (q - 6) ≤ 6.5318
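
The quantization ranges above follow the usual TFLite affine mapping, real = scale * (q - zero_point), with q an int8 value in [-128, 127]. The sketch below recovers the scale and zero point from the listed ranges and converts values in both directions. Because the page rounds its figures to four decimals, the derived scales are approximate; the helper names are illustrative.

```python
INT8_MIN, INT8_MAX = -128, 127


def affine_params(real_min, real_max):
    """Derive (scale, zero_point) so that int8 spans [real_min, real_max]."""
    scale = (real_max - real_min) / (INT8_MAX - INT8_MIN)
    zero_point = round(INT8_MIN - real_min / scale)
    return scale, zero_point


def dequantize(q, scale, zero_point):
    """Map an int8 value back to a real number."""
    return scale * (q - zero_point)


def quantize(x, scale, zero_point):
    """Map a real number to the nearest int8 value, clamping to the int8 range."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


# Ranges copied from the input and output tables above.
in_scale, in_zp = affine_params(-1.0039, 0.9961)
out_scale, out_zp = affine_params(-7.2336, 6.5318)

print(f"input:  scale={in_scale:.4f}, zero_point={in_zp}")
print(f"output: scale={out_scale:.4f}, zero_point={out_zp}")
print("output q=-128 dequantizes to", round(dequantize(-128, out_scale, out_zp), 4))
```

An input value outside the representable range is clamped by `quantize`, which is one reason the calibration data should cover the values seen at inference time.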

FP32

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v2.9.0

Inputs

Property	Value
Name	x.1
Tensor	float32[1,3,224,224]
Identifier	167

Outputs

Property	Value
Name	860
Tensor	float32[1,1000]
Identifier	145
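
The [1,1000] output holds one score per class. A minimal sketch of turning those scores into ranked predictions follows, assuming the converted model emits raw scores (logits), as the TorchVision classifier does, rather than probabilities. The score values are made up for illustration, and the mapping from class index to label is not part of this page.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return (class_index, probability) pairs for the k largest scores."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Stand-in for one row of the float32[1,1000] output (made-up values).
logits = [0.0] * 1000
logits[281] = 9.0
logits[285] = 7.5
logits[282] = 6.0

for index, prob in top_k(logits, k=3):
    print(f"class {index}: probability {prob:.4f}")
```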

Performance Benchmarks

MobileNetV2-quant8

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	N/A	N/A	N/A	N/A	N/A	N/A	N/A
G510	N/A	N/A	N/A	N/A	N/A	1.37 ms	N/A
G700	N/A	N/A	N/A	N/A	N/A	1.04 ms	N/A
G1200	N/A	N/A	N/A	N/A	N/A	1.04 ms	N/A

MobileNetV2-fp32

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	59.502 ms (Thread:4)	50.763 ms	53.468 ms	50.948 ms	N/A	N/A	705.860 ms
G510	128.68 ms	17.537 ms	20.084 ms	16.201 ms	3.164 ms	3.57 ms	N/A
G700	14.156 ms	13.361 ms	15.221 ms	13.406 ms	2.225 ms	2.04 ms	N/A
G1200	13.484 ms	9.236 ms	10.845 ms	8.542 ms	2.965 ms	2.58 ms	N/A

Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
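
To compare backends, the latencies can be converted into throughput and speedup figures. The sketch below does this for a few entries of the MobileNetV2-fp32 table above. It assumes each entry is the latency of a single inference; the page states only that the model was run 10 times, so treat the derived numbers as indicative.

```python
# FP32 latencies in milliseconds, copied from the MobileNetV2-fp32 table above.
latency_ms = {
    "G510": {"CPU": 128.68, "GPU": 17.537, "APU(MDLA)": 3.57},
    "G700": {"CPU": 14.156, "GPU": 13.361, "APU(MDLA)": 2.04},
    "G1200": {"CPU": 13.484, "GPU": 9.236, "APU(MDLA)": 2.58},
}


def throughput_fps(ms):
    """Inferences per second if one inference takes `ms` milliseconds."""
    return 1000.0 / ms


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for platform, row in latency_ms.items():
    mdla = row["APU(MDLA)"]
    print(
        f"{platform}: MDLA about {throughput_fps(mdla):.0f} inferences/s, "
        f"{speedup(row['CPU'], mdla):.1f}x faster than CPU"
    )
```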

Resources

GitHub