MobileNetV3 Models

Overview

MobileNetV3 is a family of mobile-optimized convolutional networks designed by combining hardware-aware network architecture search (NAS) with the NetAdapt algorithm, with mobile CPUs as the target. Its architectural improvements are reported to give a 4.6% increase in accuracy and a 5% reduction in latency compared to MobileNetV2, which makes it well suited to resource-constrained mobile applications.

Getting Started

Follow these steps to use and convert MobileNetV3 models using PyTorch and TorchVision.

Install Required Libraries:

Ensure you have the necessary libraries installed:
```
pip install torch torchvision
```

Load and Convert MobileNetV3 Model:

Load a pretrained MobileNetV3 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

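```
import torch
import torchvision

# Load MobileNetV3-Small with pretrained ImageNet weights.
# Note: recent TorchVision releases deprecate `pretrained=True` in favor of the `weights` argument.
model = torchvision.models.mobilenet_v3_small(pretrained=True)

# Dummy input used only to record the operations during tracing.
trace_data = torch.randn(1, 3, 224, 224)

# Trace on CPU in eval mode, then save the TorchScript module.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'mobilenet_v3_small_float.pt')
```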
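Before handing a traced file to a converter, it can be worth checking that it reloads and reproduces the eager model's output. The following is a minimal sketch of that check, not part of the original workflow. To keep it quick and avoid a weight download, it uses a small hypothetical stand-in network and a temporary file name (`traced_model.pt`); in practice you would substitute the MobileNetV3 model and the file saved above.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in network (assumption); replace with the MobileNetV3 model in practice.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.Hardswish(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()

# Same dummy input shape as the tracing step in this guide.
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Save to a temporary location and load the TorchScript module back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traced_model.pt")
    torch.jit.save(traced, path)
    reloaded = torch.jit.load(path)

# Compare the eager output with the reloaded TorchScript output.
with torch.no_grad():
    expected = model(example)
    actual = reloaded(example)

outputs_match = torch.allclose(expected, actual, atol=1e-5)
print("output shape:", tuple(actual.shape))
print("outputs match:", outputs_match)
```

If the outputs differ noticeably, a common cause is tracing a model that is still in training mode, which is why the guide calls `.eval()` before tracing.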

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

Generate Calibration Data:

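The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process. Random data is enough to exercise the pipeline, but calibrating on representative, preprocessed input images generally yields more accurate quantization ranges.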
```
import os
import numpy as np

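# Create the output directory; exist_ok avoids an error when the script is re-run.
os.makedirs('data', exist_ok=True)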
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```

Convert to Quantized TFLite Format:

Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

```
mtk_pytorch_converter                                         \
    --input_script_module_file=mobilenet_v3_small_float.pt    \
    --output_file=mobilenet_v3_ptq_quant.tflite               \
    --input_shapes=1,3,224,224                                \
    --quantize=True                                           \
    --input_value_ranges=-1,1                                 \
    --calibration_data_dir=data/                              \
    --calibration_data_regexp='batch_.*\.npy'
```
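To see which files a pattern like the one passed to `--calibration_data_regexp` would select, it can help to try it against a few file names first. The sketch below uses Python's `re` module on a hypothetical directory listing. How the converter applies the pattern internally is not documented here, so whole-name matching (`re.fullmatch`) is an assumption.

```python
import re

# Pattern taken from the --calibration_data_regexp option in the command above.
pattern = re.compile(r"batch_.*\.npy")

# Hypothetical directory listing, used only for illustration.
candidates = [
    "batch_0.npy",
    "batch_99.npy",
    "batch_7.npz",
    "labels.txt",
    "batch_3.npy.bak",
]

# Assumption: the entire file name must match the pattern.
selected = [name for name in candidates if pattern.fullmatch(name)]
print(selected)
```

Under that assumption only the two `.npy` batch files are selected; backup files and other formats in the same directory are ignored.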

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

```
mtk_pytorch_converter                                         \
    --input_script_module_file=mobilenet_v3_small_float.pt    \
    --output_file=mobilenet_v3_small_float.tflite             \
    --input_shapes=1,3,224,224
```
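The two conversions differ only in the quantization-related options, so they can be scripted from one place. The following sketch builds the argument lists in Python using only the flags shown in this guide. The helper name `build_converter_args` is hypothetical, and the script prints the commands rather than running them.

```python
import shlex


def build_converter_args(input_file, output_file, quantize=False):
    """Assemble an mtk_pytorch_converter command line (hypothetical helper)."""
    args = [
        "mtk_pytorch_converter",
        f"--input_script_module_file={input_file}",
        f"--output_file={output_file}",
        "--input_shapes=1,3,224,224",
    ]
    if quantize:
        # Extra options used by the Quant8 conversion in this guide.
        args += [
            "--quantize=True",
            "--input_value_ranges=-1,1",
            "--calibration_data_dir=data/",
            r"--calibration_data_regexp=batch_.*\.npy",
        ]
    return args


fp32_args = build_converter_args(
    "mobilenet_v3_small_float.pt", "mobilenet_v3_small_float.tflite"
)
quant_args = build_converter_args(
    "mobilenet_v3_small_float.pt", "mobilenet_v3_ptq_quant.tflite", quantize=True
)

# shlex.join quotes each argument so the printed line is safe to paste into a shell.
print(shlex.join(fp32_args))
print(shlex.join(quant_args))
```

With the converter installed, either list could be passed to `subprocess.run(args, check=True)`; passing a list avoids shell quoting issues with the regular expression.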

Model Details

General Information

| Property | Value |
| --- | --- |
| Category | Classification |
| Input Size | 224x224 |
| GFLOPS | 0.06 |
| #Params (M) | 2.54 |
| Training Framework | PyTorch |
| Inference Framework | TFLite |
| Quant8 Model Package | Download |
| Float32 Model Package | Download |

Model Properties

Quant8

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

| Property | Value |
| --- | --- |
| Name | x.2 |
| Tensor | int8[1,3,224,224] |
| Identifier | 216 |
| Quantization | Linear |
| Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |

Outputs

| Property | Value |
| --- | --- |
| Name | 812 |
| Tensor | int8[1,1000] |
| Identifier | 233 |
| Quantization | Linear |
| Quantization Range | -4.3098 ≤ 0.0392 * (q + 18) ≤ 5.6811 |
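The Quantization Range rows above follow the usual linear (affine) int8 mapping, real = scale * (q - zero_point), with q between -128 and 127. As a rough illustration, the sketch below recovers the scale and zero point from the listed ranges and converts quantized values back to real values. The function names are hypothetical, and the recovered scales (about 0.00784 and 0.03918) agree with the rounded 0.0078 and 0.0392 shown in the tables.

```python
INT8_MIN, INT8_MAX = -128, 127


def affine_params(real_min, real_max):
    """Recover (scale, zero_point) from a real-valued range covered by int8."""
    scale = (real_max - real_min) / (INT8_MAX - INT8_MIN)
    # real_min = scale * (INT8_MIN - zero_point), solved for zero_point.
    zero_point = round(INT8_MIN - real_min / scale)
    return scale, zero_point


def dequantize(q, scale, zero_point):
    """Map an int8 value back to its real value."""
    return scale * (q - zero_point)


# Ranges copied from the Quant8 input and output tables.
in_scale, in_zp = affine_params(-1.0039, 0.9961)
out_scale, out_zp = affine_params(-4.3098, 5.6811)

print(f"input:  scale={in_scale:.6f}, zero_point={in_zp}")
print(f"output: scale={out_scale:.6f}, zero_point={out_zp}")

# A zero point of -18 corresponds to the (q + 18) term in the output table.
print(dequantize(INT8_MIN, out_scale, out_zp))
print(dequantize(INT8_MAX, out_scale, out_zp))
```

The same `dequantize` step is what turns the int8[1,1000] output tensor into real-valued class scores when post-processing results by hand.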

FP32

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v2.9.0

Inputs

| Property | Value |
| --- | --- |
| Name | x.2 |
| Tensor | float32[1,3,224,224] |
| Identifier | 150 |

Outputs

| Property | Value |
| --- | --- |
| Name | 812 |
| Tensor | float32[1,1000] |
| Identifier | 41 |

Performance Benchmarks

MobileNetV3-quant8

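| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| G350 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| G510 | N/A | N/A | N/A | N/A | N/A | 1.04 ms | N/A |
| G700 | N/A | N/A | N/A | N/A | N/A | 0.04 ms | N/A |
| G1200 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |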

MobileNetV3-fp32

| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| G350 | 22.138 ms (Thread:4) | 15.343 ms | 18.880 ms | 28.915 ms | N/A | N/A | 235.255 ms |
| G510 | 82.273 ms | 5.971 ms | 7.707 ms | 5.660 ms | 2.492 ms | 2.72 ms | N/A |
| G700 | 5.149 ms | 4.819 ms | 6.260 ms | 4.897 ms | 1.789 ms | 1.05 ms | N/A |
| G1200 | 5.127 ms | 3.872 ms | 5.236 ms | 3.642 ms | 2.364 ms | 2.05 ms | N/A |

Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
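To compare backends at a glance, the MobileNetV3-fp32 latencies above can be reduced to the fastest backend per platform and its speedup over the CPU figure. The sketch below does that arithmetic with values copied from the table, omitting N/A entries. The dictionary layout and the helper name `fastest_backend` are illustrative choices, not part of any benchmark tool.

```python
# FP32 latencies in milliseconds, copied from the table above (N/A entries omitted).
# Note: the G350 CPU figure was measured with 4 threads, the others with 8.
fp32_latency_ms = {
    "G350": {"CPU": 22.138, "GPU": 15.343, "ARMNN(GpuAcc)": 18.880,
             "ARMNN(CpuAcc)": 28.915, "APU(VPU)": 235.255},
    "G510": {"CPU": 82.273, "GPU": 5.971, "ARMNN(GpuAcc)": 7.707,
             "ARMNN(CpuAcc)": 5.660, "Neuron Stable Delegate(APU)": 2.492,
             "APU(MDLA)": 2.72},
    "G700": {"CPU": 5.149, "GPU": 4.819, "ARMNN(GpuAcc)": 6.260,
             "ARMNN(CpuAcc)": 4.897, "Neuron Stable Delegate(APU)": 1.789,
             "APU(MDLA)": 1.05},
    "G1200": {"CPU": 5.127, "GPU": 3.872, "ARMNN(GpuAcc)": 5.236,
              "ARMNN(CpuAcc)": 3.642, "Neuron Stable Delegate(APU)": 2.364,
              "APU(MDLA)": 2.05},
}


def fastest_backend(latencies):
    """Return (backend, latency_ms) for the lowest latency in one table row."""
    return min(latencies.items(), key=lambda item: item[1])


summary = {}
for platform, latencies in fp32_latency_ms.items():
    backend, best_ms = fastest_backend(latencies)
    speedup = latencies["CPU"] / best_ms  # how many times faster than the CPU column
    summary[platform] = (backend, speedup)
    print(f"{platform}: {backend} at {best_ms} ms ({speedup:.2f}x vs CPU)")
```

Speedups computed this way are only as meaningful as the underlying measurements, so differences in thread count or run conditions between platforms should be kept in mind when reading them.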

Resources

GitHub