MobileNetV3 Models
Overview
MobileNetV3 is a mobile-optimized classification network that combines hardware-aware network architecture search (NAS) with the NetAdapt algorithm to tune the architecture for mobile CPUs. Its architectural improvements boost efficiency: according to the original paper, MobileNetV3-Small (the variant used on this page) is 4.6% more accurate on ImageNet classification than MobileNetV2 while reducing latency by 5%. This makes it well suited to resource-constrained mobile applications.
Getting Started
Follow these steps to load a pretrained MobileNetV3 model with PyTorch and TorchVision and convert it to TorchScript.
Install Required Libraries:
Ensure you have the necessary libraries installed:
```bash
pip install torch torchvision
```
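TorchVision 0.13 introduced the weights= argument (for example, MobileNet_V3_Small_Weights.IMAGENET1K_V1) and deprecated pretrained=True. The following minimal sketch uses only the Python standard library to report the installed versions and check for that cutoff; the helper names installed_version and supports_weights_api are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def supports_weights_api(torchvision_version):
    """True if a torchvision version string is 0.13 or newer.

    Local suffixes such as '+cu118' are dropped; only major.minor is compared.
    """
    release = torchvision_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= (0, 13)


if __name__ == "__main__":
    for pkg in ("torch", "torchvision"):
        print(pkg, installed_version(pkg) or "not installed")
    tv = installed_version("torchvision")
    if tv is not None:
        print("weights= API available:", supports_weights_api(tv))
```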
Load and Convert MobileNetV3 Model:
Load a pretrained MobileNetV3-Small model with TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and save the traced model.
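```python
import torch
import torchvision

# Load MobileNetV3-Small with ImageNet weights (torchvision >= 0.13 API;
# older releases used the now-deprecated `pretrained=True`).
weights = torchvision.models.MobileNet_V3_Small_Weights.IMAGENET1K_V1
model = torchvision.models.mobilenet_v3_small(weights=weights).cpu().eval()

# Dummy NCHW input, used only to record the graph during tracing.
trace_data = torch.randn(1, 3, 224, 224)
trace_model = torch.jit.trace(model, trace_data)

# Save the TorchScript module for the NeuroPilot converter.
torch.jit.save(trace_model, 'mobilenet_v3_small_float.pt')
```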
How It Works
Before you begin, ensure that the NeuroPilot Converter Tool is installed.
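A quick pre-flight check, assuming the NeuroPilot Converter Tool installs the mtk_pytorch_converter command used by the commands on this page and that the command is on your PATH. The helper find_converter is hypothetical.

```python
import shutil


def find_converter(name="mtk_pytorch_converter"):
    """Return the full path of the converter executable, or None if not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_converter()
    if path is None:
        print("mtk_pytorch_converter not found on PATH; "
              "install the NeuroPilot Converter Tool first.")
    else:
        print("Found converter at", path)
```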
Quant8 Conversion Process
Generate Calibration Data:
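The following script creates a directory named data and generates 100 batches of random input data, saving each batch as a .npy file. These files are used for calibration during quantization. Random data keeps the example self-contained; for better quantization accuracy, calibrate with representative, preprocessed input images instead.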
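```python
import os

import numpy as np

# Create the calibration directory (no error if it already exists).
os.makedirs('data', exist_ok=True)

# Write 100 random NCHW float32 batches matching the model input shape.
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```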
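The conversion command in the next step picks calibration files with the regular expression batch_.*\.npy. The sketch below shows which file names that pattern selects, assuming the converter applies it to bare file names inside --calibration_data_dir with a full match; the tool's exact matching rule is not documented here, and the helper select_calibration_files is hypothetical.

```python
import os
import re
import tempfile

# Pattern from the conversion command; the escaped dot matches a literal '.'.
CALIBRATION_PATTERN = re.compile(r"batch_.*\.npy")


def select_calibration_files(data_dir, pattern=CALIBRATION_PATTERN):
    """Return sorted regular-file names in data_dir that fully match pattern.

    Assumption: full match against bare file names, for illustration only.
    """
    return sorted(
        name for name in os.listdir(data_dir)
        if pattern.fullmatch(name)
        and os.path.isfile(os.path.join(data_dir, name))
    )


if __name__ == "__main__":
    # Build a throwaway directory with matching and non-matching names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("batch_0.npy", "batch_1.npy", "batch_2.npyz", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(select_calibration_files(tmp))
```

Note that sorting is lexicographic, so batch_10.npy sorts before batch_2.npy; order does not matter for calibration statistics.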
Convert to Quantized TFLite Format:
Use the following command to convert the model to a quantized TFLite format using the generated calibration data:
```bash
mtk_pytorch_converter \
    --input_script_module_file=mobilenet_v3_small_float.pt \
    --output_file=mobilenet_v3_ptq_quant.tflite \
    --input_shapes=1,3,224,224 \
    --quantize=True \
    --input_value_ranges=-1,1 \
    --calibration_data_dir=data/ \
    --calibration_data_regexp='batch_.*\.npy'
```
FP32 Conversion Process
To convert the model to a non-quantized (FP32) TFLite format, use the following command:
```bash
mtk_pytorch_converter \
    --input_script_module_file=mobilenet_v3_small_float.pt \
    --output_file=mobilenet_v3_small_float.tflite \
    --input_shapes=1,3,224,224
```
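If you drive both conversions from a Python script, building the argument list yourself avoids shell quoting pitfalls, because subprocess passes each argument (including the backslash in the calibration regexp) to the tool unchanged. This is a minimal sketch that uses only the flags shown in the two commands above; build_converter_args and run_converter are hypothetical helpers, and running them requires the NeuroPilot Converter Tool on PATH.

```python
import subprocess


def build_converter_args(script_module, output_file, quantize=False,
                         calibration_dir="data/"):
    """Build an mtk_pytorch_converter argument list from the flags on this page."""
    args = [
        "mtk_pytorch_converter",
        "--input_script_module_file=" + script_module,
        "--output_file=" + output_file,
        "--input_shapes=1,3,224,224",
    ]
    if quantize:
        args += [
            "--quantize=True",
            "--input_value_ranges=-1,1",
            "--calibration_data_dir=" + calibration_dir,
            r"--calibration_data_regexp=batch_.*\.npy",
        ]
    return args


def run_converter(args):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    fp32 = build_converter_args("mobilenet_v3_small_float.pt",
                                "mobilenet_v3_small_float.tflite")
    quant = build_converter_args("mobilenet_v3_small_float.pt",
                                 "mobilenet_v3_ptq_quant.tflite", quantize=True)
    for args in (fp32, quant):
        print(" ".join(args))  # call run_converter(args) to actually convert
```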
Model Details
General Information
Property | Value |
---|---|
Category | Classification |
Input Size | 224x224 |
GFLOPS | 0.06 |
#Params (M) | 2.54 |
Training Framework | PyTorch |
Inference Framework | TFLite |
Quant8 Model Package | |
Float32 Model Package | |
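As a rough back-of-the-envelope check, the parameter count bounds the weight storage of each package: 4 bytes per parameter for float32 and about 1 byte per parameter for int8. This ignores quantization metadata, 32-bit biases, and the graph itself, so the actual .tflite files will differ somewhat; the helper weight_storage_mb is hypothetical.

```python
PARAMS_MILLIONS = 2.54  # "#Params (M)" row of the General Information table


def weight_storage_mb(params_millions, bytes_per_param):
    """Rough weight storage in MB (10**6 bytes): parameters x bytes per parameter."""
    return params_millions * bytes_per_param


if __name__ == "__main__":
    # float32 stores 4 bytes per weight; int8 stores roughly 1 byte per weight.
    print(f"float32 weights: ~{weight_storage_mb(PARAMS_MILLIONS, 4):.2f} MB")
    print(f"int8 weights:    ~{weight_storage_mb(PARAMS_MILLIONS, 1):.2f} MB")
```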
Model Properties
Quant8
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release
Inputs
Property | Value |
---|---|
Name | x.2 |
Tensor | int8[1,3,224,224] |
Identifier | 216 |
Quantization | Linear |
Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |
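The Quantization Range row follows the linear (affine) int8 scheme real = scale * (q - zero_point) with q in [-128, 127]. The printed bounds imply zero_point = 0 and scale = (0.9961 - (-1.0039)) / 255 ≈ 0.0078431, that is about 2/255, which the table rounds to 0.0078. A minimal sketch, assuming those parameters, of mapping a float input in [-1, 1] to this int8 tensor and back; quantize_input and dequantize are hypothetical helpers, and the converter's own rounding mode may differ from Python's round(), which rounds halves to even.

```python
INPUT_SCALE = 2.0 / 255   # ~0.0078431; shown rounded as 0.0078 in the table
INPUT_ZERO_POINT = 0      # bounds are -128 * scale and 127 * scale


def quantize_input(x, scale=INPUT_SCALE, zero_point=INPUT_ZERO_POINT):
    """Map a float to int8: round to nearest, then saturate to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale=INPUT_SCALE, zero_point=INPUT_ZERO_POINT):
    """Real value represented by the int8 code q."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 1.0):
        q = quantize_input(x)
        print(f"{x:+.2f} -> q={q:4d} -> {dequantize(q):+.4f}")
```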
Outputs
Property | Value |
---|---|
Name | 812 |
Tensor | int8[1,1000] |
Identifier | 233 |
Quantization | Linear |
Quantization Range | -4.3098 ≤ 0.0392 * (q + 18) ≤ 5.6811 |
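The output uses the same linear scheme with a non-zero offset: the factor (q + 18) means zero_point = -18, and the bounds give scale = (5.6811 - (-4.3098)) / 255 ≈ 0.03918. A minimal sketch, assuming those values, of turning int8 output codes into float logits. Because the scale is positive, dequantization preserves order, so the top-1 class can be read directly from the int8 codes. The helper names and the sample codes are illustrative, not real model output.

```python
OUTPUT_SCALE = (5.6811 - (-4.3098)) / 255  # ~0.03918, shown as 0.0392
OUTPUT_ZERO_POINT = -18                    # from "0.0392 * (q + 18)"


def dequantize_output(codes, scale=OUTPUT_SCALE, zero_point=OUTPUT_ZERO_POINT):
    """Convert int8 output codes to float logits: scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in codes]


def top1(codes):
    """Index of the largest code (equals the argmax of the float logits)."""
    return max(range(len(codes)), key=codes.__getitem__)


if __name__ == "__main__":
    codes = [-128, -18, 40, 127, 3]  # illustrative values only
    logits = dequantize_output(codes)
    print([round(v, 4) for v in logits])
    print("top-1 index:", top1(codes))
```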
Fp32
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v2.9.0
Inputs
Property | Value |
---|---|
Name | x.2 |
Tensor | float32[1,3,224,224] |
Identifier | 150 |
Outputs
Property | Value |
---|---|
Name | 812 |
Tensor | float32[1,1000] |
Identifier | 41 |
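The FP32 model returns a float32[1,1000] tensor. TorchVision's MobileNetV3 classifier ends in a linear layer without a softmax, so these values are assumed to be raw ImageNet logits. The sketch below, using only the standard library, converts a flattened output row into probabilities and a top-k list; softmax and top_k are hypothetical helpers, and the sample logits are made up.

```python
import heapq
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return (class_index, probability) pairs for the k most likely classes."""
    return heapq.nlargest(k, enumerate(probs), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Illustrative logits for the flattened [1, 1000] output.
    logits = [0.0] * 1000
    logits[207] = 4.0
    logits[208] = 2.0
    for idx, p in top_k(softmax(logits), k=3):
        print(idx, round(p, 4))
```

Subtracting the maximum logit before exponentiating avoids overflow without changing the result.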
Performance Benchmarks
MobileNetV3-quant8
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
---|---|---|---|---|---|---|---|
G350 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
G510 | N/A | N/A | N/A | N/A | N/A | 1.04 ms | N/A |
G700 | N/A | N/A | N/A | N/A | N/A | 0.04 ms | N/A |
G1200 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
MobileNetV3-fp32
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
---|---|---|---|---|---|---|---|
G350 | 22.138 ms (Thread:4) | 15.343 ms | 18.880 ms | 28.915 ms | N/A | N/A | 235.255 ms |
G510 | 82.273 ms | 5.971 ms | 7.707 ms | 5.660 ms | 2.492 ms | 2.72 ms | N/A |
G700 | 5.149 ms | 4.819 ms | 6.260 ms | 4.897 ms | 1.789 ms | 1.05 ms | N/A |
G1200 | 5.127 ms | 3.872 ms | 5.236 ms | 3.642 ms | 2.364 ms | 2.05 ms | N/A |
Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
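To compare backends on one platform, divide the CPU latency by the accelerator latency. A minimal sketch using FP32 figures copied from the MobileNetV3-fp32 table (CPU column at 8 threads); the dictionary layout and the speedup helper are illustrative, and G350 is omitted because its CPU figure was measured with 4 threads and it has no MDLA result.

```python
# FP32 latencies in ms, copied from the MobileNetV3-fp32 table.
FP32_LATENCY_MS = {
    "G510": {"cpu": 82.273, "gpu": 5.971, "mdla": 2.72},
    "G700": {"cpu": 5.149, "gpu": 4.819, "mdla": 1.05},
    "G1200": {"cpu": 5.127, "gpu": 3.872, "mdla": 2.05},
}


def speedup(baseline_ms, accelerated_ms):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


if __name__ == "__main__":
    for platform, ms in FP32_LATENCY_MS.items():
        print(f"{platform}: GPU {speedup(ms['cpu'], ms['gpu']):.2f}x, "
              f"APU(MDLA) {speedup(ms['cpu'], ms['mdla']):.2f}x vs CPU")
```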