MobileNetV3 Models

Overview

MobileNetV3 is a mobile-optimized classification network whose architecture was found by combining hardware-aware network architecture search (NAS) with the NetAdapt algorithm, targeting mobile CPUs. Its architectural improvements are reported to yield a 4.6% increase in accuracy and a 5% reduction in latency compared to MobileNetV2, which makes it well suited to resource-constrained mobile applications.

Getting Started

Follow these steps to load a pretrained MobileNetV3 model with PyTorch and TorchVision and export it to TorchScript for conversion.

  1. Install Required Libraries:

    Ensure you have the necessary libraries installed:

    pip install torch torchvision
    
  2. Load and Convert MobileNetV3 Model:

    Load a pretrained MobileNetV3 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

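    import torch
    import torchvision

    # Load the ImageNet-pretrained MobileNetV3-Small weights.
    # Newer torchvision releases deprecate `pretrained=True` in favor of `weights=`.
    model = torchvision.models.mobilenet_v3_small(pretrained=True)
    # Dummy NCHW input used only to record the traced graph.
    trace_data = torch.randn(1, 3, 224, 224)
    # Trace on CPU in eval mode so dropout and batch norm run in inference mode.
    trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
    torch.jit.save(trace_model, 'mobilenet_v3_small_float.pt')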
    

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

  1. Generate Calibration Data:

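    The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process. Random data only exercises the pipeline; for representative accuracy, calibrate with real, preprocessed input samples.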

    import os
    import numpy as np
    
    os.makedirs('data', exist_ok=True)  # do not fail if the directory already exists
    for i in range(100):
        data = np.random.randn(1, 3, 224, 224).astype(np.float32)
        np.save('data/batch_{}.npy'.format(i), data)
    
  2. Convert to Quantized TFLite Format:

    Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

    mtk_pytorch_converter \
        --input_script_module_file=mobilenet_v3_small_float.pt \
        --output_file=mobilenet_v3_ptq_quant.tflite \
        --input_shapes=1,3,224,224 \
        --quantize=True \
        --input_value_ranges=-1,1 \
        --calibration_data_dir=data/ \
        --calibration_data_regexp='batch_.*\.npy'
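
As a quick sanity check before running the converter, the sketch below lists which files in the calibration directory a pattern such as `batch_.*\.npy` would select. It assumes the converter applies the regular expression to each file name as a full match, which the tool's own documentation should confirm; `select_calibration_files` is a hypothetical helper and not part of NeuroPilot.

```python
import re
from pathlib import Path


def select_calibration_files(data_dir, pattern):
    """Return the sorted names of files in data_dir whose name fully matches pattern.

    Assumption: the converter matches the regex against the bare file name.
    """
    regex = re.compile(pattern)
    return sorted(
        entry.name
        for entry in Path(data_dir).iterdir()
        if entry.is_file() and regex.fullmatch(entry.name)
    )


if __name__ == "__main__":
    # Hypothetical usage against the directory created in step 1.
    if Path("data").is_dir():
        names = select_calibration_files("data", r"batch_.*\.npy")
        print(len(names), "calibration files selected")
```

If the count printed here differs from the number of batches generated, the pattern or the directory path is the first thing to check.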
    

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

    mtk_pytorch_converter \
        --input_script_module_file=mobilenet_v3_small_float.pt \
        --output_file=mobilenet_v3_small_float.tflite \
        --input_shapes=1,3,224,224
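
The `--input_shapes=1,3,224,224` value describes a single input tensor with batch size 1, 3 channels, and a 224x224 spatial size. A minimal sketch of the resulting arithmetic follows; `parse_shape` and `tensor_bytes` are illustrative helpers, and the per-element sizes assume densely packed float32 and int8 buffers.

```python
from math import prod

# Bytes per element for the two tensor types used by the converted models.
ELEMENT_SIZE = {"float32": 4, "int8": 1}


def parse_shape(text):
    """Turn a comma-separated shape string such as '1,3,224,224' into a tuple."""
    return tuple(int(dim) for dim in text.split(","))


def tensor_bytes(shape, dtype):
    """Size in bytes of a densely packed tensor with the given shape and dtype."""
    return prod(shape) * ELEMENT_SIZE[dtype]


shape = parse_shape("1,3,224,224")
print(prod(shape))                     # 150528 elements
print(tensor_bytes(shape, "float32"))  # 602112 bytes
print(tensor_bytes(shape, "int8"))     # 150528 bytes
```

The quantized input buffer is therefore a quarter of the size of the float32 one.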

Model Details

General Information

| Property | Value |
| --- | --- |
| Category | Classification |
| Input Size | 224x224 |
| GFLOPS | 0.06 |
| #Params (M) | 2.54 |
| Training Framework | PyTorch |
| Inference Framework | TFLite |
| Quant8 Model Package | Download |
| Float32 Model Package | Download |

Model Properties

Quant8

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

| Property | Value |
| --- | --- |
| Name | x.2 |
| Tensor | int8[1,3,224,224] |
| Identifier | 216 |
| Quantization | Linear |
| Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |

Outputs

| Property | Value |
| --- | --- |
| Name | 812 |
| Tensor | int8[1,1000] |
| Identifier | 233 |
| Quantization | Linear |
| Quantization Range | -4.3098 ≤ 0.0392 * (q + 18) ≤ 5.6811 |
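
The quantization ranges listed for the Quant8 tensors are consistent with the usual affine mapping real = scale * (q - zero_point) for int8 values q in [-128, 127]. The sketch below assumes that mapping and recovers the scale and zero point from the listed ranges; the helper names are hypothetical, and the scales shown in the tables appear to be rounded to four decimals.

```python
def affine_params(real_min, real_max, qmin=-128, qmax=127):
    """Derive (scale, zero_point) so that qmin maps to real_min and qmax to real_max."""
    scale = (real_max - real_min) / (qmax - qmin)
    zero_point = qmin - round(real_min / scale)
    return scale, zero_point


def dequantize(q, scale, zero_point):
    """Map an int8 value back to its real value."""
    return scale * (q - zero_point)


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to the nearest int8 value, clamped to the int8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


# Ranges copied from the Quant8 input and output tensors.
in_scale, in_zero = affine_params(-1.0039, 0.9961)
out_scale, out_zero = affine_params(-4.3098, 5.6811)

print(round(in_scale, 4), in_zero)    # 0.0078 0
print(round(out_scale, 4), out_zero)  # 0.0392 -18
```

A zero point of -18 corresponds to the `(q + 18)` term in the output range, since q - (-18) = q + 18.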

FP32

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v2.9.0

Inputs

| Property | Value |
| --- | --- |
| Name | x.2 |
| Tensor | float32[1,3,224,224] |
| Identifier | 150 |

Outputs

| Property | Value |
| --- | --- |
| Name | 812 |
| Tensor | float32[1,1000] |
| Identifier | 41 |
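
The [1,1000] output holds one score per class. Assuming these are unnormalized logits over the 1000 ImageNet classes, as produced by the TorchVision classifier head, a minimal post-processing sketch applies a softmax and picks the top-k class indices. The function names are illustrative, and mapping indices to label names requires a label file that is not covered here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return the k most probable (class_index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Stand-in for one row of the model output: 1000 scores, two of them raised.
    scores = [0.0] * 1000
    scores[7] = 5.0
    scores[42] = 3.0
    for index, prob in top_k(scores, k=2):
        print(index, round(prob, 4))
```

For the Quant8 model, the int8 outputs would first need to be dequantized with the output tensor's scale and zero point before applying the softmax.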

Performance Benchmarks

MobileNetV3-quant8

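| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| G350 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| G510 | N/A | N/A | N/A | N/A | N/A | 1.04 ms | N/A |
| G700 | N/A | N/A | N/A | N/A | N/A | 0.04 ms | N/A |
| G1200 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |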

MobileNetV3-fp32

| Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| G350 | 22.138 ms (Thread:4) | 15.343 ms | 18.880 ms | 28.915 ms | N/A | N/A | 235.255 ms |
| G510 | 82.273 ms | 5.971 ms | 7.707 ms | 5.660 ms | 2.492 ms | 2.72 ms | N/A |
| G700 | 5.149 ms | 4.819 ms | 6.260 ms | 4.897 ms | 1.789 ms | 1.05 ms | N/A |
| G1200 | 5.127 ms | 3.872 ms | 5.236 ms | 3.642 ms | 2.364 ms | 2.05 ms | N/A |

  • Widespread: CPU only, light workload.

  • Performance: CPU and GPU, medium workload.

  • Ultimate: CPU, GPU, and APUs, heavy workload.
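
The benchmark tables report latency for 10 runs of each model; how the runs were aggregated is not stated here. The sketch below shows one common way to take such a measurement, assuming a mean over timed runs after a short warm-up. `run_once` is a hypothetical callable that would wrap a single inference call on the target backend.

```python
import statistics
import time


def benchmark(run_once, runs=10, warmup=1):
    """Time run_once() over `runs` iterations and summarize latency in milliseconds.

    Warm-up iterations are executed first and excluded from the statistics.
    """
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "mean_ms": statistics.mean(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Placeholder workload standing in for one model invocation.
    result = benchmark(lambda: sum(range(10_000)))
    print(result)
```

Reporting the minimum and maximum alongside the mean makes it easier to spot runs distorted by thermal throttling or background load.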

Resources

GitHub