MobileNetV2 Models

Overview

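MobileNetV2 is a convolutional neural network architecture designed for mobile and embedded devices. It is built from inverted residual blocks: each block expands a narrow bottleneck layer to a wider intermediate representation, filters it with a lightweight depthwise convolution, and projects it back to a narrow bottleneck. The shortcut connections link the narrow bottlenecks, whereas traditional residual blocks link the wide layers. The depthwise convolutions keep the computational cost low while preserving accuracy, which makes the architecture well suited to mobile and embedded applications.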
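To make the efficiency argument concrete, the sketch below counts weights and multiply-accumulate operations (MACs) for one inverted residual block, and compares a depthwise 3x3 convolution with a standard 3x3 convolution of the same width. The block sizes (24 channels, expansion factor 6, 56x56 feature map) and the function names are assumptions chosen for illustration; biases and batch-norm parameters are ignored.

```python
def conv_cost(c_in, c_out, kernel, height, width, depthwise=False):
    """Return (weights, MACs) for a stride-1, same-padded convolution."""
    if depthwise:
        # One kernel x kernel filter per channel; c_out equals c_in.
        weights = kernel * kernel * c_in
    else:
        weights = kernel * kernel * c_in * c_out
    # Every weight is used once per output position.
    return weights, weights * height * width


def inverted_residual_cost(c_in, c_out, expansion, height, width):
    """Sum the cost of the expand (1x1), depthwise (3x3), and project (1x1) layers."""
    hidden = c_in * expansion
    layers = [
        conv_cost(c_in, hidden, 1, height, width),
        conv_cost(hidden, hidden, 3, height, width, depthwise=True),
        conv_cost(hidden, c_out, 1, height, width),
    ]
    return sum(w for w, _ in layers), sum(m for _, m in layers)


if __name__ == "__main__":
    block_weights, block_macs = inverted_residual_cost(24, 24, 6, 56, 56)
    _, depthwise_macs = conv_cost(144, 144, 3, 56, 56, depthwise=True)
    _, standard_macs = conv_cost(144, 144, 3, 56, 56)
    print(f"block: {block_weights} weights, {block_macs} MACs")
    print(f"standard / depthwise 3x3 MACs: {standard_macs // depthwise_macs}x")
```

At the expanded width of 144 channels, the depthwise layer needs 144 times fewer MACs than a standard 3x3 convolution, which is why the expansion inside the block stays affordable.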

Getting Started

Follow these steps to load a pretrained MobileNetV2 model with PyTorch and TorchVision and export it to TorchScript.

  1. Install Required Libraries:

    Ensure you have the necessary libraries installed:

    pip install torch torchvision
    
  2. Load and Convert MobileNetV2 Model:

    Load a pretrained MobileNetV2 model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

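    import torch
    import torchvision

    # Load ImageNet-pretrained weights. Recent torchvision releases deprecate
    # `pretrained=True` in favor of `weights=MobileNet_V2_Weights.DEFAULT`.
    model = torchvision.models.mobilenet_v2(pretrained=True)
    # Dummy NCHW input, used only to record the traced graph.
    trace_data = torch.randn(1, 3, 224, 224)
    # Trace on the CPU in eval mode so dropout and batch norm behave as at inference.
    trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
    torch.jit.save(trace_model, 'mobilenet_v2_float.pt')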
    

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

  1. Generate Calibration Data:

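    The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process. Random data is enough to exercise the conversion flow; for meaningful accuracy, calibrate with representative, preprocessed inputs instead.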

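    import os
    import numpy as np

    # Create the output directory; exist_ok avoids an error when rerunning the script.
    os.makedirs('data', exist_ok=True)
    for i in range(100):
        # One NCHW float32 batch matching the model input shape.
        data = np.random.randn(1, 3, 224, 224).astype(np.float32)
        np.save('data/batch_{}.npy'.format(i), data)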
    
  2. Convert to Quantized TFLite Format:

    Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

    mtk_pytorch_converter                                 \
        --input_script_module_file=mobilenet_v2_float.pt  \
        --output_file=mobilenet_v2_ptq_quant.tflite       \
        --input_shapes=1,3,224,224                        \
        --quantize=True                                   \
        --input_value_ranges=-1,1                         \
        --calibration_data_dir=data/                      \
        --calibration_data_regexp='batch_.*\.npy'
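As a quick check before running the converter, the sketch below lists which file names a pattern such as `batch_.*\.npy` selects. It assumes the converter applies the regular expression to file names inside the calibration directory with Python-style matching; that behavior is not confirmed here, and `select_calibration_files` is a hypothetical helper.

```python
import re


def select_calibration_files(names, pattern=r"batch_.*\.npy"):
    """Return the names that match the pattern in full, sorted."""
    regex = re.compile(pattern)
    return sorted(name for name in names if regex.fullmatch(name))


if __name__ == "__main__":
    candidates = ["batch_0.npy", "batch_99.npy", "batch_0.npz", "notes.txt"]
    print(select_calibration_files(candidates))
```

To check a real directory, pass `os.listdir('data')` as `names`.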
    

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

mtk_pytorch_converter                                 \
    --input_script_module_file=mobilenet_v2_float.pt  \
    --output_file=mobilenet_v2_float.tflite           \
    --input_shapes=1,3,224,224
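When both conversions are scripted together, the argument lists can be assembled in Python. The sketch below only builds and prints the commands, using the flags shown in this guide; `build_converter_command` is a hypothetical helper, and actually running the result assumes `mtk_pytorch_converter` is installed and on the PATH.

```python
import shlex


def build_converter_command(output_file, quantize=False):
    """Assemble the mtk_pytorch_converter argument list used in this guide."""
    command = [
        "mtk_pytorch_converter",
        "--input_script_module_file=mobilenet_v2_float.pt",
        f"--output_file={output_file}",
        "--input_shapes=1,3,224,224",
    ]
    if quantize:
        # Extra flags from the Quant8 conversion step.
        command += [
            "--quantize=True",
            "--input_value_ranges=-1,1",
            "--calibration_data_dir=data/",
            r"--calibration_data_regexp=batch_.*\.npy",
        ]
    return command


if __name__ == "__main__":
    for name, quant in [("mobilenet_v2_float.tflite", False),
                        ("mobilenet_v2_ptq_quant.tflite", True)]:
        # Print the shell-quoted command; pass the list to
        # subprocess.run(..., check=True) to execute it.
        print(shlex.join(build_converter_command(name, quant)))
```

Passing the arguments as a list avoids shell quoting issues with the regular expression.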

Model Details

General Information

Property               Value
---------------------  --------------
Category               Classification
Input Size             224x224
GFLOPS                 0.30
#Params (M)            3.50
Training Framework     PyTorch
Inference Framework    TFLite
Quant8 Model package   Download
Float32 Model package  Download

Model Properties

Quant8

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property            Value
------------------  -----------------------------
Name                x.1
Tensor              int8[1,3,224,224]
Identifier          16
Quantization        Linear
Quantization Range  -1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property            Value
------------------  -----------------------------------
Name                860
Tensor              int8[1,1000]
Identifier          162
Quantization        Linear
Quantization Range  -7.2336 ≤ 0.0539 * (q - 6) ≤ 6.5318
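The quantization ranges listed for the Quant8 model follow the linear (affine) mapping real = scale * (q - zero_point), where q is an int8 value in [-128, 127]. The sketch below applies that mapping with the rounded scale and zero-point values printed in the tables, so the computed bounds differ slightly from the listed ones; the function names are illustrative.

```python
INT8_MIN, INT8_MAX = -128, 127


def dequantize(q, scale, zero_point=0):
    """Map an int8 value back to a real number."""
    return scale * (q - zero_point)


def quantize(real, scale, zero_point=0):
    """Map a real number to the nearest int8 value, clamping to the int8 range."""
    q = round(real / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


if __name__ == "__main__":
    # (scale, zero_point) pairs as printed in the tables, rounded to 4 decimals.
    tensors = {"input x.1": (0.0078, 0), "output 860": (0.0539, 6)}
    for name, (scale, zero_point) in tensors.items():
        low = dequantize(INT8_MIN, scale, zero_point)
        high = dequantize(INT8_MAX, scale, zero_point)
        print(f"{name}: {low:.4f} <= real <= {high:.4f}")
    # Values outside the representable range saturate at the int8 limits.
    print(quantize(2.5, 0.0078))
```

An unrounded input scale of 2/255 (about 0.007843) is consistent with the listed input bounds of -1.0039 and 0.9961.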

FP32

  • Format: TensorFlow Lite v3

  • Description: Exported by NeuroPilot converter v2.9.0

Inputs

Property    Value
----------  --------------------
Name        x.1
Tensor      float32[1,3,224,224]
Identifier  167

Outputs

Property    Value
----------  ---------------
Name        860
Tensor      float32[1,1000]
Identifier  145

Performance Benchmarks

MobileNetV2-quant8

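Run model (.tflite) 10 times  CPU (Thread:8)        GPU        ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate(APU)  APU(MDLA)  APU(VPU)
----------------------------  --------------------  ---------  -------------  -------------  ---------------------------  ---------  ----------
G350                          N/A                   N/A        N/A            N/A            N/A                          N/A        N/A
G510                          N/A                   N/A        N/A            N/A            N/A                          1.37 ms    N/A
G700                          N/A                   N/A        N/A            N/A            N/A                          1.04 ms    N/A
G1200                         N/A                   N/A        N/A            N/A            N/A                          1.04 ms    N/A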

MobileNetV2-fp32

Run model (.tflite) 10 times  CPU (Thread:8)        GPU        ARMNN(GpuAcc)  ARMNN(CpuAcc)  Neuron Stable Delegate(APU)  APU(MDLA)  APU(VPU)
----------------------------  --------------------  ---------  -------------  -------------  ---------------------------  ---------  ----------
G350                          59.502 ms (Thread:4)  50.763 ms  53.468 ms      50.948 ms      N/A                          N/A        705.860 ms
G510                          128.68 ms             17.537 ms  20.084 ms      16.201 ms      3.164 ms                     3.57 ms    N/A
G700                          14.156 ms             13.361 ms  15.221 ms      13.406 ms      2.225 ms                     2.04 ms    N/A
G1200                         13.484 ms             9.236 ms   10.845 ms      8.542 ms       2.965 ms                     2.58 ms    N/A
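One way to read the FP32 table is as a speedup of the APU over the CPU. The sketch below computes that ratio and an approximate throughput from a few latencies copied from the table. It assumes each figure is a per-inference latency, which the table heading does not state explicitly; the helper names are illustrative.

```python
# Latencies in milliseconds copied from the MobileNetV2-fp32 table.
FP32_LATENCY_MS = {
    "G510": {"CPU": 128.68, "APU(MDLA)": 3.57},
    "G700": {"CPU": 14.156, "APU(MDLA)": 2.04},
    "G1200": {"CPU": 13.484, "APU(MDLA)": 2.58},
}


def speedup(baseline_ms, accelerated_ms):
    """Return how many times faster the accelerated backend is."""
    return baseline_ms / accelerated_ms


def throughput_per_second(latency_ms):
    """Return inferences per second for a given per-inference latency."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for platform, latency in FP32_LATENCY_MS.items():
        ratio = speedup(latency["CPU"], latency["APU(MDLA)"])
        rate = throughput_per_second(latency["APU(MDLA)"])
        print(f"{platform}: MDLA is {ratio:.1f}x faster than CPU, "
              f"about {rate:.0f} inferences/s")
```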

  • Widespread: CPU only, light workload.

  • Performance: CPU and GPU, medium workload.

  • Ultimate: CPU, GPU, and APUs, heavy workload.

Resources

GitHub