ResNet Models

Overview

ResNet (Residual Network) introduces a residual learning framework that makes deep neural networks easier to train than earlier architectures. Instead of learning a direct mapping, each group of layers is reformulated to learn a residual function with respect to its input. Empirical results show that these networks are easier to optimize and gain accuracy as network depth increases substantially.
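The residual idea can be illustrated with a toy, framework-free sketch. It is not the actual ResNet layer: `toy_layer` and `residual_block` are hypothetical names, and the "learned" function is reduced to a single scaling weight so the effect of the skip connection is easy to see.

```python
# Toy sketch of a residual connection: y = F(x) + x.
# `toy_layer` and `residual_block` are illustrative names, not a library API.

def toy_layer(x, weight):
    """Stand-in for the learned function F: scales every element by `weight`."""
    return [weight * v for v in x]

def residual_block(x, weight):
    """Return F(x) + x; the skip connection adds the input back to F's output."""
    fx = toy_layer(x, weight)
    return [f + v for f, v in zip(fx, x)]

x = [1.0, -2.0, 3.0]

# When F outputs zero, the block reduces to the identity mapping.
identity_out = residual_block(x, weight=0.0)

# A non-zero F acts as a correction added on top of the input.
nudged_out = residual_block(x, weight=0.5)

print(identity_out)
print(nudged_out)
```

Because the block only has to learn the correction on top of its input, a layer that contributes nothing can fall back to the identity. This is one common intuition for why very deep stacks of such blocks remain trainable.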

Getting Started

Follow these steps to load a ResNet model and convert it with PyTorch and TorchVision.

Install Required Libraries:

Ensure you have the necessary libraries installed:
```
pip install torch torchvision
```

Load and Convert ResNet Model:

Load a pretrained ResNet model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

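```
import torch
import torchvision

# Load ResNet-18 with pretrained ImageNet weights.
# Note: recent TorchVision releases deprecate pretrained=True in favor of the weights= argument.
model = torchvision.models.resnet18(pretrained=True)

# Dummy input used only to trace the graph: batch of 1, 3 channels, 224x224 pixels.
trace_data = torch.randn(1, 3, 224, 224)

# Trace on CPU in eval mode so batch norm uses its stored statistics.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'resnet_float.pt')
```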

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

Generate Calibration Data:

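The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process. Random data is enough to exercise the conversion flow; for meaningful accuracy, calibrate with representative, preprocessed input samples instead.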
```
import os
import numpy as np

os.makedirs('data', exist_ok=True)  # do not fail if the directory already exists
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```

Convert to Quantized TFLite Format:

Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=resnet_float.pt        \
    --output_file=resnet_ptq_quant.tflite             \
    --input_shapes=1,3,224,224                        \
    --quantize=True                                   \
    --input_value_ranges=-1,1                         \
    --calibration_data_dir=data/                      \
    --calibration_data_regexp='batch_.*\.npy'         \
    --allow_incompatible_paddings_for_tflite_pooling=True
```
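To check which files a calibration pattern would select, the pattern can be tried against the expected file names before running the converter. The sketch below uses Python's `re` module and assumes the whole file name must match. How the converter itself applies the expression is not stated here, so treat this only as a sanity check of the pattern. The two extra file names are hypothetical.

```python
import re

# Pattern copied from --calibration_data_regexp in the command above.
pattern = re.compile(r"batch_.*\.npy")

# Names produced by the calibration script (batch_0.npy ... batch_99.npy),
# plus two hypothetical names that should not be selected.
candidates = ["batch_{}.npy".format(i) for i in range(100)]
candidates += ["notes.txt", "batch_0.npz"]

# Assumption: a file is selected only when its full name matches the pattern.
selected = [name for name in candidates if pattern.fullmatch(name)]

print(len(selected))
```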

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=resnet_float.pt        \
    --output_file=resnet_float.tflite                 \
    --input_shapes=1,3,224,224                        \
    --allow_incompatible_paddings_for_tflite_pooling=True
```

Model Details

General Information

Property	Value
Category	Classification
Input Size	224x224
GFLOPS	1.81
#Params (M)	11.68
Training Framework	PyTorch
Inference Framework	TFLite
Quant8 Model package	Download
Float32 Model package	Download
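As a rough guide to what the parameter count means for storage, the weights can be sized at 4 bytes each for float32 and 1 byte each for int8. This is a back-of-the-envelope sketch, not a measurement of the downloadable packages. It assumes every parameter is stored at the stated width and ignores file metadata, quantization scales, and any tensors kept at higher precision. `weights_megabytes` is a hypothetical helper.

```python
# Rough weight-storage estimate from the parameter count in the table above.
params_millions = 11.68

def weights_megabytes(params_m, bytes_per_param):
    """Approximate weight size in MB (10**6 bytes), ignoring all metadata."""
    return params_m * bytes_per_param

fp32_mb = weights_megabytes(params_millions, 4)  # float32: 4 bytes per weight
int8_mb = weights_megabytes(params_millions, 1)  # int8: 1 byte per weight

print(round(fp32_mb, 2), round(int8_mb, 2))
```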

Model Properties

Quant8

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.2
Tensor	int8[1,3,224,224]
Identifier	23
Quantization	Linear
Quantization Range	-1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property	Value
Name	383
Tensor	int8[1,1000]
Identifier	62
Quantization	Linear
Quantization Range	-4.4169 ≤ 0.0429 * (q + 25) ≤ 6.5182
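The quantization ranges above follow the usual affine mapping real = scale * (q - zero_point). The sketch below applies that mapping with the rounded values printed in the tables: scale 0.0078 and zero point 0 for the input, scale 0.0429 and zero point -25 for the output. Because the printed scales are rounded, the computed endpoints differ slightly from the listed ranges. `quantize` and `dequantize` are hypothetical helper names.

```python
# Affine int8 mapping, assuming real = scale * (q - zero_point).
INT8_MIN, INT8_MAX = -128, 127

def dequantize(q, scale, zero_point):
    """Map an int8 code back to an approximate real value."""
    return scale * (q - zero_point)

def quantize(real, scale, zero_point):
    """Map a real value to the nearest int8 code, clamped to the int8 range."""
    q = round(real / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))

in_scale, in_zp = 0.0078, 0        # input tensor parameters (rounded)
out_scale, out_zp = 0.0429, -25    # output tensor parameters (rounded)

# A normalized pixel value of 0.5 becomes an int8 code for the model input.
print(quantize(0.5, in_scale, in_zp))

# Values outside the representable range saturate at the int8 limits.
print(quantize(2.0, in_scale, in_zp))

# Raw int8 output codes converted back to approximate class scores.
print(dequantize(-128, out_scale, out_zp), dequantize(127, out_scale, out_zp))
```

In practice this means inputs should be scaled to roughly [-1, 1] before quantization, and the int8 outputs should be dequantized (or simply ranked, since the mapping is monotonic) before they are interpreted.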

FP32

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v2.9.0

Inputs

Property	Value
Name	x.2
Tensor	float32[1,3,224,224]
Identifier	80

Outputs

Property	Value
Name	383
Tensor	float32[1,1000]
Identifier	5
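The float32[1,1000] output holds one score per class. Assuming these are raw, unnormalized scores (TorchVision's ResNet-18 ends in a linear layer with no softmax), they can be turned into probabilities and ranked as sketched below. The `logits` list is a made-up stand-in for one output row, and `softmax` and `top_k` are hypothetical helpers written with the standard library only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical stand-in for one row of the model output (1000 class scores).
logits = [0.0] * 1000
logits[281] = 8.0
logits[285] = 6.0

probs = softmax(logits)
best = top_k(probs, k=2)

for class_index, probability in best:
    print(class_index, round(probability, 3))
```

Mapping a class index to a human-readable label requires the label file that matches the training dataset, which is not part of this page.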

Performance Benchmarks

ResNet-quant8

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	N/A	N/A	N/A	N/A	N/A	N/A	N/A
G510	N/A	N/A	N/A	N/A	N/A	2.79 ms	N/A
G700	N/A	N/A	N/A	N/A	N/A	2.03 ms	N/A
G1200	N/A	N/A	N/A	N/A	N/A	2.05 ms	N/A

ResNet-fp32

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	255.551 ms (Thread:4)	178.460 ms	147.218 ms	112.225 ms	N/A	N/A	846.724 ms
G510	133.122 ms	55.277 ms	45.856 ms	42.608 ms	8.557 ms	9.21 ms	N/A
G700	61.470 ms	41.428 ms	36.064 ms	34.791 ms	6.226 ms	6.04 ms	N/A
G1200	56.593 ms	30.080 ms	22.070 ms	20.762 ms	7.981 ms	8.04 ms	N/A

Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
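To compare the two variants, the APU(MDLA) latencies from the benchmark tables can be turned into speedup ratios. The sketch below only does arithmetic on numbers copied from the tables. It says nothing about accuracy, which the tables do not report.

```python
# APU(MDLA) latencies in milliseconds, copied from the benchmark tables above.
fp32_ms = {"G510": 9.21, "G700": 6.04, "G1200": 8.04}
quant8_ms = {"G510": 2.79, "G700": 2.03, "G1200": 2.05}

# Speedup of the quant8 model over the fp32 model on the same platform.
speedup = {platform: fp32_ms[platform] / quant8_ms[platform] for platform in fp32_ms}

for platform, ratio in speedup.items():
    print("{}: {:.2f}x".format(platform, ratio))
```

On these figures the quantized model runs roughly three to four times faster than the float model on the MDLA.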

Resources

GitHub