ResNet Models
Overview
ResNet (Residual Network) introduces a residual learning framework that makes deep neural networks easier to train than earlier architectures. Instead of learning a direct mapping, each group of layers is reformulated to learn a residual function with respect to its input. Empirical results show that these networks are easier to optimize and can gain accuracy as depth increases substantially.
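The residual idea can be illustrated with a minimal sketch in PyTorch. The class name `ResidualBlock` and its layer layout are illustrative assumptions, not the exact TorchVision implementation: the block computes F(x) with two convolutions and adds the input back before the final activation.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Illustrative residual block: output = relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions that keep the tensor shape unchanged.
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection adds the input to the learned residual.
        return self.relu(self.body(x) + x)


block = ResidualBlock(channels=64).eval()
x = torch.randn(1, 64, 56, 56)
with torch.no_grad():
    y = block(x)
print(tuple(y.shape))
```

If the residual branch outputs zero, the block reduces to `relu(x)`, which is one intuition for why very deep stacks of such blocks remain trainable.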
Getting Started
Follow these steps to use and convert ResNet models using PyTorch and TorchVision.
Install Required Libraries:
Ensure you have the necessary libraries installed:
pip install torch torchvision
Load and Convert ResNet Model:
Load a pretrained ResNet model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.
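import torch
import torchvision

# Load ResNet-18 with pretrained weights.
# Note: torchvision 0.13+ deprecates pretrained=True in favor of the weights argument.
model = torchvision.models.resnet18(pretrained=True)

# Dummy input used only to trace the model's forward pass.
trace_data = torch.randn(1, 3, 224, 224)

# Trace in eval mode on the CPU, then save the TorchScript module.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'resnet_float.pt')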
How It Works
Before you begin, ensure that the NeuroPilot Converter Tool is installed.
Quant8 Conversion Process
Generate Calibration Data:
The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.
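import os
import numpy as np

# Create the output directory; exist_ok avoids an error when the script is rerun.
os.makedirs('data', exist_ok=True)

# Write 100 random float32 batches shaped like the model input.
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)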
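A quick sanity check of the calibration files can catch shape or dtype mistakes before conversion. The sketch below is a hypothetical helper (`find_valid_batches` is not part of any converter tool). It assumes file names must fully match the pattern `batch_.*\.npy`, and how the converter itself applies its regular expression is not confirmed here.

```python
import os
import re
import tempfile

import numpy as np


def find_valid_batches(data_dir, pattern=r"batch_.*\.npy", shape=(1, 3, 224, 224)):
    """Return sorted file names that match `pattern` and hold float32 arrays of `shape`."""
    regex = re.compile(pattern)
    valid = []
    for name in sorted(os.listdir(data_dir)):
        if not regex.fullmatch(name):
            continue  # ignore files the pattern would not select
        batch = np.load(os.path.join(data_dir, name))
        if batch.shape == shape and batch.dtype == np.float32:
            valid.append(name)
    return valid


# Demo in a temporary directory so a real `data` folder is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        batch = np.random.randn(1, 3, 224, 224).astype(np.float32)
        np.save(os.path.join(tmp, "batch_{}.npy".format(i)), batch)
    # A stray file that the pattern should skip.
    np.save(os.path.join(tmp, "notes.npy"), np.zeros(1, dtype=np.float32))
    valid_files = find_valid_batches(tmp)

print(valid_files)
```

Pointing `find_valid_batches` at the real `data` directory should list all 100 generated files.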
Convert to Quantized TFLite Format:
Use the following command to convert the model to a quantized TFLite format using the generated calibration data:
mtk_pytorch_converter \
--input_script_module_file=resnet_float.pt \
--output_file=resnet_ptq_quant.tflite \
--input_shapes=1,3,224,224 \
--quantize=True \
--input_value_ranges=-1,1 \
--calibration_data_dir=data/ \
--calibration_data_regexp='batch_.*\.npy' \
--allow_incompatible_paddings_for_tflite_pooling=True
FP32 Conversion Process
To convert the model to a non-quantized (FP32) TFLite format, use the following command:
mtk_pytorch_converter \
--input_script_module_file=resnet_float.pt \
--output_file=resnet_float.tflite \
--input_shapes=1,3,224,224 \
--allow_incompatible_paddings_for_tflite_pooling=True
Model Details
General Information
Property | Value |
---|---|
Category | Classification |
Input Size | 224x224 |
GFLOPS | 1.81 |
#Params (M) | 11.68 |
Training Framework | PyTorch |
Inference Framework | TFLite |
Quant8 Model package | |
Float32 Model package | |
Model Properties
Quant8
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release
Inputs
Property | Value |
---|---|
Name | x.2 |
Tensor | int8[1,3,224,224] |
Identifier | 23 |
Quantization | Linear |
Quantization Range | -1.0039 ≤ 0.0078 * q ≤ 0.9961 |
Outputs
Property | Value |
---|---|
Name | 383 |
Tensor | int8[1,1000] |
Identifier | 62 |
Quantization | Linear |
Quantization Range | -4.4169 ≤ 0.0429 * (q + 25) ≤ 6.5182 |
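The Quantization Range rows follow the usual linear (affine) int8 mapping, real = scale * (q - zero_point), with q in [-128, 127]. The sketch below reproduces the two ranges from the scales and offsets shown in the tables: the input uses a zero point of 0, and the output uses a zero point of -25, read from the `(q + 25)` term. The helper names are illustrative, and Python's `round` is used as a simplification of the rounding a real runtime applies.

```python
INT8_MIN, INT8_MAX = -128, 127


def dequantize(q, scale, zero_point):
    """Map an int8 value back to a real number."""
    return scale * (q - zero_point)


def quantize(x, scale, zero_point):
    """Map a real number to int8, clamping to the representable range."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


# Input tensor: 0.0078 * q, so zero_point = 0.
input_range = (dequantize(INT8_MIN, 0.0078, 0), dequantize(INT8_MAX, 0.0078, 0))

# Output tensor: 0.0429 * (q + 25), so zero_point = -25.
output_range = (dequantize(INT8_MIN, 0.0429, -25), dequantize(INT8_MAX, 0.0429, -25))

print("input range:", input_range)
print("output range:", output_range)
print("real 0.0 maps to output q =", quantize(0.0, 0.0429, -25))
```

The computed endpoints differ from the table values in the third decimal place, most likely because the tables display scales rounded to four decimal places.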
FP32
Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v2.9.0
Inputs
Property | Value |
---|---|
Name | x.2 |
Tensor | float32[1,3,224,224] |
Identifier | 80 |
Outputs
Property | Value |
---|---|
Name | 383 |
Tensor | float32[1,1000] |
Identifier | 5 |
Performance Benchmarks
ResNet-quant8
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
---|---|---|---|---|---|---|---|
G350 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
G510 | N/A | N/A | N/A | N/A | N/A | 2.79 ms | N/A |
G700 | N/A | N/A | N/A | N/A | N/A | 2.03 ms | N/A |
G1200 | N/A | N/A | N/A | N/A | N/A | 2.05 ms | N/A |
ResNet-fp32
Run model (.tflite) 10 times | CPU (Thread:8) | GPU | ARMNN(GpuAcc) | ARMNN(CpuAcc) | Neuron Stable Delegate(APU) | APU(MDLA) | APU(VPU) |
---|---|---|---|---|---|---|---|
G350 | 255.551 ms (Thread:4) | 178.460 ms | 147.218 ms | 112.225 ms | N/A | N/A | 846.724 ms |
G510 | 133.122 ms | 55.277 ms | 45.856 ms | 42.608 ms | 8.557 ms | 9.21 ms | N/A |
G700 | 61.470 ms | 41.428 ms | 36.064 ms | 34.791 ms | 6.226 ms | 6.04 ms | N/A |
G1200 | 56.593 ms | 30.080 ms | 22.070 ms | 20.762 ms | 7.981 ms | 8.04 ms | N/A |
Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
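To compare backends, the latencies in the benchmark tables can be turned into speedup ratios. The sketch below copies the CPU and APU(MDLA) figures for the platforms where both are reported. The dictionary layout and the `speedup` helper are illustrative choices, not part of any benchmark tool.

```python
# Latencies in milliseconds, copied from the benchmark tables above.
FP32_MS = {
    "G510": {"cpu": 133.122, "mdla": 9.21},
    "G700": {"cpu": 61.470, "mdla": 6.04},
    "G1200": {"cpu": 56.593, "mdla": 8.04},
}
QUANT8_MDLA_MS = {"G510": 2.79, "G700": 2.03, "G1200": 2.05}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


summary = {}
for platform, fp32 in FP32_MS.items():
    summary[platform] = {
        # FP32 model: APU(MDLA) relative to the 8-thread CPU run.
        "fp32_mdla_vs_cpu": round(speedup(fp32["cpu"], fp32["mdla"]), 2),
        # APU(MDLA): quant8 model relative to the FP32 model.
        "quant8_vs_fp32_mdla": round(speedup(fp32["mdla"], QUANT8_MDLA_MS[platform]), 2),
    }

for platform, row in summary.items():
    print(platform, row)
```

On these figures, the quant8 model runs roughly 3x to 4x faster than the FP32 model on APU(MDLA). The ratios describe latency only and say nothing about accuracy.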