DenseNet Models

Overview

DenseNet connects each layer to every other layer in a feedforward manner, offering several notable advantages: it alleviates the vanishing gradient problem, enhances feature propagation, encourages feature reuse, and significantly reduces the number of parameters. These improvements allow DenseNet to achieve high performance with fewer computations compared to most state-of-the-art networks.

Getting Started

Follow these steps to use and convert DenseNet models using PyTorch and TorchVision.

Install Required Libraries:

Ensure you have the necessary libraries installed:
```
pip install torch torchvision
```

Load and Convert DenseNet Model:

Load a pretrained DenseNet model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

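```
import torch
import torchvision

# Load DenseNet-121 with pretrained ImageNet weights.
# Newer torchvision releases (0.13+) prefer
# weights=torchvision.models.DenseNet121_Weights.DEFAULT over pretrained=True.
model = torchvision.models.densenet121(pretrained=True)

# Dummy NCHW input used only to record the graph during tracing.
trace_data = torch.randn(1, 3, 224, 224)

# Trace on CPU in eval mode so dropout and batch norm behave deterministically.
trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
torch.jit.save(trace_model, 'densenet121.pt')
```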
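
Before converting, it can be worth checking that the saved TorchScript file reproduces the eager model's output. The sketch below is one possible check, not part of the original workflow: `traced_matches_eager` is a hypothetical helper name, and the demo uses a tiny stand-in network so it runs without downloading weights.

```python
import os
import tempfile

import torch


def traced_matches_eager(eager_model, traced_model, example, atol=1e-5):
    """Return True if the traced module reproduces the eager output on `example`."""
    eager_model.eval()
    with torch.no_grad():
        expected = eager_model(example)
        actual = traced_model(example)
    return torch.allclose(expected, actual, atol=atol)


if __name__ == "__main__":
    # Tiny stand-in network (assumption for illustration only).
    net = torch.nn.Sequential(torch.nn.Conv2d(3, 4, 3), torch.nn.ReLU()).eval()
    example = torch.randn(1, 3, 8, 8)
    traced = torch.jit.trace(net, example)

    # Round-trip through disk, as the conversion workflow does.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tiny_traced.pt")
        torch.jit.save(traced, path)
        reloaded = torch.jit.load(path)

    print(traced_matches_eager(net, reloaded, example))
```

For the DenseNet case, the same helper would be called with `model`, `torch.jit.load('densenet121.pt')`, and `trace_data`.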

How It Works

Before you begin, ensure that the NeuroPilot Converter Tool is installed.

Quant8 Conversion Process

Generate Calibration Data:

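The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process. Random data only exercises the conversion pipeline; for meaningful quantized accuracy, calibrate with representative, preprocessed input images instead.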
```
import os
import numpy as np

# Create the output directory; do nothing if it already exists.
os.makedirs('data', exist_ok=True)
for i in range(100):
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    np.save('data/batch_{}.npy'.format(i), data)
```

Convert to Quantized TFLite Format:

Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=densenet121.pt         \
    --output_file=densenet121_ptq_quant.tflite        \
    --input_shapes=1,3,224,224                        \
    --quantize=True                                   \
    --input_value_ranges=-1,1                         \
    --calibration_data_dir=data/                      \
    --calibration_data_regexp='batch_.*\.npy'         \
    --allow_incompatible_paddings_for_tflite_pooling=True
```
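
Before running the converter, it can help to confirm which files the calibration pattern would pick up. The sketch below is not part of the NeuroPilot tooling: `matching_calibration_files` is a hypothetical helper, and it assumes the pattern is applied to each file name as a full match, which the converter's documentation would need to confirm.

```python
import os
import re


def matching_calibration_files(data_dir, pattern):
    """Return sorted file names in data_dir that fully match the regex pattern.

    Hypothetical helper: the converter's real matching rule is an assumption.
    """
    regex = re.compile(pattern)
    return sorted(
        name
        for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name)) and regex.fullmatch(name)
    )


if __name__ == "__main__":
    # Only inspect the directory if the calibration script has been run.
    if os.path.isdir("data"):
        files = matching_calibration_files("data", r"batch_.*\.npy")
        print(f"{len(files)} calibration files match")
```

If the count differs from the number of batches generated, the pattern or the directory is likely wrong.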

FP32 Conversion Process

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

```
mtk_pytorch_converter                                 \
    --input_script_module_file=densenet121.pt         \
    --output_file=densenet121.tflite                  \
    --input_shapes=1,3,224,224                        \
    --allow_incompatible_paddings_for_tflite_pooling=True
```

Model Details

General Information

Property	Value
Category	Classification
Input Size	224x224
GFLOPS	2.83
#Params (M)	7.97
Training Framework	PyTorch
Inference Framework	TFLite
Quant8 Model package	Download
Float32 Model package	Download

Model Properties

Quant8

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.1
Tensor	int8[1,3,224,224]
Identifier	344
Quantization	Linear
Quantization Range	-1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property	Value
Name	2039
Tensor	int8[1,1000]
Identifier	590
Quantization	Linear
Quantization Range	-4.6631 ≤ 0.0466 * (q + 28) ≤ 7.2278
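
The "Quantization Range" rows follow the usual affine int8 mapping, real = scale * (q - zero_point). The sketch below reproduces the listed ranges. The scales and zero points are inferred from the rounded figures in the tables (input scale about 1/127.5 with zero point 0, output scale about 0.046631 with zero point -28); the exact values are stored in the .tflite file, and the function names are hypothetical.

```python
def dequantize(q, scale, zero_point):
    """Map an int8 value to its real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


def quantize(x, scale, zero_point):
    """Map a real value to int8, rounding to nearest and clamping to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


# Parameters inferred from the rounded table values (assumptions).
INPUT_SCALE, INPUT_ZERO_POINT = 1.0 / 127.5, 0
OUTPUT_SCALE, OUTPUT_ZERO_POINT = 0.046631, -28

if __name__ == "__main__":
    for name, scale, zero_point in [
        ("input", INPUT_SCALE, INPUT_ZERO_POINT),
        ("output", OUTPUT_SCALE, OUTPUT_ZERO_POINT),
    ]:
        low = dequantize(-128, scale, zero_point)
        high = dequantize(127, scale, zero_point)
        print(f"{name}: {low:.4f} .. {high:.4f}")
```

`quantize` is what an application would apply to a preprocessed float input before feeding the Quant8 model, and `dequantize` recovers real-valued scores from its int8 output.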

FP32

Format: TensorFlow Lite v3
Description: Exported by NeuroPilot converter v7.14.1+release

Inputs

Property	Value
Name	x.1
Tensor	float32[1,3,224,224]
Identifier	538

Outputs

Property	Value
Name	2039
Tensor	float32[1,1000]
Identifier	279
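
The [1,1000] output holds one score per class. Assuming these are raw logits (torchvision's DenseNet ends in a linear classifier without a softmax), a post-processing step along the following lines turns them into ranked probabilities. This is a dependency-free sketch; `softmax` and `top_k` are hypothetical helper names, and the demo scores are made up.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return the k (class_index, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Stand-in for one row of the [1, 1000] output tensor; values are made up.
    scores = [0.0] * 1000
    scores[281] = 6.0
    scores[285] = 4.0
    for index, prob in top_k(scores, k=3):
        print(index, round(prob, 4))
```

Mapping class indices to human-readable labels requires a label file for the dataset the weights were trained on, which is not included here.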

Performance Benchmarks

DenseNet-quant8

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	N/A	N/A	N/A	N/A	N/A	N/A	N/A
G510	N/A	N/A	N/A	N/A	N/A	7.03 ms	N/A
G700	N/A	N/A	N/A	N/A	N/A	5.03 ms	N/A
G1200	N/A	N/A	N/A	N/A	N/A	6.04 ms	N/A

DenseNet-fp32

Run model (.tflite) 10 times	CPU (Thread:8)	GPU	ARMNN(GpuAcc)	ARMNN(CpuAcc)	Neuron Stable Delegate(APU)	APU(MDLA)	APU(VPU)
G350	198.458 ms (Thread:4)	73.741 ms	102.228 ms	123.873 ms	N/A	N/A	1517.21 ms
G510	174.425 ms	27.616 ms	40.920 ms	37.749 ms	8.272 ms	9.03 ms	N/A
G700	49.287 ms	19.659 ms	30.424 ms	31.363 ms	5.749 ms	6.04 ms	N/A
G1200	44.765 ms	14.796 ms	22.689 ms	20.943 ms	6.636 ms	6.05 ms	N/A

Widespread: CPU only, light workload.
Performance: CPU and GPU, medium workload.
Ultimate: CPU, GPU, and APUs, heavy workload.
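
To compare entries in the tables, divide one latency by another. The sketch below does this for the APU(MDLA) column, using the figures from the two tables above; G350 is omitted because it has no MDLA entry, and `speedup` is a hypothetical helper name.

```python
# APU(MDLA) latencies in milliseconds, copied from the benchmark tables.
MDLA_MS = {
    "G510": {"fp32": 9.03, "quant8": 7.03},
    "G700": {"fp32": 6.04, "quant8": 5.03},
    "G1200": {"fp32": 6.05, "quant8": 6.04},
}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


if __name__ == "__main__":
    for platform, ms in MDLA_MS.items():
        ratio = speedup(ms["fp32"], ms["quant8"])
        print(f"{platform}: quant8 runs at {ratio:.2f}x the speed of fp32 on MDLA")
```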

Resources

GitHub