DenseNet Models

Overview

DenseNet connects each layer to every other layer in a feedforward manner, offering several notable advantages: it alleviates the vanishing gradient problem, enhances feature propagation, encourages feature reuse, and significantly reduces the number of parameters. These improvements allow DenseNet to achieve high performance with fewer computations compared to most state-of-the-art networks.

Model Conversion Flow

Precondition

Note

Python 3.7 is recommended when working with these models, as it offers the best compatibility with the required libraries and frameworks.

  1. Install Required Libraries:

    Ensure you have the necessary libraries installed:

    pip install torch torchvision
    
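    To confirm that the installation succeeded, you can optionally run a quick import check (an extra step, not part of the original instructions):

    python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"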

Get Source Model

Follow these steps to obtain and convert DenseNet models with PyTorch and TorchVision.

  1. Load and Convert DenseNet Model:

    Load a pretrained DenseNet model using PyTorch and TorchVision, create a dummy input tensor for tracing, trace the model to convert it to TorchScript, and finally save the traced model.

    Run the following Python script:

    python generate_densenet_float.py
    
    import torch
    import torchvision

    # Load a pretrained DenseNet-121 from TorchVision.
    model = torchvision.models.densenet121(pretrained=True)
    # Dummy input matching the expected input shape, used only for tracing.
    trace_data = torch.randn(1, 3, 224, 224)
    # Trace the eval-mode model on CPU to produce a TorchScript module.
    trace_model = torch.jit.trace(model.cpu().eval(), trace_data)
    # Save the traced model; this file is the input to the converter tool.
    torch.jit.save(trace_model, 'densenet121.pt')
    
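    Optionally, you can verify the traced model before converting it. The following is a minimal sanity check (an extra step, not part of the original flow) that reloads the saved TorchScript file and compares its output with the eager-mode model:

    import torch
    import torchvision

    # Recreate the eager-mode model and reload the saved TorchScript module.
    model = torchvision.models.densenet121(pretrained=True).cpu().eval()
    traced = torch.jit.load('densenet121.pt')

    # Compare outputs on the same random input; the values should match closely.
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        ref = model(x)
        out = traced(x)
    print('max abs diff:', (ref - out).abs().max().item())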

Converting Model for Deployment

Before you begin, ensure that the NeuroPilot Converter Tool is installed. If you haven’t installed it yet, please follow the instructions in the “Install and Verify NeuroPilot Converter Tool” section of the same guide.

Quant8 Conversion Process

  1. Generate Calibration Data:

    The following script creates a directory named data and generates 100 batches of random input data, each saved as a .npy file. This data is used for calibration during the quantization process.

    python generate_data_batches.py
    
    import os
    import numpy as np

    # Create the calibration data directory if it does not already exist.
    os.makedirs('data', exist_ok=True)
    # Generate 100 batches of random data matching the model input shape.
    # Real, preprocessed images give more representative calibration statistics.
    for i in range(100):
        data = np.random.randn(1, 3, 224, 224).astype(np.float32)
        np.save('data/batch_{}.npy'.format(i), data)
    
  2. Convert to Quantized TFLite Format:

    Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

    mtk_pytorch_converter                                 \
        --input_script_module_file=densenet121.pt         \
        --output_file=densenet121_ptq_quant.tflite        \
        --input_shapes=1,3,224,224                        \
        --quantize=True                                   \
        --input_value_ranges=-1,1                         \
        --calibration_data_dir=data/                      \
        --calibration_data_regexp=batch_.*\.npy           \
        --allow_incompatible_paddings_for_tflite_pooling=True
    
  3. Convert to Quantized DLA Format

    1. Download the NeuroPilot SDK All-In-One Bundle:

    Visit the following download page and download the necessary bundle: NeuroPilot Downloads

    2. Extract the Bundle:

    After downloading, extract the bundle using the following command:

    tar zxvf neuropilot-sdk-basic-<version>.tar.gz
    
    3. Set the Environment Variables:

      Set the environment variables to point to the SDK:

      export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib
      
    4. Convert INT8 TFLite Model to DLA Format:

    Use ncc-tflite from the Neuron SDK to convert your TFLite model into the DLA format. The following example converts the INT8 TFLite model (densenet121_ptq_quant.tflite) to DLA format targeting the mdla3.0 architecture:

    /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 densenet121_ptq_quant.tflite
    

    Note

    To ensure compatibility with your device, please download and use NeuroPilot SDK version 6. Other versions might not be fully supported.
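Once the quantized TFLite model has been generated in step 2 above, you can optionally inspect its input and output tensors to confirm the quantization parameters (these correspond to the values listed under Model Properties below). This is a minimal sketch assuming the TensorFlow Python package is available on your host; because models produced by the MTK converter may contain custom operators, full inference with the stock TFLite interpreter may not work, but reading the tensor details does not require running the model:

    import tensorflow as tf

    # Load the quantized model without running it.
    interpreter = tf.lite.Interpreter(model_path='densenet121_ptq_quant.tflite')

    # Print name, shape, dtype, and (scale, zero_point) for each input and output tensor.
    for detail in interpreter.get_input_details() + interpreter.get_output_details():
        print(detail['name'], detail['shape'], detail['dtype'], detail['quantization'])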

FP32 Conversion Process

  1. Convert to FP32 TFLite Format:

To convert the model to a non-quantized (FP32) TFLite format, use the following command:

mtk_pytorch_converter                                 \
    --input_script_module_file=densenet121.pt         \
    --output_file=densenet121.tflite                  \
    --input_shapes=1,3,224,224                        \
    --allow_incompatible_paddings_for_tflite_pooling=True
  2. Convert to FP32 DLA Format

    1. Set the Environment Variables:

      Set the environment variables to point to the SDK:

      export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib
      
    2. Convert FP32 TFLite Model to DLA Format:

      Use ncc-tflite from the Neuron SDK to convert your FP32 TFLite model into the DLA format. The following example converts the FP32 TFLite model (densenet121.tflite) to DLA format targeting the mdla3.0 architecture with relaxed FP32 operations enabled:

      /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 densenet121.tflite
      
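As an optional check, the sketch below (an assumption, not part of the official flow) runs the FP32 TFLite model with the stock TensorFlow Lite interpreter and compares its output against the original PyTorch model. Note that models produced by the MTK converter can contain custom operators that the stock interpreter does not support (see the note under Benchmark Results); in that case this check will fail and the on-device tools should be used instead.

    import numpy as np
    import tensorflow as tf
    import torch
    import torchvision

    # Run the original PyTorch model on a fixed random input.
    x = np.random.randn(1, 3, 224, 224).astype(np.float32)
    model = torchvision.models.densenet121(pretrained=True).eval()
    with torch.no_grad():
        ref = model(torch.from_numpy(x)).numpy()

    # Run the converted FP32 TFLite model on the same input.
    interpreter = tf.lite.Interpreter(model_path='densenet121.tflite')
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()
    tfl = interpreter.get_tensor(out['index'])

    print('max abs diff:', np.abs(ref - tfl).max())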

Model Information

Note

The models and benchmark data mentioned below have been processed using the mtk_converter.

General Information

The following table contains general information about the model. The details, such as input size, GFLOPS, and number of parameters, are sourced from the official PyTorch documentation at: DenseNet121 Model.

Property              Value
Category              Classification
Input Size            224x224
GFLOPS                2.83
#Params (M)           7.97
Training Framework    PyTorch
Inference Framework   TFLite

Pre-converted Model

Deployable Model

Model Type              Download Link       Supported Backend
Quant8 Model package    Download: Quant8    NeuronSDK
Float32 Model package   Download: Fp32      CPU, GPU, ArmNN, Neuron Stable Delegate, NeuronSDK

Model Properties

  • DenseNet-quant8

Inputs

Property             Value
Name                 x.1
Tensor               int8[1,3,224,224]
Identifier           344
Quantization         Linear
Quantization Range   -1.0039 ≤ 0.0078 * q ≤ 0.9961

Outputs

Property             Value
Name                 2039
Tensor               int8[1,1000]
Identifier           590
Quantization         Linear
Quantization Range   -4.6631 ≤ 0.0466 * (q + 28) ≤ 7.2278

  • DenseNet-fp32

Inputs

Property     Value
Name         x.1
Tensor       float32[1,3,224,224]
Identifier   538

Outputs

Property     Value
Name         2039
Tensor       float32[1,1000]
Identifier   279
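The quantization ranges listed for the quant8 model follow the standard TFLite affine scheme, where the real value is scale × (q − zero_point). As a hypothetical illustration using the rounded quant8 output parameters shown above (scale 0.0466, zero point −28):

    import numpy as np

    def dequantize(q, scale, zero_point):
        # Standard TFLite affine dequantization: real = scale * (q - zero_point).
        return scale * (q.astype(np.float32) - zero_point)

    # Example with the quant8 output tensor parameters listed above.
    q = np.array([-128, -28, 127], dtype=np.int8)
    print(dequantize(q, scale=0.0466, zero_point=-28))  # approx. [-4.66, 0.0, 7.22]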

Benchmark Results

Note

The benchmark results shown below were measured with performance mode enabled. These numbers are for reference only, as actual performance may vary depending on the hardware and platform used.

Please note the following limitations:

  1. The G350 does not support the Neuron Stable Delegate or NeuronSDK, as its hardware does not yet provide the required support.

  2. The model may not run on certain backends due to custom operators generated by the MTK converter. These custom operators are not recognized or supported by the TensorFlow Lite interpreter, which may lead to incompatibility issues during inference.

  • DenseNet-quant8

Run model (.tflite) 10 times

Platform   CPU (Thread:8)   GPU    ARMNN(GpuAcc)   ARMNN(CpuAcc)   Neuron Stable Delegate   NeuronSDK
G350       N/A              N/A    N/A             N/A             N/A                      N/A
G510       N/A              N/A    N/A             N/A             N/A                      7.03 ms
G700       N/A              N/A    N/A             N/A             N/A                      5.03 ms
G1200      N/A              N/A    N/A             N/A             N/A                      6.04 ms

  • DenseNet-fp32

Run model (.tflite) 10 times

Platform   CPU (Thread:8)          GPU          ARMNN(GpuAcc)   ARMNN(CpuAcc)   Neuron Stable Delegate   NeuronSDK
G350       198.458 ms (Thread:4)   73.741 ms    102.228 ms      123.873 ms      N/A                      N/A
G510       174.425 ms              27.616 ms    40.920 ms       37.749 ms       8.272 ms                 9.03 ms
G700       49.287 ms               19.659 ms    30.424 ms       31.363 ms       5.749 ms                 6.04 ms
G1200      44.765 ms               14.796 ms    22.689 ms       20.943 ms       6.636 ms                 6.05 ms

Run Benchmark Tools

This section explains how to run the benchmark tool with different delegates and hardware configurations.

  1. First, push your TFLite model to the target device:

adb push <your_tflite_model> /usr/share/label_image/

Make sure to replace <your_tflite_model> with the actual path of your TFLite model.

  2. Next, open an ADB shell to the target device:

adb shell

After this, you can execute the following commands directly from the shell.

Execute on CPU (8 threads)

To execute the benchmark on the CPU using 8 threads, run the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=8 --num_runs=10

Execute on GPU, with GPU delegate

To execute the benchmark on the GPU using the TensorFlow Lite GPU delegate, run the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --num_runs=10

Execute on GPU, with Arm NN delegate

To execute the benchmark on the GPU using the Arm NN delegate, use the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10

Execute on CPU, with Arm NN delegate

To run the benchmark on the CPU using the Arm NN delegate, use the following command:

benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10

Execute on APU, with Neuron Delegate

For executing on the APU using the Neuron delegate, run the following command:

benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<your_tflite_model>

Note

If you are using the G350 platform, please make the following adjustments:

  • For CPU-based benchmarks, change the --num_threads parameter to 4:

    benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=4 --use_xnnpack=0 --num_runs=10
    
  • For all benchmarks (CPU, GPU, Arm NN), add the parameter --use_xnnpack=0 to disable the XNNPACK delegate.

Neuron SDK

Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:

  1. Transfer the Model to the Device:

    Use adb to push your TFLite model to the device:

    adb push <your_tflite_model> /usr/share/benchmark_dla/
    
  2. Access the Device Shell:

    Connect to your device’s shell:

    adb shell
    
  3. Navigate to the Benchmark Directory:

    Change to the directory where the model is stored:

    cd /usr/share/benchmark_dla/
    
  4. Run the Benchmark:

    Execute the benchmarking script with the following command:

    python3 benchmark.py --file <your_tflite_model> --target mdla3.0 --profile --options='--relax-fp32'
    

Description:

  • The benchmark.py script runs a performance evaluation on your model using MDLA 3.0.

  • The --file parameter specifies the path to your TFLite model.

  • The --target mdla3.0 option sets the target hardware to MDLA 3.0.

  • The --profile flag enables profiling to provide detailed performance metrics.

  • The --options='--relax-fp32' option allows relaxation of floating-point precision to improve compatibility with the MDLA.