VGGFace Models
==============

Overview
--------

VGGFace is a deep convolutional neural network for face recognition. Based on the VGG architecture, it stacks many layers of small convolutional filters to capture fine facial detail. It was trained on a large dataset of celebrity faces and is widely used in research and industry for face recognition, face verification, and feature extraction.

Model Conversion Flow
---------------------

Precondition
^^^^^^^^^^^^

.. note::
   It is recommended to use **Python 3.7** with these models, because some of the required libraries and frameworks are most compatible with that version.

Get Source Model
^^^^^^^^^^^^^^^^

Follow these steps to set up and convert the VGGFace model using PyTorch.

1. **Clone the VGGFace PyTorch Repository:**

   .. code-block:: bash

       git clone https://github.com/prlz77/vgg-face.pytorch.git
       cd vgg-face.pytorch/


2. **Download and Extract the Pretrained VGGFace Model:**

   Download the pretrained VGGFace model from the following link:

   .. code-block:: bash

       wget https://www.robots.ox.ac.uk/~vgg/software/vgg_face/src/vgg_face_torch.tar.gz

   Extract the downloaded tar file:

   .. code-block:: bash

       tar zxvf vgg_face_torch.tar.gz

   Move the extracted files to the `pretrained` directory in the cloned repository:

   .. code-block:: bash

       mv vgg_face_torch/* pretrained/

3. **Modify the VGGFace Model Script:**

   Open the model script in a text editor:

   .. code-block:: bash

       gedit "models/vgg_face.py"

   Add the following lines after the point where the script has created `model` and the input tensor `im`. They trace the model and save it as a TorchScript file:

   .. code-block:: python

       traced_model = torch.jit.trace(model, im)
       traced_model.save("vggface_traced_model.pt")
       print("Traced model saved successfully.")

   After adding the code, run the modified script:

   .. code-block:: bash
       
       python3 models/vgg_face.py

Converting Model for Deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before you begin, ensure that the :doc:`NeuroPilot Converter Tool </sw/yocto/ml-guide/neuron-dev-flow/model_converter/neuropilot_converter_tool>` is installed. If you haven't installed it yet, please follow the instructions in the "Install and Verify NeuroPilot Converter Tool" section of the same guide.

Quant8 Conversion Process
*************************

1. **Generate Calibration Data:**

   Save the following script as `generate_data_batches.py`. It creates a directory named `data` and writes 100 batches of random input data, each saved as a `.npy` file. The converter uses this data for calibration during quantization.

   .. code-block:: python

       import os
       import numpy as np

       # Create the output directory; do not fail if it already exists.
       os.makedirs('data', exist_ok=True)
       for i in range(100):
           # One batch of random float32 data with shape (1, 3, 224, 224).
           data = np.random.randn(1, 3, 224, 224).astype(np.float32)
           np.save('data/batch_{}.npy'.format(i), data)

   Run the script:

   .. code-block:: bash

       python generate_data_batches.py
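
   Before running the converter, it can help to sanity-check the calibration files. The following is a minimal sketch, not part of the converter tooling. The helper name `check_calibration_dir` is hypothetical, and the sketch assumes that every file should hold a `float32` array of shape `(1, 3, 224, 224)`, matching the `--input_shapes` value used in the next step, and that the file pattern must match the whole file name. The demo writes two batches to a temporary directory so that the snippet runs on its own.

   .. code-block:: python

       import os
       import re
       import tempfile

       import numpy as np

       EXPECTED_SHAPE = (1, 3, 224, 224)
       # Same pattern that is passed to --calibration_data_regexp.
       # Assumption: the pattern has to match the whole file name.
       FILE_PATTERN = re.compile(r'batch_.*\.npy')


       def check_calibration_dir(data_dir):
           """Return the number of valid batch files; raise ValueError on a bad one."""
           count = 0
           for name in sorted(os.listdir(data_dir)):
               if not FILE_PATTERN.fullmatch(name):
                   continue  # Ignore files that do not match the pattern.
               batch = np.load(os.path.join(data_dir, name))
               if batch.shape != EXPECTED_SHAPE or batch.dtype != np.float32:
                   raise ValueError('{}: got {} {}'.format(name, batch.dtype, batch.shape))
               count += 1
           return count


       # Self-contained demo: write two batches to a temporary directory and check them.
       with tempfile.TemporaryDirectory() as tmp_dir:
           for i in range(2):
               data = np.random.randn(*EXPECTED_SHAPE).astype(np.float32)
               np.save(os.path.join(tmp_dir, 'batch_{}.npy'.format(i)), data)
           checked = check_calibration_dir(tmp_dir)

       print('valid batches:', checked)

   For the directory produced by `generate_data_batches.py`, `check_calibration_dir('data')` should report 100 valid batches.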

2. **Convert to Quantized TFLite Format:**

   Use the following command to convert the model to a quantized TFLite format using the generated calibration data:

   .. code-block:: bash

       mtk_pytorch_converter                                     \
           --input_script_module_file=vggface_traced_model.pt    \
           --output_file=vggface_traced_model_ptq_quant.tflite   \
           --input_shapes=1,3,224,224                            \
           --quantize=True                                       \
           --input_value_ranges=-1,1                             \
           --calibration_data_dir=data/                          \
           --calibration_data_regexp='batch_.*\.npy'
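
   When the conversion is driven from a Python script, the same command can be built as an argument list, which sidesteps shell quoting of the regular expression. This is a sketch under assumptions: the helper name `build_converter_cmd` is hypothetical, it uses only the flags shown in this guide, and it runs the tool only if `mtk_pytorch_converter` is found on `PATH`.

   .. code-block:: python

       import shutil
       import subprocess


       def build_converter_cmd(script_module, output_file, calibration_dir=None):
           """Build the mtk_pytorch_converter argument list used in this guide."""
           cmd = [
               'mtk_pytorch_converter',
               '--input_script_module_file={}'.format(script_module),
               '--output_file={}'.format(output_file),
               '--input_shapes=1,3,224,224',
           ]
           if calibration_dir is not None:
               # Quantization flags, copied from the Quant8 command in this guide.
               cmd += [
                   '--quantize=True',
                   '--input_value_ranges=-1,1',
                   '--calibration_data_dir={}'.format(calibration_dir),
                   # No shell is involved, so the pattern needs no extra escaping.
                   r'--calibration_data_regexp=batch_.*\.npy',
               ]
           return cmd


       cmd = build_converter_cmd(
           'vggface_traced_model.pt',
           'vggface_traced_model_ptq_quant.tflite',
           calibration_dir='data/',
       )
       print(' '.join(cmd))

       # Run the converter only when it is installed on this machine.
       if shutil.which('mtk_pytorch_converter'):
           subprocess.run(cmd, check=True)

   Calling `build_converter_cmd` without `calibration_dir` yields the plain FP32 conversion command.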

3. **Convert to Quantized DLA Format:**

   1. **Download the NeuroPilot SDK All-In-One Bundle:**

      Visit the following download page and download the necessary bundle: `NeuroPilot Downloads <https://neuropilot.mediatek.com/sphinx/neuropilot-6-basic-customer/html/l1_downloads/downloads_external.html#neuropilot-downloads>`_

   2. **Extract the Bundle:**

      After downloading, extract the bundle using the following command:

      .. code-block:: bash

          tar zxvf neuropilot-sdk-basic-<version>.tar.gz

   3. **Set the Environment Variables:**

      Set the environment variables to point to the SDK:

      .. code-block:: bash

          export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

   4. **Convert INT8 TFLite Model to DLA Format:**

      Use the `ncc-tflite` compiler from the Neuron SDK to compile the TFLite model into the DLA format. The following example compiles the INT8 TFLite model for the `mdla3.0` architecture:

      .. code-block:: bash

          /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 vggface_traced_model_ptq_quant.tflite

.. note::
   To ensure compatibility with your device, please download and use **NeuroPilot SDK version 6**. Other versions might not be fully supported.

FP32 Conversion Process
***********************

1. **Convert to FP32 TFLite Format:**

   To convert the model to a non-quantized (FP32) TFLite format, use the following command:

   .. code-block:: bash

       mtk_pytorch_converter                                 \
           --input_script_module_file=vggface_traced_model.pt \
           --output_file=vggface_traced_model.tflite          \
           --input_shapes=1,3,224,224

2. **Convert to FP32 DLA Format:**

   1. **Set the Environment Variables:**

      Set the environment variables to point to the SDK:

      .. code-block:: bash

          export LD_LIBRARY_PATH=/path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/lib

   2. **Convert FP32 TFLite Model to DLA Format:**

      Use the `ncc-tflite` compiler from the Neuron SDK to compile the FP32 TFLite model into the DLA format. The following example targets the `mdla3.0` architecture and passes `--relax-fp32` to enable relaxed FP32 operations:

      .. code-block:: bash

          /path/to/neuropilot-sdk-basic-<version>/neuron_sdk/host/bin/ncc-tflite --arch=mdla3.0 --relax-fp32 vggface_traced_model.tflite
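
The export and compile steps for both the Quant8 and FP32 models can also be scripted. The sketch below makes a few assumptions: `SDK_ROOT` is a placeholder for the extracted bundle directory, the helper names `ncc_env` and `ncc_cmd` are hypothetical, and the SDK library directory is prepended to any existing `LD_LIBRARY_PATH` for the child process only instead of replacing it. The compiler is invoked only if the binary exists.

.. code-block:: python

    import os
    import subprocess

    # Placeholder: replace with the directory of the extracted SDK bundle.
    SDK_ROOT = '/path/to/neuropilot-sdk-basic-<version>'


    def ncc_env(sdk_root, base_env=None):
        """Return a copy of the environment with the SDK host libraries added."""
        env = dict(os.environ if base_env is None else base_env)
        lib_dir = os.path.join(sdk_root, 'neuron_sdk', 'host', 'lib')
        existing = env.get('LD_LIBRARY_PATH')
        env['LD_LIBRARY_PATH'] = lib_dir + (os.pathsep + existing if existing else '')
        return env


    def ncc_cmd(sdk_root, tflite_file, relax_fp32=False):
        """Build the ncc-tflite command line for MDLA 3.0."""
        ncc = os.path.join(sdk_root, 'neuron_sdk', 'host', 'bin', 'ncc-tflite')
        cmd = [ncc, '--arch=mdla3.0']
        if relax_fp32:
            cmd.append('--relax-fp32')  # Used for the FP32 model only.
        cmd.append(tflite_file)
        return cmd


    fp32_cmd = ncc_cmd(SDK_ROOT, 'vggface_traced_model.tflite', relax_fp32=True)
    quant_cmd = ncc_cmd(SDK_ROOT, 'vggface_traced_model_ptq_quant.tflite')
    print(' '.join(fp32_cmd))
    print(' '.join(quant_cmd))

    # Compile only when the SDK is actually present at SDK_ROOT.
    if os.path.isfile(fp32_cmd[0]):
        subprocess.run(fp32_cmd, env=ncc_env(SDK_ROOT), check=True)

Passing the environment through `env=` keeps the library path change local to the compiler process, so the calling shell is left untouched.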


Model Information
-----------------

.. note::
   The models and benchmark data mentioned below have been processed using the **mtk_converter**.

General Information
^^^^^^^^^^^^^^^^^^^

The following table contains general information about the model. The details, such as input size, GFLOPS, and number of parameters, are sourced from the official PyTorch documentation at: 
`VGG16 Model <https://pytorch.org/vision/main/models/generated/torchvision.models.vgg16.html#torchvision.models.vgg16>`_.

+-----------------------+----------------------------+
| Property              | Value                      |
+=======================+============================+
| Category              | Recognition                |
+-----------------------+----------------------------+
| Input Size            | 224x224                    |
+-----------------------+----------------------------+
| #MACs (G)             | None                       |
+-----------------------+----------------------------+
| #Params (M)           | None                       |
+-----------------------+----------------------------+
| Training Framework    | PyTorch                    |
+-----------------------+----------------------------+
| Inference Framework   | TFLite                     |
+-----------------------+----------------------------+

Pre-converted Model
^^^^^^^^^^^^^^^^^^^

Deployable Model
****************

+-----------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------+
| Model Type            | Download Link                                                                                                            | Supported Backend                                      |
+=======================+==========================================================================================================================+========================================================+
| Quant8  Model package | `Download: Quant8 <https://mediatek-aiot.s3.ap-southeast-1.amazonaws.com/aiot/download/model-zoo/vggface-int8.zip>`_     | NeuronSDK                                              |
+-----------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------+
| Float32 Model package | `Download: Fp32 <https://mediatek-aiot.s3.ap-southeast-1.amazonaws.com/aiot/download/model-zoo/vggface-fp32.zip>`_       | CPU,GPU,ArmNN,Neuron Stable Delegate,NeuronSDK         |
+-----------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------+

Model Properties
****************

- **VGGFace-quant8**

**Inputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+=======================+==========================================+
| Name                  | x.1                                      |
+-----------------------+------------------------------------------+
| Tensor                | int8[1,3,224,224]                        |
+-----------------------+------------------------------------------+
| Identifier            | 10                                       |
+-----------------------+------------------------------------------+
| Quantization          | Linear                                   |
+-----------------------+------------------------------------------+
| Quantization Range    | -1.0039 ≤ 0.0078 * q ≤ 0.9961            |
+-----------------------+------------------------------------------+

**Outputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+=======================+==========================================+
| Name                  | 238                                      |
+-----------------------+------------------------------------------+
| Tensor                | int8[1,2622]                             |
+-----------------------+------------------------------------------+
| Identifier            | 52                                       |
+-----------------------+------------------------------------------+
| Quantization          | Linear                                   |
+-----------------------+------------------------------------------+
| Quantization Range    | -0.0163 ≤ 0.0002 * (q + 30) ≤ 0.0261     |
+-----------------------+------------------------------------------+
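
The quantization ranges in the tables above follow the usual linear mapping `real = scale * (q - zero_point)`. The following sketch converts between raw `int8` values and real values. The function names are illustrative, and the scales are the rounded values printed in the tables, so the results only approximate the exact range limits.

.. code-block:: python

    def dequantize(q, scale, zero_point):
        """Map a raw int8 value to its real value."""
        return scale * (q - zero_point)


    def quantize(x, scale, zero_point):
        """Map a real value to the nearest int8 value, clamped to [-128, 127]."""
        q = round(x / scale) + zero_point
        return max(-128, min(127, q))


    # Rounded parameters read from the tables above.
    IN_SCALE, IN_ZERO_POINT = 0.0078, 0
    OUT_SCALE, OUT_ZERO_POINT = 0.0002, -30

    # The output zero point q = -30 corresponds to a real value of zero.
    print(dequantize(-30, OUT_SCALE, OUT_ZERO_POINT))
    # Input values outside the representable range are clamped.
    print(quantize(2.0, IN_SCALE, IN_ZERO_POINT))

When feeding the Quant8 model directly, inputs would be quantized with the input parameters and the output tensor dequantized with the output parameters.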

- **VGGFace-fp32**

**Inputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+=======================+==========================================+
| Name                  | x.1                                      |
+-----------------------+------------------------------------------+
| Tensor                | float32[1,3,224,224]                     |
+-----------------------+------------------------------------------+
| Identifier            | 16                                       |
+-----------------------+------------------------------------------+


**Outputs**

+-----------------------+------------------------------------------+
| **Property**          | **Value**                                |
+=======================+==========================================+
| Name                  | 238                                      |
+-----------------------+------------------------------------------+
| Tensor                | float32[1,2622]                          |
+-----------------------+------------------------------------------+
| Identifier            | 46                                       |
+-----------------------+------------------------------------------+

Benchmark Results
^^^^^^^^^^^^^^^^^

.. note::
   The benchmark results below were measured with performance mode enabled. They are for reference only; actual performance varies with the hardware and platform.

   Note the following limitations:

   1. The G350 does not support the Neuron Stable Delegate or NeuronSDK, because its hardware does not yet support them.
   2. The model may not run on some backends because the MTK converter generates custom operators. The TensorFlow Lite interpreter does not recognize these operators, which can cause inference to fail.
   3. On the G350, inference with ArmNN may crash because the model is too large for the platform.

- **VGGFace-quant8**

.. csv-table::
   :file: /_asset/tables/Model_Benchmark_tables/int8_VGGFace.csv
   :width: 100%
   :widths: 10 15 15 15 15 15 15

- **VGGFace-fp32**

.. csv-table::
   :file: /_asset/tables/Model_Benchmark_tables/fp32_VGGFace.csv
   :width: 100%
   :widths: 10 15 15 15 15 15 15

Run Benchmark Tools
^^^^^^^^^^^^^^^^^^^

This section shows how to run the benchmark tool with different delegates and hardware configurations.

1. Push your TFLite model to the target device:

   .. code-block:: bash

      adb push <your_tflite_model> /usr/share/label_image/

   Replace `<your_tflite_model>` with the path to your TFLite model.

2. Open an ADB shell on the target device:

   .. code-block:: bash

      adb shell

   Run the commands in the following sections from this shell.

Execute on CPU (8 threads)
**************************

To execute the benchmark on the CPU using 8 threads, run the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=8 --num_runs=10

Execute on GPU, with GPU delegate
*********************************

To execute the benchmark on the GPU using the TensorFlow Lite GPU delegate, run the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --use_gpu=1 --gpu_precision_loss_allowed=1 --num_runs=10

Execute on GPU, with Arm NN delegate
************************************

To execute the benchmark on the GPU using the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc" --num_runs=10

Execute on CPU, with Arm NN delegate
************************************

To run the benchmark on the CPU using the Arm NN delegate, use the following command:

.. code-block:: bash

   benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:CpuAcc" --num_runs=10

Execute on APU, with Neuron Delegate
************************************

To execute the benchmark on the APU using the Neuron delegate, run the following command:

.. code-block:: bash

   benchmark_model --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json --use_nnapi=false --use_xnnpack=false --use_gpu=false --min_secs=20 --graph=/usr/share/label_image/<your_tflite_model>

.. note::
   If you are using the G350 platform, make the following adjustments:

   - For CPU-based benchmarks, change the `--num_threads` parameter to 4:

     .. code-block:: bash

        benchmark_model --graph=/usr/share/label_image/<your_tflite_model> --num_threads=4 --use_xnnpack=0 --num_runs=10

   - For all benchmarks (CPU, GPU, Arm NN), add the parameter `--use_xnnpack=0` to disable the XNNPACK delegate.
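
The CPU, GPU, and Arm NN commands above differ only in a few flags, so they can be generated from a small table. This is an illustrative sketch: the function name `benchmark_cmd` and the `BACKEND_FLAGS` dictionary are hypothetical, the flags are copied from the commands in this section, and the `g350` switch applies the two adjustments from the note (4 CPU threads and `--use_xnnpack=0`). The Neuron delegate command is omitted to keep the sketch short.

.. code-block:: python

    MODEL_DIR = '/usr/share/label_image/'
    ARMNN_DELEGATE = '/usr/lib/libarmnnDelegate.so.29'

    # Backend-specific flags, copied from the commands in this section.
    BACKEND_FLAGS = {
        'cpu': ['--num_threads=8'],
        'gpu': ['--use_gpu=1', '--gpu_precision_loss_allowed=1'],
        'armnn_gpu': ['--external_delegate_path=' + ARMNN_DELEGATE,
                      '--external_delegate_options=backends:GpuAcc'],
        'armnn_cpu': ['--external_delegate_path=' + ARMNN_DELEGATE,
                      '--external_delegate_options=backends:CpuAcc'],
    }


    def benchmark_cmd(model_name, backend, g350=False):
        """Return the benchmark_model argument list for one backend."""
        flags = list(BACKEND_FLAGS[backend])
        if g350:
            if backend == 'cpu':
                flags = ['--num_threads=4']
            flags.append('--use_xnnpack=0')  # Disable the XNNPACK delegate.
        graph = '--graph=' + MODEL_DIR + model_name
        return ['benchmark_model', graph] + flags + ['--num_runs=10']


    # Print the G350 variant of each command, ready to paste into the ADB shell.
    for backend in BACKEND_FLAGS:
        print(' '.join(benchmark_cmd('vggface_traced_model.tflite', backend, g350=True)))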


Neuron SDK
----------

Follow these steps to benchmark your TensorFlow Lite model using the Neuron SDK with MDLA 3.0:

1. **Transfer the Model to the Device:**

   Use `adb` to push your TFLite model to the device:

   .. code-block:: bash

       adb push <your_tflite_model> /usr/share/benchmark_dla/

2. **Access the Device Shell:**

   Connect to your device's shell:

   .. code-block:: bash

       adb shell

3. **Navigate to the Benchmark Directory:**

   Change to the directory where the model is stored:

   .. code-block:: bash

       cd /usr/share/benchmark_dla/

4. **Run the Benchmark:**

   Execute the benchmarking script with the following command:

   .. code-block:: bash

       python3 benchmark.py --file <your_tflite_model> --target mdla3.0 --profile --options='--relax-fp32'

**Description:**

- The `benchmark.py` script runs a performance evaluation on your model using MDLA 3.0.
- The `--file` parameter specifies the path to your TFLite model.
- The `--target mdla3.0` option sets the target hardware to MDLA 3.0.
- The `--profile` flag enables profiling to provide detailed performance metrics.
- The `--options='--relax-fp32'` option allows relaxation of floating-point precision to improve compatibility with MDLA.

.. note::

   **Troubleshooting:**

   If you encounter the following error:

   .. code-block:: bash

       subprocess.CalledProcessError: Command 'ncc-tflite -arch mdla3.0 vggface_traced_model_ptq_quant.tflite -o vggface_traced_model_ptq_quant-mdla3.0.dla --relax-fp32' returned non-zero exit status

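   This error indicates that the `ncc-tflite` compilation started by the benchmark script failed on the device. As a workaround, compile the model to DLA format on the host as described in the conversion steps, then provide the `.dla` file manually:

   1. **Push the .dla file to your device:**

      .. code-block:: bash

          adb push vggface_traced_model_ptq_quant.dla /usr/share/benchmark_dla/

   2. **Access the device shell:**

      .. code-block:: bash

          adb shell

   3. **Navigate to the directory:**

      .. code-block:: bash

          cd /usr/share/benchmark_dla/

   4. **Rename the .dla file to match the output name shown in the error message:**

      .. code-block:: bash

          mv vggface_traced_model_ptq_quant.dla vggface_traced_model_ptq_quant-mdla3.0.dla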

   5. **Re-run the benchmark script:**

      .. code-block:: bash

          python3 benchmark.py --file vggface_traced_model_ptq_quant.tflite --target mdla3.0 --profile --options='--relax-fp32'
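
The rename in step 4 follows the output name that appears in the error message, `<model>-mdla3.0.dla`. The following is a small sketch of that naming rule. It is inferred from the error message only, not from the `benchmark.py` source, and the helper name `expected_dla_name` is hypothetical.

.. code-block:: python

    import os


    def expected_dla_name(tflite_file, target='mdla3.0'):
        """Return the .dla file name that the benchmark run appears to look for."""
        stem, _ = os.path.splitext(os.path.basename(tflite_file))
        return '{}-{}.dla'.format(stem, target)


    print(expected_dla_name('vggface_traced_model_ptq_quant.tflite'))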