Body Pose Estimation
Introduction
In this tutorial, we will guide you through the steps to take the BodyPoseNet model from the NVIDIA NGC hub, convert the ONNX model to TFLite format, quantize the converted models, and execute them with the TFLite benchmark_model tool. We will demonstrate two conversion methods: the open-source onnx2tf tool and MediaTek's proprietary mtk-converter. Finally, we will showcase a demo video of an application created using the same model.
Prerequisites
Before you begin, ensure you have the following installed:
Python 3.11
ONNX v1.13
onnxruntime v1.17.1
TensorFlow v2.14
onnx2tf
Neuropilot Converter 8.10
NVIDIA TAO Toolkit (v5.5)
Download the Model
First, download the BodyPoseNet model from the NVIDIA NGC hub.
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/bodyposenet/deployable_onnx_v1.0.1/files?redirect=true&path=model.onnx' -O bodyposenet.onnx
Model Analysis
The downloaded model has three dynamic input dimensions, which need to be made static before conversion. Below is a visualization of the model using Netron.
The model's input shape is dynamic in its batch_size, height, and width dimensions, and the outputs are dynamic as well. To make the input shape static, you can use the following onnxruntime command:
python3 -m onnxruntime.tools.make_dynamic_shape_fixed --input_name input_1:0 --input_shape 1,224,320,3 bodyposenet.onnx bodyposenet_fixed.onnx
This will fix the input shape of the ONNX model to [1,224,320,3].
You can now see that the dynamic inputs have become static.
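If you want to double-check the result outside of Netron, a minimal sketch like the one below (using the onnx Python package from the prerequisites) prints the input shape of the fixed model:
import onnx

# Load the fixed model and print the shape of each graph input
model = onnx.load("bodyposenet_fixed.onnx")
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else d.dim_param
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)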
Convert ONNX Model to TFLite
We will demonstrate two methods to convert the ONNX model to TFLite format.
Method 1: Using onnx2tf
onnx2tf is an open-source tool that converts ONNX models to TensorFlow Lite models. It is designed to be easy to use and supports a wide range of ONNX operators. This method is suitable for users who prefer open-source solutions and want to leverage the flexibility of TensorFlow Lite.
Install onnx2tf and sor4onnx:
pip3 install onnx2tf
pip3 install sor4onnx
Convert the Model to TF:
Note that the ONNX model already has its input shape in NHWC format. By default, onnx2tf expects NCHW inputs and will insert a transpose operation at the input to convert it to NHWC. The steps below ensure that the generated TFLite model keeps its NHWC input shape. We start by removing the :0 suffix from the input name, since TensorFlow introduces its own suffixes.
sor4onnx --input_onnx_file_path bodyposenet_fixed.onnx --old_new ":0" "" --mode full --search_mode suffix_match --output_onnx_file_path bodyposenet_renamed.onnx
You can now see that the input name has changed from input_1:0 to input_1.
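As a quick check outside of Netron, a short snippet like this (again assuming the onnx package) prints the graph input names of the renamed model:
import onnx

# Confirm the ":0" suffix was stripped from the input name
model = onnx.load("bodyposenet_renamed.onnx")
print([inp.name for inp in model.graph.input])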
Now we convert the ONNX model to a TensorFlow saved_model using onnx2tf.
onnx2tf -i bodyposenet_renamed.onnx -ois input_1:1,224,320,3 -kat input_1
This generates the TensorFlow saved_model.
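If you want to confirm that the exported saved_model kept the NHWC input, a minimal sketch like the one below (assuming onnx2tf's default saved_model output directory) prints its serving signature:
import tensorflow as tf

# Load the saved_model produced by onnx2tf and inspect its serving signature
loaded = tf.saved_model.load("saved_model")
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)
print(sig.structured_outputs)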
Convert and Quantize the Model to TFLite:
We can use the generated TF saved_model and convert it to TFLite using the TFLite converter.
import tensorflow as tf
import numpy as np

tf_model_path = '/path/to/saved_model'
tflite_model_path = 'bodyposenet.tflite'

# Generate representative dataset
def representative_dataset():
    data = tf.random.uniform((1, 224, 320, 3))
    yield [data]

converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # Can be tf.uint8, tf.float32, or tf.float16
converter.inference_output_type = tf.float32  # Can be tf.uint8, tf.int8, or tf.float16. We keep it float32 for ease of post-processing output data

tflite_model = converter.convert()
with open(tflite_model_path, "wb") as f:
    f.write(tflite_model)
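To sanity-check the quantized model before benchmarking, you can load it with the TFLite interpreter and inspect the tensor details. The sketch below assumes the bodyposenet.tflite file generated above:
import numpy as np
import tensorflow as tf

# Load the quantized model and inspect its tensor details
interpreter = tf.lite.Interpreter(model_path="bodyposenet.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()
print("Input:", inp["shape"], inp["dtype"])
print("Outputs:", [(o["shape"], o["dtype"]) for o in outs])

# Run a single inference on dummy data of the expected dtype
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()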
The Netron visualization of the generated TFLite model is below:
Method 2: Using mtk-converter
mtk-converter is MediaTek’s proprietary tool for converting ONNX models to TFLite format. It is optimized for MediaTek’s hardware and provides additional features for model optimization and deployment. This method is recommended for users who are deploying models on MediaTek platforms and want to take advantage of MediaTek’s AI-ML solutions.
Install mtk-converter
Follow the official NeuroPilot documentation to download the mtk-converter tool, and install it inside a Python 3.11 virtual environment.
# Steps to start a Python virtual environment and install the mtk_converter wheel file in your Linux environment
sudo apt install python3.11-venv
mkdir NP8
python3.11 -m venv NP8/
source NP8/bin/activate  # This starts your virtual environment
pip install mtk_converter*.whl  # Install the compatible mtk-converter wheel file. It is part of the downloaded NP package.
pip install <required_dependencies>  # Install the dependencies as mentioned in the NeuroPilot documentation
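To confirm the installation succeeded, a quick import check from within the active virtual environment is usually enough; this assumes the wheel installed without errors:
# Verify that mtk_converter can be imported from the active virtual environment
import mtk_converter
print(mtk_converter.__file__)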
Convert ONNX Model to TFLite
The ONNX model already has NHWC shapes. We can use the below Python script to convert the ONNX model to TFLite.
The provided Python script performs two functions:
Convert ONNX IR version to v8 supported by NP8
Convert the generated ONNX model to TFLite
import onnx
from onnx import version_converter
import mtk_converter
import numpy as np
import argparse

# Set up argument parser
parser = argparse.ArgumentParser(description='Convert ONNX model to TFLite with quantization')
parser.add_argument('input_model', type=str, help='Path to the input ONNX model')
parser.add_argument('output_model', type=str, help='Path to the output TFLite model')
parser.add_argument('--input_names', type=str, default=None, help='Comma-separated list of input tensor names')
parser.add_argument('--input_shapes', type=str, default=None, help='Semicolon-separated list of comma-separated input tensor shapes, e.g., "1,224,224,3;1,10"')
parser.add_argument('--quantize', type=bool, default=False, help='Set to true if you want a quantized model. Default is False')
args = parser.parse_args()

input_model_path = args.input_model
output_model_path = args.output_model
quantize = args.quantize
input_names = args.input_names.split(',') if args.input_names else None
input_shapes = [list(map(int, shape.split(','))) for shape in args.input_shapes.split(';')] if args.input_shapes else None

# Step 1: Convert the IR version of the ONNX model
def convert_ir_version(input_model_path, target_ir_version=8):
    original_model = onnx.load(input_model_path)
    print(f"Original model IR version: {original_model.ir_version}")
    # Convert the IR version
    original_model.ir_version = target_ir_version
    output_model_path = 'converted_model.onnx'
    onnx.save(original_model, output_model_path)
    converted_model = onnx.load(output_model_path)
    print(f"Converted model IR version: {converted_model.ir_version}")
    return output_model_path

# Step 2: Convert the ONNX model to TFLite
def convert_to_tflite(input_model_path, output_model_path, input_shapes, input_names):
    # Load the ONNX model to get the input shape
    onnx_model = onnx.load(input_model_path)
    if input_shapes is None:
        input_shapes = []
        for input_tensor in onnx_model.graph.input:
            input_shape = [dim.dim_value for dim in input_tensor.type.tensor_type.shape.dim]
            input_shapes.append(input_shape)

    # Define representative dataset
    def data_gen(input_shapes):
        for i in range(10):
            yield [np.random.randn(*shape).astype(np.float32) for shape in input_shapes]

    # Create the converter
    if args.input_names or args.input_shapes:
        converter = mtk_converter.OnnxConverter.from_model_proto_file(input_model_path, input_names, input_shapes)
    else:
        converter = mtk_converter.OnnxConverter.from_model_proto_file(input_model_path)

    if quantize:
        converter.quantize = True
        converter.calibration_data_gen = lambda: data_gen(input_shapes)
        # Uncomment to set precision proportion to INT8
        # converter.precision_proportion = {'8W8A': 1.0}
        # Uncomment with precision proportion to set input data type as FP32
        # converter.prepend_input_quantize_ops = True
        # Uncomment with precision proportion to set output data type as FP32
        # converter.append_output_dequantize_ops = True
        converter.use_per_output_channel_quantization = False

    # Convert the model to TFLite
    _ = converter.convert_to_tflite(output_file=output_model_path)

# Convert the IR version of the ONNX model
converted_model_path = convert_ir_version(input_model_path)

# Convert the ONNX model to TFLite
convert_to_tflite(converted_model_path, output_model_path, input_shapes, input_names)
The provided conversion_script.py will convert your ONNX model to TFLite. Use the below command line arguments to execute the script:
python3 conversion_script.py bodyposenet.onnx bodyposenet.tflite --input_names input_1:0 --input_shapes 1,224,320,3 --quantize=True
Set the --quantize flag to True to quantize the TFLite model.
Model Execution
Use the TFLite benchmark_model tool to execute the quantized model and measure its performance.
benchmark_model is provided as part of TensorFlow Performance Measurement for performance evaluation.
Commands for executing the benchmark tool with CPU and different delegates are as follows.
Execute on CPU (8 threads):
benchmark_model --graph=bodyposenet.tflite --num_threads=8 --num_runs=10
Execute on GPU, with GPU delegate:
benchmark_model --graph=bodyposenet.tflite --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --num_runs=10
Execute on GPU, with Arm NN delegate:
benchmark_model --graph=bodyposenet.tflite --external_delegate_path=/usr/lib/libarmnnDelegate.so.29 --external_delegate_options="backends:GpuAcc,CpuAcc" --num_runs=10
Execute on NPU, with NPU LiteRT delegate:
benchmark_model --graph=bodyposenet.tflite --stable_delegate_settings_file=/usr/share/label_image/stable_delegate_settings.json
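If benchmark_model is not available on your device, a rough latency number can also be obtained directly from Python with the TFLite interpreter. The sketch below assumes the quantized bodyposenet.tflite and measures average CPU inference time only:
import time
import numpy as np
import tensorflow as tf

# Measure average CPU inference time over a number of runs
interpreter = tf.lite.Interpreter(model_path="bodyposenet.tflite", num_threads=8)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()  # Warm-up run

runs = 10
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")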
The following table shows the inference times (in milliseconds) for the converted bodyposenet TFLite model running on the Genio 700 platform.
Model | CPU (1 thread) | GPU | NPU LiteRT delegate
---|---|---|---
bodyposenet | 576.2 | 294.3 | 21
Demo Example
Application Control Flow Stack Diagram
Below we show the Application Control Flow of the demo example. The example makes use of our advanced MDP, GPU, and NPU capabilities.
Application Demo Video
Below are a few video examples of body pose estimation using BodyPoseNet.
Conclusion
In this tutorial, we demonstrated how to take the BodyPoseNet model from the NVIDIA NGC hub, convert it to TFLite format using two different methods, quantize the model, and execute it with the TFLite benchmark_model tool. We also showcased a demo video of an application created using the same model. This process highlights the flexibility and efficiency of using the NVIDIA TAO Toolkit and MediaTek's NeuroPilot solutions for deploying AI-ML models at the edge.