Model Converter

Many deep learning frameworks (e.g. TensorFlow v1, PyTorch, Caffe) provide conversion tools that can convert models from different training frameworks into a format that can be deployed on IoT Yocto. In this context, the deployment format is a quantized TensorFlow Lite model, which is the only model format that Neuron SDK accepts as input.

A converter tool handles the variations in operator definitions and model representations among different training frameworks, and applies device-independent optimizations to the given model. In this section, we provide examples of using the TensorFlow v2 converter tool. The TensorFlow v2 converter tool is also capable of quantizing the model with different configurations, such as 8-bit asymmetric quantization, 16-bit symmetric quantization, or mixed-bit quantization. Post-training quantization can be applied during the conversion process if necessary.
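
For reference, the stock TensorFlow Lite converter exposes similar quantization choices through its public Python API. The sketch below only illustrates the idea with the standard tf.lite.TFLiteConverter interface and a placeholder SavedModel path; it is not the extended converter used in the examples later in this section.

import tensorflow as tf
import numpy as np

# Placeholder path; replace with an actual SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model')

def representative_dataset():
        # Yield a few calibration samples shaped like the model input.
        for _ in range(100):
                yield [np.random.rand(1, 28, 28).astype(np.float32)]

# 8-bit asymmetric post-training quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Alternatively, 16-bit activations with 8-bit weights:
# converter.target_spec.supported_ops = [
#         tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
# ]

tflite_model = converter.convert()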

We will also demonstrate how to use ncc-tflite to convert a quantized TFLite model to a DLA model.

Note

No matter which deep learning framework (e.g. TensorFlow v1, PyTorch, Caffe) the source model comes from, as long as the converted model is a quantized TensorFlow Lite model, it is a valid input to Neuron SDK. Users are expected to be familiar with the model conversion method of the corresponding framework, which is not described in this section.

Note

TensorFlow is tested and supported on the following 64-bit systems:

  • Python 3.7–3.10

  • Ubuntu 16.04 or later

  • Windows 7 or later (with C++ redistributable)

  • macOS 10.12.6 (Sierra) or later (no GPU support)

  • WSL2 via Windows 10 19044 or higher including GPUs (Experimental).

Please refer to the TensorFlow Install page for details.

Convert TensorFlow V2 Float Model to Quant Model

Model Preparation

The following directories are created for this example.

$ mkdir -p workspace/float workspace/qat workspace/ptq workspace/ptq_dynamic

We first train the floating-point model on the MNIST dataset.

import tensorflow as tf

(train_images, train_labels), (_, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0

model = tf.keras.Sequential(
        [
                tf.keras.layers.InputLayer(input_shape=(28, 28)),
                tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
                tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
                tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
                tf.keras.layers.Flatten(),
                tf.keras.layers.Dense(10)
        ]
)
model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
)
model.fit(train_images, train_labels, epochs=10)
model.save('./workspace/float/model', save_format='tf')
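
As an optional sanity check (not part of the conversion flow), the floating-point baseline accuracy can be measured on the MNIST test split by continuing from the script above:

# Continuing from the training script above: optional baseline accuracy check.
(_, _), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
test_images = test_images / 255.0
_, float_accuracy = model.evaluate(test_images, test_labels, verbose=0)
print('Float model test accuracy: {:.4f}'.format(float_accuracy))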

Next, we perform quantization-aware training on the floating-point model above.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

(train_images, train_labels), (_, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0
train_images_subset = train_images[0:1000]
train_labels_subset = train_labels[0:1000]

model = tf.keras.models.load_model('./workspace/float/model')
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
)
q_aware_model.fit(
        train_images_subset,
        train_labels_subset,
        batch_size=500,
        epochs=1,
        validation_split=0.1
)
q_aware_model.save('./workspace/qat/model', save_format='tf')
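
Optionally, the quantization-aware model can be evaluated on the same test split to confirm that accuracy is preserved. This again continues from the script above and is not required for conversion.

# Continuing from the quantization-aware training script above.
(_, _), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
test_images = test_images / 255.0
_, qat_accuracy = q_aware_model.evaluate(test_images, test_labels, verbose=0)
print('QAT model test accuracy: {:.4f}'.format(qat_accuracy))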

Floating-Point

Convert the model using the command-line executable.

$ tflite_convert                                            \
        --input_saved_model_dir=workspace/float/model       \
        --output_file=workspace/float/model.tflite          \
        --default_batch_size=1

Or convert the model using the Python API.

import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model_dir(
        './workspace/float/model', default_batch_size=1
)
_ = converter.convert_to_tflite(output_file='./workspace/float/model.tflite')

The output TFLite model file is stored as ./workspace/float/model.tflite.

Note

We set the default_batch_size argument, because a dynamic batch size is used by default in tf.keras.layers.InputLayer.
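
To sanity-check the converted file on the host, the model can be run with the standard TensorFlow Lite interpreter. This step is optional and independent of the converter tool.

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='./workspace/float/model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed one dummy image with the expected (1, 28, 28) float32 input shape.
dummy = np.zeros(input_details['shape'], dtype=np.float32)
interpreter.set_tensor(input_details['index'], dummy)
interpreter.invoke()
logits = interpreter.get_tensor(output_details['index'])
print('Output shape:', logits.shape)  # Expected: (1, 10)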

Quantization-Aware Training

Convert the above fake-quantized model using the command-line executable.

$ tflite_convert                                          \
        --input_saved_model_dir=workspace/qat/model       \
        --output_file=workspace/qat/model.tflite          \
        --default_batch_size=1                            \
        --quantize=True

Or convert the model using the Python API.

import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model_dir(
        './workspace/qat/model', default_batch_size=1
)
converter.quantize = True
_ = converter.convert_to_tflite(output_file='./workspace/qat/model.tflite')

The output TFLite model file is stored as ./workspace/qat/model.tflite.

Note

We set the default_batch_size argument, because a dynamic batch size is used by default in tf.keras.layers.InputLayer.

Note

The input quantization range was already deduced by the quantization-aware training process, so we do not set the input_value_ranges option.
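
If desired, the quantization parameters carried by the converted model can be inspected on the host with the standard TensorFlow Lite interpreter; each quantized tensor reports a (scale, zero_point) pair.

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='./workspace/qat/model.tflite')
interpreter.allocate_tensors()

# Print dtype and (scale, zero_point) for the model inputs and outputs.
for detail in interpreter.get_input_details() + interpreter.get_output_details():
        print(detail['name'], detail['dtype'], detail['quantization'])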

Post-Training Quantization

Convert the above floating-point model to TFLite with post-training quantization.

We first store the data used for post-training quantization.

import os
import tensorflow as tf
import numpy as np

(train_images, _), (_, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0

os.mkdir('./workspace/ptq/data')
for i in range(100):
        batch_data = train_images[i:i+1].astype(np.float32)
        np.save('./workspace/ptq/data/batch_{}.npy'.format(i), batch_data)
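
As a quick check, one of the stored calibration batches can be loaded back to confirm that its shape and dtype match the model input:

import numpy as np

batch = np.load('./workspace/ptq/data/batch_0.npy')
print(batch.shape, batch.dtype)  # Expected: (1, 28, 28) float32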

Convert the model using the command-line executable.

$ tflite_convert                                          \
        --input_saved_model_dir=workspace/float/model     \
        --output_file=workspace/ptq/model.tflite          \
        --default_batch_size=1                            \
        --calibration_data_dir=workspace/ptq/data         \
        --calibration_data_regexp=batch_.*\.npy           \
        --input_value_ranges=0,1                          \
        --quantize=True

Or convert the model using the Python API.

import tensorflow as tf
import numpy as np

(train_images, _), (_, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0

def data_gen():
        for i in range(100):
                batch_data = train_images[i:i+1].astype(np.float32)
                yield [batch_data]

converter = tf.lite.TFLiteConverter.from_saved_model_dir(
        './workspace/float/model', default_batch_size=1
)
converter.quantize = True
converter.input_value_ranges = [(0.0, 1.0)]
converter.calibration_data_gen = data_gen
_ = converter.convert_to_tflite(output_file='./workspace/ptq/model.tflite')

The output TFLite model file is stored as ./workspace/ptq/model.tflite.

Note

We set the default_batch_size argument, because a dynamic batch size is used by default in tf.keras.layers.InputLayer.

Note

The input_value_ranges argument value depends on the actual dataset distribution. If not provided, the input value ranges will be deduced from the given calibration dataset.
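
The accuracy of the post-training quantized model can also be estimated on the host with the standard TensorFlow Lite interpreter. The sketch below does not assume a particular input dtype: if the converted model exposes a quantized input tensor, the data is first quantized with the scale and zero point reported in the input details.

import numpy as np
import tensorflow as tf

(_, _), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
test_images = (test_images / 255.0).astype(np.float32)

interpreter = tf.lite.Interpreter(model_path='./workspace/ptq/model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

correct = 0
num_samples = 1000
for i in range(num_samples):
        data = test_images[i:i+1]
        if input_details['dtype'] != np.float32:
                # Quantize the input using the tensor's scale and zero point.
                scale, zero_point = input_details['quantization']
                data = np.round(data / scale + zero_point).astype(input_details['dtype'])
        interpreter.set_tensor(input_details['index'], data)
        interpreter.invoke()
        output = interpreter.get_tensor(output_details['index'])
        correct += int(np.argmax(output[0]) == test_labels[i])

print('Quantized model accuracy on {} samples: {:.4f}'.format(num_samples, correct / num_samples))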

Convert Quant Model to Neuron DLA Model

A DLA file is a MediaTek-proprietary model format: a low-level binary for MDLA and VPU compute devices.

The basic command for using ncc-tflite to convert a TFLite model to a DLA file that can run inference on the APU is:

$ ncc-tflite -arch mdla2.0,vpu ./workspace/ptq/model.tflite -o ./workspace/ptq/model.dla

The output DLA model file is stored as ./workspace/ptq/model.dla.

Note

For the details of ncc-tflite, please refer to the Neuron Compiler section.