NPU Acceleration

Introduction

MediaTek provides advanced AI capabilities through its state-of-the-art NPU. TFLite models can access the NPU by leveraging MediaTek's LiteRT Neuron Delegate, which is built upon the LiteRT Stable Delegate API.

The stable delegate provider in TensorFlow Lite (LiteRT) offers a TfLiteOpaqueDelegate object pointer and its corresponding deleter by loading a dynamic library that encapsulates the actual LiteRT delegate implementation in a TfLiteStableDelegate struct instance.
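
For illustration, the short sketch below loads such a shared library and inspects the TfLiteStableDelegate instance it exports. It is a condensed sketch, not part of the integration flow shown later; the delegate_name, delegate_version, and delegate_abi_version fields referenced here are assumptions based on the upstream TfLiteStableDelegate definition and should be verified against the TFLite headers you build against.

#include <iostream>

#include "tensorflow/lite/delegates/utils/experimental/stable_delegate/delegate_loader.h"

// Sketch: load the Neuron stable delegate shared library and print the
// metadata carried by the TfLiteStableDelegate struct it exports.
// The field names below are assumptions based on the upstream
// TfLiteStableDelegate definition; verify them against your TFLite headers.
void InspectStableDelegate() {
    const TfLiteStableDelegate* stable_delegate =
        tflite::delegates::utils::LoadDelegateFromSharedLibrary(
            "/usr/lib/libneuron_stable_delegate.so");
    if (stable_delegate == nullptr || stable_delegate->delegate_plugin == nullptr) {
        std::cerr << "Failed to load stable delegate" << std::endl;
        return;
    }
    std::cout << "Delegate name:        " << stable_delegate->delegate_name << "\n"
              << "Delegate version:     " << stable_delegate->delegate_version << "\n"
              << "Delegate ABI version: " << stable_delegate->delegate_abi_version
              << std::endl;
}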

Stable delegates are designed to work with shared object files that support ABI backward compatibility. This means that the delegate and the TFLite runtime do not need to be built using the exact same version of TFLite as the application. However, it is important to note that this is a work in progress, and ABI stability is not yet guaranteed.

For more information on the stable delegate provider, refer to Stable Delegate Provider.

Integrating Stable Delegate into LiteRT Application

Code Snippet

Below is a C++ code snippet demonstrating how to integrate the MediaTek Neuron LiteRT Delegate into a LiteRT application.

#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

#include <tensorflow/lite/interpreter.h>
#include <tensorflow/lite/kernels/register.h>
#include <tensorflow/lite/model.h>
#include <tensorflow/lite/optional_debug_tools.h>

#include "absl/cleanup/cleanup.h"
#include "tensorflow/lite/core/c/c_api.h"
#include "tensorflow/lite/delegates/utils/experimental/stable_delegate/delegate_loader.h"
#include "tensorflow/lite/delegates/utils/experimental/stable_delegate/tflite_settings_json_parser.h"
#include "tensorflow/lite/interpreter_builder.h"

void IntegrateStableDelegate(const std::string& model_path, int num_threads) {
    // Load the TFLite model
    auto model = tflite::FlatBufferModel::BuildFromFile(model_path.c_str());
    if (!model) {
        throw std::runtime_error("Failed to load model");
    }

    // Setup Stable delegate
    using tflite::delegates::utils::LoadDelegateFromSharedLibrary;
    using tflite::delegates::utils::TfLiteSettingsJsonParser;
    constexpr char kSampleDelegatePath[] = "/usr/lib/libneuron_stable_delegate.so"; // Path to the Neuron stable delegate library (included in the Yocto image)
    constexpr char kSettingsPath[] = "/usr/share/label_image/stable_delegate_settings.json"; // Path to stable_delegate_settings.json (included in the Yocto image)

    // Load stable delegate
    const TfLiteStableDelegate* stable_delegate = LoadDelegateFromSharedLibrary(kSampleDelegatePath);
    if (stable_delegate == nullptr || stable_delegate->delegate_plugin == nullptr) {
        throw std::runtime_error("Failed to load stable delegate from library");
    }

    // Load settings
    TfLiteSettingsJsonParser parser;
    const tflite::TFLiteSettings* settings = parser.Parse(kSettingsPath);
    if (settings == nullptr) {
        throw std::runtime_error("Failed to load JSON settings");
    }

    // Create opaque delegate
    TfLiteOpaqueDelegate* opaque_delegate = stable_delegate->delegate_plugin->create(settings);
    if (opaque_delegate == nullptr) {
        throw std::runtime_error("Failed to create opaque delegate");
    }
    absl::Cleanup destroy_opaque_delegate = [&] {
        stable_delegate->delegate_plugin->destroy(opaque_delegate);
    };

    // Build the interpreter
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    if (!interpreter) {
        throw std::runtime_error("Failed to create interpreter");
    }

    // Add delegate to the interpreter
    if (interpreter->ModifyGraphWithDelegate(opaque_delegate) != kTfLiteOk) {
        throw std::runtime_error("Failed to modify graph with opaque delegate");
    }

    // Set number of threads
    interpreter->SetNumThreads(num_threads);

    // Allocate tensor buffers
    if (interpreter->AllocateTensors() != kTfLiteOk) {
        throw std::runtime_error("Failed to allocate tensors");
    }

    // The interpreter is now ready to run inference
}
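
To exercise IntegrateStableDelegate end to end, a minimal caller could look like the sketch below. It assumes it is compiled into the same source file as the snippet above; the fallback model path is a placeholder, not a file shipped with the image.

// Example entry point (sketch). Pass the model path as the first argument;
// the fallback path below is a placeholder.
int main(int argc, char* argv[]) {
    const std::string model_path =
        (argc > 1) ? argv[1] : "/path/to/model.tflite";
    try {
        IntegrateStableDelegate(model_path, /*num_threads=*/4);
        std::cout << "Interpreter initialized with the Neuron stable delegate." << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}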

Explanation

  1. Loading the TFLite Model:
    • The model is loaded from a file using tflite::FlatBufferModel::BuildFromFile.

  2. Setting Up the Stable Delegate:
    • The stable delegate is loaded from a shared library using LoadDelegateFromSharedLibrary.

    • The settings for the delegate are loaded from a JSON file using TfLiteSettingsJsonParser.

  3. Creating the Opaque Delegate:
    • The opaque delegate is created using the create method of the delegate plugin.

    • A cleanup function is set up to destroy the opaque delegate when it is no longer needed.

  4. Building the Interpreter:
    • The interpreter is built using tflite::InterpreterBuilder.

    • The opaque delegate is added to the interpreter using ModifyGraphWithDelegate.

  5. Configuring the Interpreter:
    • The number of threads is set using SetNumThreads.

    • Tensor buffers are allocated using AllocateTensors, after which the interpreter is ready to run inference (see the sketch in the next section).
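
Running Inference

The preceding example stops once the interpreter is ready. The sketch below shows one possible way to continue and run inference; it is not part of the MediaTek sample and assumes the model has a single float32 input tensor and a single float32 output tensor.

#include <algorithm>
#include <iostream>
#include <stdexcept>
#include <vector>
#include <tensorflow/lite/interpreter.h>

// Sketch: run a single inference on an interpreter prepared as shown above.
// Assumes one float32 input tensor and one float32 output tensor.
void RunInference(tflite::Interpreter* interpreter, const std::vector<float>& input) {
    // The caller must supply exactly as many values as the input tensor holds.
    TfLiteTensor* input_tensor = interpreter->input_tensor(0);
    if (input.size() * sizeof(float) != input_tensor->bytes) {
        throw std::runtime_error("Input size does not match the model's input tensor");
    }
    std::copy(input.begin(), input.end(), interpreter->typed_input_tensor<float>(0));

    // Execute the model; nodes claimed by the Neuron delegate run through it.
    if (interpreter->Invoke() != kTfLiteOk) {
        throw std::runtime_error("Failed to invoke interpreter");
    }

    // Read back the first output tensor and print a few values.
    const TfLiteTensor* output_tensor = interpreter->output_tensor(0);
    const float* output = interpreter->typed_output_tensor<float>(0);
    const size_t output_count = output_tensor->bytes / sizeof(float);
    for (size_t i = 0; i < output_count && i < 10; ++i) {
        std::cout << "output[" << i << "] = " << output[i] << std::endl;
    }
}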