.. include:: /keyword.rst

====================
Demo App: NNStreamer
====================

.. contents:: Sections
    :local:
    :depth: 2

Overview
========

`NNStreamer <https://nnstreamer.ai/>`_ is an open-source collection of GStreamer plugins that simplifies the integration of neural networks into multimedia pipelines. Samsung initially developed the project before transferring it to the LF AI & Data Foundation.

NNStreamer allows developers to:

*   Integrate neural network models into GStreamer pipelines efficiently.
*   Manage neural network filters and data streams within a unified framework.
*   Incorporate custom C/C++ or Python objects and various AI frameworks at runtime.

For comprehensive details, refer to the `NNStreamer Official Documentation <https://nnstreamer.github.io/index.html>`_.

|IOT-YOCTO| includes a specialized ``tensor_filter`` subplugin designed for the Neuron SDK. Developers use ``tensor_filter_neuronsdk`` to build pipelines that leverage Genio hardware accelerators, such as the MDLA. The source implementation is located in the |IOT-YOCTO| NNStreamer tree at ``ext/nnstreamer/tensor_filter/tensor_filter_neuronsdk.cc``.

The following figure shows the software stack for NNStreamer on |IOT-YOCTO|.

.. image:: /_asset/tools_nnstreamer_software-stack.png
    :width: 1000


NNStreamer on IOT Yocto
=======================

The machine learning software stack on |IOT-YOCTO| provides multiple backend and accelerator options.
Developers can run inference with the online Neuron Stable Delegate on MediaTek’s AI Processing Unit (NPU).

.. csv-table:: Table 2. Software Stack on |IOT-YOCTO|
    :class: longtable
    :file: /_asset/tables/ml-platform-sw-stack.csv
    :width: 65%
    :widths: 140 100 100 100 100 100 100 100

NNStreamer::tensor_filter
-------------------------

The NNStreamer plugin `tensor_filter <https://nnstreamer.github.io/gst/nnstreamer/tensor_filter/README.html>`_ plays a central role in NNStreamer.
It acts as a bridge between GStreamer data streams and neural network frameworks, such as
`TensorFlow Lite <https://github.com/nnstreamer/nnstreamer/blob/main/ext/nnstreamer/tensor_filter/tensor_filter_tensorflow_lite.cc>`_.
It converts GStreamer buffers to the format expected by neural networks and executes model inference.

Like a typical GStreamer plugin, the ``gst-inspect-1.0`` command shows the details of the ``tensor_filter`` element:

.. prompt:: bash # auto

    # gst-inspect-1.0 tensor_filter
    ...
            Pad Templates:
        SINK template: 'sink'
            Availability: Always
            Capabilities:
            other/tensor
                    framerate: [ 0/1, 2147483647/1 ]
            other/tensors
                        format: { (string)static, (string)flexible }
                    framerate: [ 0/1, 2147483647/1 ]

        SRC template: 'src'
            Availability: Always
            Capabilities:
            other/tensor
                    framerate: [ 0/1, 2147483647/1 ]
            other/tensors
                        format: { (string)static, (string)flexible }
                    framerate: [ 0/1, 2147483647/1 ]

        Element has no clocking capabilities.
        Element has no URI handling capabilities.

        Pads:
        SINK: 'sink'
            Pad Template: 'sink'
        SRC: 'src'
            Pad Template: 'src'

        Element Properties:
        accelerator         : Set accelerator for the subplugin with format (true/false):(comma separated ACCELERATOR(s)). true/false determines if accelerator is to be used. list of accelerators determines the backend (ignored with false). Example, if GPU, NPU can be used but not CPU - true:npu,gpu,!cpu. The full list of accelerators can be found in nnstreamer_plugin_api_filter.h. Note that only a few subplugins support this property.
                                flags: readable, writable
                                String. Default: ""
        custom              : Custom properties for subplugins ?
                                flags: readable, writable
                                String. Default: ""
        framework           : Neural network framework
                                flags: readable, writable
                                String. Default: "auto"
        input               : Input tensor dimension from inner array, up to 4 dimensions ?
                                flags: readable, writable
                                String. Default: ""
        input-combination   : Select the input tensor(s) to invoke the models
                                flags: readable, writable
                                String. Default: ""
        inputlayout         : Set channel first (NCHW) or channel last layout (NHWC) or None for input data. Layout of the data can be any or NHWC or NCHW or none for now.
                                flags: readable, writable
                                String. Default: ""
        inputname           : The Name of Input Tensor
                                flags: readable, writable
                                String. Default: ""
        inputranks          : The Rank of the Input Tensor, which is separated with ',' in case of multiple Tensors
                                flags: readable
                                String. Default: ""
        inputtype           : Type of each element of the input tensor ?

    ...


TensorFlow Lite Framework
-------------------------

Developers can construct GStreamer pipelines by using the existing `tensor_filter_tensorflow_lite <https://github.com/nnstreamer/nnstreamer/blob/main/ext/nnstreamer/tensor_filter/tensor_filter_tensorflow_lite.cc>`_ subplugin.
Examples using the TensorFlow Lite framework are available in `NNStreamer-Example <https://github.com/nnstreamer/nnstreamer-example>`_.

When ``tensor_filter_tensorflow_lite`` is used, properties such as the ``framework`` (neural network framework) and ``model`` (model path) must be set.
However, developers do not need to specify model metadata such as **input/output type** and **input/output dimension**, because ``tensor_filter_tensorflow_lite`` reads this information directly from the TFLite model file.

The following snippet shows a ``tensor_filter`` configured to use the TensorFlow Lite framework.
For full pipeline examples, refer to `NNStreamer-Example <https://github.com/nnstreamer/nnstreamer-example>`_.

.. prompt:: bash # auto

    ... tensor_converter ! \
    tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8 ! \
    ...


.. _neuron_framework:

Neuron Framework
----------------

|IOT-YOCTO| provides a ``tensor_filter`` subplugin that supports :ref:`Neuron SDK <Neuron SDK>`.
Developers can use ``tensor_filter_neuronsdk`` to create GStreamer pipelines that leverage the Genio platform AI accelerators.
The source implementation is located in the |IOT-YOCTO| NNStreamer repository:

``$BUILD_DIR/tmp/work/armv8a-poky-linux/nnstreamer/$PV/git/ext/nnstreamer/tensor_filter/tensor_filter_neuronsdk.cc``

.. _tensor_filter_neuronsdk:

In contrast to the TensorFlow Lite framework, all model-related properties, including the neural network framework, model path, **input/output type**, and **input/output dimension**, must be provided explicitly when using ``tensor_filter_neuronsdk``.
For security reasons, the model information is embedded in the DLA file and is not exposed by the runtime.
Therefore, it is important that developers fully understand the input and output specifications of their models.

The following snippet shows a ``tensor_filter`` configured to use Neuron SDK:

.. prompt:: bash # auto

    ...  tensor_converter ! \
    tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
    ...

.. note::

    The main ``tensor_filter`` properties related to tensor type and dimension are:

    - `inputtype <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L934>`_: Type of each element of the input tensor.

    - `inputlayout <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L938>`_: Channel-first (NCHW), channel-last (NHWC), or none for input data.

    - `input <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L926>`_: Input tensor dimension, up to 4 dimensions.

    - `outputtype <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L955>`_: Type of each element of the output tensor.

    - `outputlayout <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L959>`_: Channel-first (NCHW), channel-last (NHWC), or none for output data.

    - `output <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L951>`_: Output tensor dimension, up to 4 dimensions.

    For more details, refer to the `NNStreamer online documentation <https://nnstreamer.github.io/gst/nnstreamer/tensor_filter/README.html>`_ and the
    `tensor_filter common source code <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L934>`_.


NNStreamer Unit Test
====================

NNStreamer provides a `gtest-based test suite <https://github.com/nnstreamer/nnstreamer/tree/main/tests>`_ for the common library and NNStreamer plugins.
Running these unit tests helps verify the integration status of NNStreamer on |IOT-YOCTO|.

.. prompt:: bash # auto

    # cd /usr/bin/unittest-nnstreamer/
    # ssat
    ...
    ==================================================

    [PASSED] transform_typecast (37 passed among 39 cases)
    [PASSED] nnstreamer_filter_neuronsdk (8 passed among 8 cases)
    [PASSED] transform_dimchg (13 passed among 13 cases)
    [PASSED] nnstreamer_decoder_pose (3 passed among 3 cases)
    [PASSED] nnstreamer_decoder_boundingbox (15 passed among 15 cases)
    [PASSED] transform_clamp (10 passed among 10 cases)
    [PASSED] transform_stand (9 passed among 9 cases)
    [PASSED] transform_arithmetic (36 passed among 36 cases)
    [PASSED] nnstreamer_decoder (17 passed among 17 cases)
    [PASSED] nnstreamer_filter_custom (23 passed among 23 cases)
    [PASSED] transform_transpose (16 passed among 16 cases)
    [PASSED] nnstreamer_filter_tensorflow2_lite (31 passed among 31 cases)
    [PASSED] nnstreamer_repo_rnn (2 passed among 2 cases)
    [PASSED] nnstreamer_converter (32 passed among 32 cases)
    [PASSED] nnstreamer_repo_dynamicity (10 passed among 10 cases)
    [PASSED] nnstreamer_mux (84 passed among 84 cases)
    [PASSED] nnstreamer_split (21 passed among 21 cases)
    [PASSED] nnstreamer_repo (77 passed among 77 cases)
    [PASSED] nnstreamer_demux (43 passed among 43 cases)
    [PASSED] nnstreamer_filter_python3 (0 passed among 0 cases)
    [PASSED] nnstreamer_rate (17 passed among 17 cases)
    [PASSED] nnstreamer_repo_lstm (2 passed among 2 cases)
    ==================================================
    [PASSED] All Test Groups (23) Passed!
            TC Passed: 595 / Failed: 0 / Ignored: 2

Some test cases are marked as "Ignored" because they do not implement the ``runTest.sh`` script in their test directory, which is required by ``ssat``.
Even when ``ssat`` ignores a test group, the integration status can still be checked by running the individual unit test binary.

The following example shows how to run the Arm NN unit test (for reference):

.. prompt:: bash # auto

    # cd /usr/bin/unittest-nnstreamer/tests/
    # export NNSTREAMER_SOURCE_ROOT_PATH=/usr/bin/unittest-nnstreamer/
    # ./unittest_filter_armnn
    ...
    [==========] 13 tests from 1 test suite ran. (141 ms total)
    [  PASSED  ] 13 tests.


NNStreamer Pipeline Examples
============================

|IOT-YOCTO| provides several Python examples in ``/usr/bin/nnstreamer-demo/`` to demonstrate how to build NNStreamer pipelines with different ``tensor_filter`` configurations for various use cases.
These examples are adapted from `NNStreamer-Example <https://github.com/nnstreamer/nnstreamer-example>`_.

.. csv-table:: Table Features of NNStreamer Examples
    :class: longtable
    :file: /_asset/tables/ml-nnstreamer-demo.csv
    :width: 65%
    :widths: 250 100 400

Each application can be run directly via its own Python script.
However, |IOT-YOCTO| strongly recommends launching them through the demo runner ``run_nnstreamer_example.py``.
The demo runner allows developers to switch between applications and frameworks by changing command-line arguments instead of manually constructing GStreamer commands.

The remainder of this section uses ``run_nnstreamer_example.py`` to walk through the demo flow.
Use ``--help`` to list all available options:

.. prompt:: bash # auto

    # python3 run_nnstreamer_example.py --help
    usage: run_nnstreamer_example.py [-h] [--app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement,monocular_depth_estimation}]
                                 [--engine {neuronsdk,neuron_stable}] [--img IMG] [--cam CAM] --cam_type {uvc,yuvsensor,rawsensor} [--width WIDTH] [--height HEIGHT] [--performance {0,1}]
                                 [--fullscreen {0,1}] [--throughput {0,1}] [--rot ROT]

    options:
    -h, --help            show this help message and exit
    --app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement,monocular_depth_estimation}
                        Choose a demo app to run. Default: image_classification
    --engine {neuronsdk,neuron_stable}
                        Choose a runtime engine to run the pipeline.
                        If no engine is specified, the inference will run on CPU by default.
                        Note: neuron_stable is NOT available on Genio-350

    --img IMG           Input image file path.
                        Example: /usr/bin/nnstreamer-demo/original.png
                        Note: This parameter is dedicated to the low light enhancement app.
    --cam CAM           Input camera node ID, for example: 130.
                        Use 'v4l2-ctl --list-devices' to query the camera node ID.
                        Example:
                        $ v4l2-ctl --list-devices
                            ...
                            C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
                            /dev/video130
                            /dev/video131
                            ...
                        Note: This parameter applies to all apps except the low light enhancement app.
    --cam_type {uvc,yuvsensor,rawsensor}
                        Choose the camera type for the demo, for example: yuvsensor.
                        Note: This parameter applies to all apps except the low light enhancement app.
    --width WIDTH       Width of the preview window, for example: 640
    --height HEIGHT     Height of the preview window, for example: 480
    --performance {0,1} Enable performance mode for CPU/GPU/APU, for example: 1
    --fullscreen {0,1}  Fullscreen preview.
                        1: Enable
                        0: Disable
                        Note: This parameter applies to all apps except the low light enhancement app.
    --throughput {0,1}  Print throughput information.
                        1: Enable
                        0: Disable
    --rot ROT           Rotate the camera image by degrees, for example: 90
                        Note: This parameter applies to all apps except the low light enhancement app.


Here are some key options:

- ``--engine``:

    Select the runtime engine used for inference. It can be:

    - ``neuronsdk``: Offline inference on APU with compiled DLA models.
    - ``neuron_stable``: Online inference path using the Neuron Stable Delegate.

    For each Python demo script, a ``build_pipeline`` function constructs a ``tensor_filter`` element with the appropriate framework, engine, and properties based on the selected options.

    .. important::

        The Neuron Stable Delegate provides online inference path support and can route inference to different hardware accelerators, with a fallback mechanism.
        The offline inference path using ``neuronsdk`` runs compiled models directly on the APU.

    The following examples show typical pipelines constructed by the demos:

    - ``--engine cpu`` (implicit default when no engine is specified):

        .. prompt:: text # auto

            # If no hardware engine is specified, the inference runs on CPU
            tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8

    - ``--engine neuron_stable`` (Neuron Stable Delegate):

        .. prompt:: text # auto

            tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:Stable,StaDelegateSettingFile:/usr/share/label_image/stable_delegate_settings.json,ExtDelegateKeyVal:backends#GpuAcc

    - ``--engine neuronsdk`` (offline Neuron SDK):

        The details of the framework are described in :ref:`Neuron Framework <neuron_framework>`.

        .. prompt:: text # auto

            tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1

- ``--cam``:

    Specifies the camera node index used as input.

- ``--performance``:

    Sets the performance mode for the platform:

    - ``--performance 0``: Disable performance mode.
    - ``--performance 1``: Enable performance mode.

    Performance mode drives CPU, GPU, and APU to their highest operating frequencies and disables thermal throttling.


Camera-Input Application
------------------------

A v4l2-compatible device is required as an input source for the following demonstrations.

General Configuration
^^^^^^^^^^^^^^^^^^^^^

The camera-based examples share common configuration parameters.
Developers can switch applications by changing only the application option while keeping the shared settings.

The following example uses a USB webcam.
Here uses `v4l2-ctl` o obtain the **camera node ID**.

.. prompt:: bash # auto

    # v4l2-ctl --list-devices
        ...
        C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
        /dev/video130
        /dev/video131
        ...

In this case, the camera node ID is ``/dev/video130``.

The common settings for a UVC camera with Performance Mode enabled are:

.. prompt:: bash # auto

    # CAM_TYPE=uvc
    # CAMERA_NODE_ID=130
    # MODE=1

.. note::

   Developers can also use a raw sensor or YUV sensor as the input source by assigning ``CAM_TYPE``, for example ``CAM_TYPE=rawsensor`` or ``CAM_TYPE=yuvsensor``.


Image Classification
^^^^^^^^^^^^^^^^^^^^^

.. image:: /_asset/tools_nnstreamer_examples_image_classification.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_image_classification.py``
- Model: `mobilenet_v1_1.0_224_quant.tflite <https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_1.0_224_quant_and_labels.zip>`_
- Run example:

    1. Set the variable ``APP`` to the Image Classification application:

        .. prompt:: bash # auto

            # APP=image_classification

    2. Choose the runtime engine:

        - **Online inference with Neuron Stable Delegate (if supported on the platform)**

            .. prompt:: bash # auto

                # ENGINE=neuron_stable

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # ENGINE=neuronsdk

        - **CPU-only inference**

            If no engine is set, the demo falls back to CPU execution.

            .. prompt:: bash # auto

                # unset ENGINE    # or ENGINE=cpu

    3. Run the command:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuron_stable --performance $MODE

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuronsdk --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_image_classification` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-image-classification-latest-v24_0.csv
        :width: 65%
        :widths: 200 150 150 150 150

- Pipeline graph

    The following GStreamer pipeline is defined in ``nnstreamer_example_image_classification.py`` when ``--cam uvc`` and ``--engine neuronsdk`` are used.
    The pipeline graph is generated using the ``gst-report`` command from the ``gst-instruments`` tool.
    For more information, see :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue ! textoverlay name=tensor_res font-desc=Sans,24 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
        tensor_sink name=tensor_sink

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_image_classification.svg
        :width: 1000


Object Detection
^^^^^^^^^^^^^^^^

`ssd_mobilenet_v2_coco`
"""""""""""""""""""""""

.. image:: /_asset/tools_nnstreamer_examples_object_detection.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_object_detection.py``
- Model: `ssd_mobilenet_v2_coco.tflite <https://github.com/nnsuite/testcases/raw/master/DeepLearningModels/tensorflow-lite/ssd_mobilenet_v2_coco>`_
- Run example:

    1. Set the variable ``APP`` to the Object Detection application:

        .. prompt:: bash # auto

            # APP=object_detection

    2. Choose the runtime engine:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # ENGINE=neuron_stable

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # ENGINE=neuronsdk

        - **CPU-only inference**

            .. prompt:: bash # auto

                # unset ENGINE    # or ENGINE=cpu

    3. Run the command:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuron_stable --performance $MODE

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuronsdk --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_object_detection` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-object-detection-latest-v24_0.csv
        :width: 65%
        :widths: 200 150 150 150 150

- Pipeline graph

    The following GStreamer pipeline is defined in ``nnstreamer_example_object_detections.py`` with ``--cam uvc`` and ``--engine neuronsdk``.
    The pipeline graph is generated using the ``gst-report`` command from ``gst-instruments``.
    For more details, see :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! v4l2convert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/ssd_mobilenet_v2_coco.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,91:1917:1 ! \
        tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/usr/bin/nnstreamer-demo/coco_labels_list.txt option3=/usr/bin/nnstreamer-demo/box_priors.txt option4=640:480 option5=300:300 ! queue leaky=2 max-size-buffers=2 ! mix.

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection.svg
        :width: 1000


YOLOv5
""""""

.. image:: /_asset/tools_nnstreamer_examples_object_detection_yolov5.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5.py``
- Model: `yolov5s-int8.tflite <https://github.com/ultralytics/yolov5>`_
- Run example:

    1. Set the variable ``APP`` to the Object Detection (YOLOv5s) application:

        .. prompt:: bash # auto

            # APP=object_detection_yolov5

    2. Choose the runtime engine:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # ENGINE=neuron_stable

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # ENGINE=neuronsdk

        - **CPU-only inference**

            .. prompt:: bash # auto

                # unset ENGINE    # or ENGINE=cpu

        .. note::

            For offline inference, the YOLOv5 model is only supported on MDLA3.0 (Genio-700/510).
            On MDLA2.0 (Genio-1200), model conversion fails because certain operations are not supported.

            .. prompt:: bash # auto

                # ncc-tflite --arch mdla2.0 yolov5s-int8.tflite -o yolov5s-int8.dla --int8-to-uint8
                OP[123]: RESIZE_NEAREST_NEIGHBOR
                ├ MDLA: HalfPixelCenters is unsupported.
                ├ EDMA: unsupported operation
                OP[145]: RESIZE_NEAREST_NEIGHBOR
                ├ MDLA: HalfPixelCenters is unsupported.
                ├ EDMA: unsupported operation
                ERROR: Cannot find an execution plan because of unsupported operations
                ERROR: Fail to compile yolov5s-int8.tflite

            As a result, running ``run_nnstreamer_example.py --app object_detection_yolov5`` with ``--engine neuronsdk`` fails on Genio-1200:

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app object_detection_yolov5 --cam_type uvc --cam 130 --engine neuronsdk --performance 1
                ...
                ERROR: Cannot open the file: /usr/bin/nnstreamer-demo/yolov5s-int8.dla
                ERROR: Cannot set a nullptr compiled network.
                ERROR: Cannot set compiled network.
                ERROR: Runtime loadNetworkFromFile fails.
                ERROR: Cannot initialize runtime pool.
                ...

    3. Run the command:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuron_stable --performance $MODE

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuronsdk --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_object_detection_yolov5` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-object-detection_yolov5-latest-v24_0.csv
        :width: 65%
        :widths: 200 150 150 150 150

- Pipeline graph

    The following GStreamer pipeline is defined in ``nnstreamer_example_object_detection_yolov5.py`` when ``--cam uvc`` and ``--engine neuronsdk`` are used.
    The pipeline graph is generated using ``gst-report`` from ``gst-instruments``.
    For more details, see :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=320,height=320,format=RGB ! tensor_converter ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/yolov5s-int8.dla inputtype=uint8 input=3:320:320:1 outputtype=uint8 output=85:6300:1 ! \
        other/tensors,num_tensors=1,types=uint8,dimensions=85:6300:1:1,format=static ! \
        tensor_transform mode=arithmetic option=typecast:float32,add:-4.0,mul:0.0051498096 ! \
        tensor_decoder mode=bounding_boxes option1=yolov5 option2=/usr/bin/nnstreamer-demo/coco.txt option3=0 option4=640:480 option5=320:320 ! queue leaky=2 max-size-buffers=2 ! mix.

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection_yolov5.svg
        :width: 1000


Pose Estimation
^^^^^^^^^^^^^^^

.. image:: /_asset/tools_nnstreamer_examples_pose_estimation.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation.py``
- Model: `posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite <https://github.com/nnsuite/testcases/raw/master/DeepLearningModels/tensorflow-lite/pose_estimation>`_
- Run example:

    1. Set the variable ``APP`` to the Pose Estimation application:

        .. prompt:: bash # auto

            # APP=pose_estimation

    2. Choose the runtime engine:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # ENGINE=neuron_stable

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # ENGINE=neuronsdk

        - **CPU-only inference**

            .. prompt:: bash # auto

                # unset ENGINE    # or ENGINE=cpu

    3. Run the command:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuron_stable --performance $MODE

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuronsdk --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_pose_estimation` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-pose-estimation-latest-v24_0.csv
        :width: 65%
        :widths: 200 150 150 150 150

- Pipeline graph

    The following GStreamer pipeline is defined in ``nnstreamer_example_pose_estimation.py`` with ``--cam uvc`` and ``--engine neuronsdk``.
    The pipeline graph is generated using ``gst-report`` from ``gst-instruments``.
    For details, see :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=257,height=257,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.dla inputtype=float32 input=3:257:257:1 outputtype=float32,float32,float32,float32 output=17:9:9:1,34:9:9:1,32:9:9:1,32:9:9:1 ! queue ! \
        tensor_decoder mode=pose_estimation option1=640:480 option2=257:257 option3=/usr/bin/nnstreamer-demo/point_labels.txt option4=heatmap-offset ! queue leaky=2 max-size-buffers=2 ! mix.

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_pose_estimation.svg
        :width: 1000


Face Detection
^^^^^^^^^^^^^^

.. image:: /_asset/tools_nnstreamer_examples_face_detection.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_face_detection.py``
- Model: `detect_face.tflite <http://ci.nnstreamer.ai/warehouse/nnmodels/>`_
- Run example:

    1. Set the variable ``APP`` to the Face Detection application:

        .. prompt:: bash # auto

            # APP=face_detection

    2. Choose the runtime engine:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # ENGINE=neuron_stable

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # ENGINE=neuronsdk

        - **CPU-only inference**

            .. prompt:: bash # auto

                # unset ENGINE    # or ENGINE=cpu

    3. Run the command:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuron_stable --performance $MODE

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuronsdk --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_face_detection` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-face-detection-latest-v24_0.csv
        :width: 65%
        :widths: 200 150 150 150 150

- Pipeline graph

    The following GStreamer pipeline is defined in ``nnstreamer_example_face_detection.py`` with ``--cam uvc`` and ``--engine neuronsdk``.
    The pipeline graph is generated using ``gst-report`` from ``gst-instruments``.
    For details, see :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! videoconvert ! cairooverlay name=tensor_res ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/detect_face.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,2:1917:1 ! \
        tensor_sink name=res_face

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_face_detection.svg
        :width: 1000


Monocular Depth Estimation
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: /_asset/tools_nnstreamer_examples_monocular_depth_estimation.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_monocular_depth_estimation.py``
- Model: `midas.tflite <https://github.com/isl-org/MiDaS/releases/tag/v2_1>`_
- Run example:

    1. Set the variable ``APP`` to the Monocular Depth Estimation application:

        .. prompt:: bash # auto

            # APP=monocular_depth_estimation

    2. Choose the runtime engine:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # ENGINE=neuron_stable

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # ENGINE=neuronsdk

        - **CPU-only inference**

            .. prompt:: bash # auto

                # unset ENGINE    # or ENGINE=cpu

    3. Run the command:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuron_stable --performance $MODE

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
                    --engine neuronsdk --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_monocular_depth_estimation` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-monocular-depth-estimation-latest-v24_0.csv
        :width: 65%
        :widths: 200 150 150 150 150

- Pipeline graph

    The following GStreamer pipeline is defined in ``nnstreamer_example_monocular_depth_estimation.py`` when ``--cam uvc`` and ``--engine neuronsdk`` are used.
    The pipeline graph is generated using ``gst-report`` from ``gst-instruments``.
    For more information, see :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 ! video/x-raw,format=YUY2,width=640,height=480 num-buffers=300 ! videoconvert ! videoscale ! \
        video/x-raw,format=RGB,width=256,height=256 ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
        tensor_filter latency=1 framework=neuronsdk throughput=0 model=/usr/bin/nnstreamer-demo/midas.dla inputtype=float32 input=3:256:256:1 outputtype=float32 output=1:256:256:1 ! \
        appsink name=sink emit-signals=True max-buffers=1 drop=True sync=False

    .. image:: /_asset/tools_nnstreamer_examples_monocular_depth_estimation.svg
        :width: 1000


Image-Input Application
-----------------------

A Portable Network Graphics (PNG) file is required as the input source for the following demonstrations.

General Configuration
^^^^^^^^^^^^^^^^^^^^^

The image-based examples share a common configuration pattern.
Developers can switch the application while keeping the base configuration unchanged.

The following settings enable Performance Mode and configure the input image:

.. prompt:: bash # auto

    # IMAGE_PATH=/usr.bin/nnstreamer-demo/original.png
    # IMAGE_WIDTH=600
    # IMAGE_HEIGHT=400
    # MODE=1


Low Light Image Enhancement
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: /_asset/tools_nnstreamer_examples_low_light_image_enhancement.svg
    :width: 800

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py``
- Model: `lite-model_zero-dce_1.tflite <https://github.com/nnsuite/testcases/raw/master/DeepLearningModels/tensorflow-lite/zero_dce_tflite>`_
- Run example:

    The example image (``/usr/bin/nnstreamer-demo/original.png``) is downloaded from `paperswithcode (LOL dataset) <https://paperswithcode.com/dataset/lol>`_.

    1. Set the variable ``APP`` to the Low Light Image Enhancement application:

        .. prompt:: bash # auto

            # APP=low_light_image_enhancement

    2. Choose the runtime engine:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # ENGINE=neuron_stable

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # ENGINE=neuronsdk

        - **CPU-only inference**

            .. prompt:: bash # auto

                # unset ENGINE    # or ENGINE=cpu

    3. Run the command:

        - **Online inference with Neuron Stable Delegate**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --img $IMAGE_PATH --width $IMAGE_WIDTH --height $IMAGE_HEIGHT \
                    --engine neuron_stable --performance $MODE

        - **Offline inference with Neuron SDK**

            .. prompt:: bash # auto

                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
                    --app $APP --img $IMAGE_PATH --width $IMAGE_WIDTH --height $IMAGE_HEIGHT \
                    --engine neuronsdk --performance $MODE

        The enhanced image is saved under ``/usr/bin/nnstreamer-demo`` and named as ``low_light_enhancement_${ENGINE}.png``.
        Developers can also use the ``--export`` option in the script to customize the output filename.

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_low_light_image_enhancement`
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-low-light-image-enhancement-latest-v24_0.csv
        :width: 65%
        :widths: 200 150 150 150 150

- Pipeline graph

    The following GStreamer pipeline is defined in ``nnstreamer_example_low_light_image_enhancement.py`` when ``--engine neuronsdk`` is used.
    The pipeline graph is generated using ``gst-report`` from ``gst-instruments``.
    For more details, see :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        filesrc location=/usr/bin/nnstreamer-demo/original.png ! pngdec ! videoscale ! videoconvert ! video/x-raw,width=600,height=400,format=RGB ! \
        tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:0,div:255.0 ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/lite-model_zero-dce_1.dla inputtype=float32 input=3:600:400:1 outputtype=float32 output=3:600:400:1 ! \
        tensor_sink name=tensor_sink

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_low_light_image_enhacement.svg
        :width: 1000


Performance
===========

Inference Time – ``tensor_filter`` Invoke Time
----------------------------------------------

The inference time for each example is measured using the ``latency`` property of ``tensor_filter``.
The property is defined in the `tensor_filter_common.c source code <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L982>`_:

.. prompt:: c # auto

    Turn on performance profiling for the average latency over the recent 10 inferences in microseconds.
    Currently, this accepts either 0 (OFF) or 1 (ON). By default, it's set to 0 (OFF).

To enable ``latency`` profiling, modify each Python example and add ``latency=1`` to the ``tensor_filter`` properties.

The following example uses ``nnstreamer_example_image_classification.py``:

1. Edit the script ``nnstreamer_example_image_classification.py``.

2. Locate ``tensor_filter`` and add ``latency=1``:

    .. prompt:: text # auto

        if engine == 'neuronsdk':
            tensor = dla_converter(self.tflite_model, self.dla)
            cmd += f'tensor_filter latency=1 framework=neuronsdk model={self.dla} {tensor} ! '
        elif engine == 'neuron_stable':
            cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=Delegate:Stable,StaDelegateSettingFile:/usr/share/label_image/stable_delegate_settings.json ! '
        else:
            # CPU-only fallback
            cpu_cores = find_cpu_cores()
            cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=NumThreads:{cpu_cores} ! '

3. Save the script.

4. Set the glib log level to ``all`` to print the timing messages:

    .. prompt:: bash # auto

        export G_MESSAGES_DEBUG=all

5. Run the example.
   The log output includes entries similar to ``Invoke took 2.537 ms``, which indicate the measured inference time.

    .. prompt:: bash # auto

        # CAM_TYPE=uvc
        # CAMERA_NODE_ID=130
        # MODE=1
        # ENGINE=neuronsdk

        # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
            --app image_classification --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
            --engine $ENGINE --performance $MODE
        ...
        ** INFO: 03:16:01.589: [/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla] Invoke took 2.537 ms
        ...


NNStreamer Advanced Pipeline Examples
=====================================

.. _pipeline_profiling:

Pipeline Profiling
------------------

|IOT-YOCTO| includes `gst-instruments <https://github.com/kirushyk/gst-instruments>`_ as a profiling tool for performance analysis and data flow inspection of GStreamer pipelines.

The two main utilities are:

- ``gst-top-1.0``:

    Shows a performance report for each element in a pipeline.

    .. prompt:: bash # auto

        # gst-top-1.0 \
          gst-launch-1.0 \
          v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw t_raw. ! queue leaky=2 max-size-buffers=10 ! \
        ...

        Got EOS from element "pipeline0".
        Execution ended after 0:00:10.221403924
        Setting pipeline to NULL ...
        Freeing pipeline ...
        ELEMENT                    %CPU   %TIME   TIME
        videoconvert0               13.8   55.3    1.41 s
        videoscale0                  3.7   14.9    379 ms
        tensortransform0             2.2    9.0    228 ms
        fps-display-text-overlay     2.0    8.1    207 ms
        tensordecoder0               0.7    2.8   71.9 ms
        tensorfilter0                0.6    2.3   59.5 ms
        ...

    The tool also saves the statistics to a GstTrace file named ``gst-top.gsttrace``:

    .. prompt:: bash # auto

        # ls -al *.gsttrace
        -rw-r--r-- 1 root root 11653120 Jan  4 05:23 gst-top.gsttrace

- ``gst-report``:

    Converts a GstTrace file into a performance graph in DOT format:

    .. prompt:: bash # auto

        # gst-report-1.0 --dot gst-top.gsttrace | dot -Tsvg > perf.svg

    The figure below shows the performance graph for ``nnstreamer_example_object_detection.py``.
    It displays CPU usage, time usage, and execution time for each element.
    This makes it easy to identify the elements that consume most of the CPU or execution time.

    In this example, ``tensor_transform`` consumes 56.9% of the total execution time because it performs buffer data conversion on the CPU.

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection.svg
        :width: 1000

.. note::

    For more information, refer to `NNStreamer online documentation: Profiling <https://nnstreamer.github.io/tools/profiling/README.html>`_.