.. include:: /keyword.rst
==========
NNStreamer
==========
.. contents:: Sections
:local:
:depth: 3
Overview
--------
`NNStreamer `_ is a set of `Gstreamer plugins `_ that allow
Gstreamer developers to adopt neural network models, and neural network developers to manage neural network pipelines and their filters in an easy and efficient way.
NNStreamer provides the `new Gstreamer stream data type and a set of Gstreamer elements (plugins) `_ to construct media
stream pipelines with neural network models. It is well documented through its `online document site `_ and supports well-known neural
network frameworks including Tensorflow, Tensorflow-lite, Caffe2, PyTorch, OpenVINO and ARMNN.
Users may include custom C functions, C++ objects, or Python objects, as well as such frameworks, as neural network filters of a pipeline at run time. Support for additional
frameworks or hardware AI accelerators can also be added and integrated at run time as independent plugin binaries.
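Conceptually, a typical NNStreamer pipeline converts media buffers into a tensor stream, runs inference on it, and decodes the results back into something an ordinary sink can consume. The following schematic sketch illustrates this shape; the element names are real, but the angle-bracket placeholders are not:
.. prompt:: text # auto
<video source> ! videoconvert ! videoscale ! video/x-raw,format=RGB ! \
tensor_converter ! \
tensor_filter framework=<framework> model=<model file> ! \
tensor_decoder mode=<decoder mode> ... ! <video sink>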
NNStreamer::tensor_filter
-------------------------
`tensor_filter `_ is the main element of the whole NNStreamer project.
It connects the Gstreamer data stream with neural network frameworks such as
`Tensorflow-lite `_.
Like any typical gstreamer plugin, ``tensor_filter`` can be inspected with ``gst-inspect-1.0`` to view all of its plugin information:
.. prompt:: bash # auto
# gst-inspect-1.0 tensor_filter
...
Pad Templates:
SINK template: 'sink'
Availability: Always
Capabilities:
other/tensor
framerate: [ 0/1, 2147483647/1 ]
other/tensors
format: { (string)static, (string)flexible }
framerate: [ 0/1, 2147483647/1 ]
SRC template: 'src'
Availability: Always
Capabilities:
other/tensor
framerate: [ 0/1, 2147483647/1 ]
other/tensors
format: { (string)static, (string)flexible }
framerate: [ 0/1, 2147483647/1 ]
Element has no clocking capabilities.
Element has no URI handling capabilities.
Pads:
SINK: 'sink'
Pad Template: 'sink'
SRC: 'src'
Pad Template: 'src'
Element Properties:
accelerator : Set accelerator for the subplugin with format (true/false):(comma separated ACCELERATOR(s)). true/false determines if accelerator is to be used. list of accelerators determines the backend (ignored with false). Example, if GPU, NPU can be used but not CPU - true:npu,gpu,!cpu. The full list of accelerators can be found in nnstreamer_plugin_api_filter.h. Note that only a few subplugins support this property.
flags: readable, writable
String. Default: ""
custom : Custom properties for subplugins ?
flags: readable, writable
String. Default: ""
framework : Neural network framework
flags: readable, writable
String. Default: "auto"
input : Input tensor dimension from inner array, up to 4 dimensions ?
flags: readable, writable
String. Default: ""
input-combination : Select the input tensor(s) to invoke the models
flags: readable, writable
String. Default: ""
inputlayout : Set channel first (NCHW) or channel last layout (NHWC) or None for input data. Layout of the data can be any or NHWC or NCHW or none for now.
flags: readable, writable
String. Default: ""
inputname : The Name of Input Tensor
flags: readable, writable
String. Default: ""
inputranks : The Rank of the Input Tensor, which is separated with ',' in case of multiple Tensors
flags: readable
String. Default: ""
inputtype : Type of each element of the input tensor ?
...
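For example, the ``accelerator`` property described above can be set directly on a launch line. A hedged sketch, reusing the format string from the property description (as noted there, only a few subplugins honor this property):
.. prompt:: bash # auto
# gst-launch-1.0 ... ! tensor_converter ! \
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite accelerator=true:npu,gpu,!cpu ! \
tensor_sink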
On |IOT-YOCTO|, Genio platforms provide different machine learning software stacks for developers:
.. csv-table:: Table 2. Software Stack on Board
:class: longtable
:file: /_asset/tables/ml-platform-sw-stack.csv
:width: 65%
:widths: 140 100 100 100 100 100
- For the Tensorflow-Lite framework:
Users can directly construct a gstreamer media stream pipeline using the existing `tensor_filter_tensorflow_lite `_.
You can find many examples of using the Tensorflow-Lite framework in `NNStreamer-Example `_.
When using ``tensor_filter_tensorflow_lite``, you should specify the neural network framework and the model path.
You do not need to specify the model meta information, such as **in/out type and dimension**, because ``tensor_filter_tensorflow_lite`` can obtain these properties automatically from the Tensorflow-Lite model.
Here is an example of the launch line using the Tensorflow-Lite framework; a complete self-contained variant follows it. More launch line examples are here: `NNStreamer-Example `_.
.. prompt:: bash # auto
... tensor_converter ! \
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8 ! \
...
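For reference, a minimal end-to-end variant of the fragment above can be run as-is; this sketch assumes the demo model is installed at the path used throughout this section and simply discards the inference results in ``tensor_sink``:
.. prompt:: bash # auto
# gst-launch-1.0 videotestsrc num-buffers=300 ! \
video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! \
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8 ! \
tensor_sink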
:ref:`Neuron SDK `:
|IOT-YOCTO| provides a new ``tensor_filter`` for the Neuron SDK. Users can use ``tensor_filter_neuronsdk`` to create a gstreamer media stream pipeline and leverage the Genio platform's powerful AI hardware accelerators,
such as the MDLA. You can find the implementation of ``tensor_filter_neuronsdk`` in the |IOT-YOCTO| NNStreamer source (``$BUILD_DIR/tmp/work/armv8a-poky-linux/nnstreamer/$PV/git/ext/nnstreamer/tensor_filter/tensor_filter_neuronsdk.cc``).
.. _tensor_filter_neuronsdk:
When using ``tensor_filter_neuronsdk``, you should specify the neural network framework and the model path.
You also have to specify the model meta information, such as **in/out type and dimension**, because ``tensor_filter_neuronsdk`` cannot obtain these properties from the dla file;
the dla file format does not provide interfaces to query this information.
Here is an example of the launch line using the Neuron SDK:
.. prompt:: bash # auto
... tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
...
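The ``.dla`` model used above is a Tensorflow-Lite model compiled ahead of time with ``ncc-tflite``, the same compiler shown later in the yolov5 note. A hedged sketch; the ``--arch`` value must match your platform's MDLA version:
.. prompt:: bash # auto
# ncc-tflite --arch mdla3.0 mobilenet_v1_1.0_224_quant.tflite -o mobilenet_v1_1.0_224_quant.dla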
.. note::
The ``tensor_filter`` properties related to in/out type and dimension are as follows:
- `inputtype `_: Type of each element of the input tensor.
- `inputlayout `_: Set channel first (NCHW) or channel last layout (NHWC) or None for input data.
- `input `_: Input tensor dimension from inner array, up to 4 dimensions.
- `outputtype `_: Type of each element of the output tensor.
- `outputlayout `_: Set channel first (NCHW) or channel last layout (NHWC) or None for output data.
- `output `_: Output tensor dimension from inner array, up to 4 dimensions.
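NNStreamer writes dimension strings starting from the innermost (fastest-changing) element. For a typical NHWC Tensorflow-Lite tensor the channel count therefore comes first, which is how the 224x224 RGB model above maps to ``3:224:224:1`` (a worked mapping, assuming an NHWC model):
.. prompt:: text # auto
Tensorflow-Lite shape: [1, 224, 224, 3]   (N, H, W, C)
NNStreamer dimension:  3:224:224:1        (C:W:H:N, innermost first)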
You can also find more detailed descriptions of ``tensor_filter`` in the `NNstreamer online document `_ and the
`source code `_.
NNStreamer Unit Test
--------------------
NNStreamer provides `gtest based test cases for common library and nnstreamer plugins `_.
You can run the unit tests using the following command to get insights into the integration status of nnstreamer on Yocto.
.. prompt:: bash # auto
# cd /usr/bin/unittest-nnstreamer/
# ssat
...
==================================================
[PASSED] transform_typecast (37 passed among 39 cases)
[PASSED] nnstreamer_filter_neuronsdk (8 passed among 8 cases)
[PASSED] transform_dimchg (13 passed among 13 cases)
[PASSED] nnstreamer_decoder_pose (3 passed among 3 cases)
[PASSED] nnstreamer_decoder_boundingbox (15 passed among 15 cases)
[PASSED] transform_clamp (10 passed among 10 cases)
[PASSED] transform_stand (9 passed among 9 cases)
[PASSED] transform_arithmetic (36 passed among 36 cases)
[PASSED] nnstreamer_decoder (17 passed among 17 cases)
[PASSED] nnstreamer_filter_custom (23 passed among 23 cases)
[PASSED] transform_transpose (16 passed among 16 cases)
[PASSED] nnstreamer_filter_tensorflow2_lite (31 passed among 31 cases)
[PASSED] nnstreamer_repo_rnn (2 passed among 2 cases)
[PASSED] nnstreamer_converter (32 passed among 32 cases)
[PASSED] nnstreamer_repo_dynamicity (10 passed among 10 cases)
[PASSED] nnstreamer_mux (84 passed among 84 cases)
[PASSED] nnstreamer_split (21 passed among 21 cases)
[PASSED] nnstreamer_repo (77 passed among 77 cases)
[PASSED] nnstreamer_demux (43 passed among 43 cases)
[PASSED] nnstreamer_filter_python3 (0 passed among 0 cases)
[PASSED] nnstreamer_rate (17 passed among 17 cases)
[PASSED] nnstreamer_repo_lstm (2 passed among 2 cases)
==================================================
[PASSED] All Test Groups (23) Passed!
TC Passed: 595 / Failed: 0 / Ignored: 2
Some test cases are not invoked via the ``ssat`` command because they lack a ``runTest.sh`` implementation; the ArmNN unit tests are one such example.
However, you can confirm the integration status of ArmNN with NNStreamer by directly running ``/usr/bin/unittest-nnstreamer/tests/unittest_filter_armnn``.
.. prompt:: bash # auto
# cd /usr/bin/unittest-nnstreamer/tests/
# export NNSTREAMER_SOURCE_ROOT_PATH=/usr/bin/unittest-nnstreamer/
# ./unittest_filter_armnn
...
[==========] 13 tests from 1 test suite ran. (141 ms total)
[ PASSED ] 13 tests.
NNStreamer Pipeline Examples
----------------------------
|IOT-YOCTO| provides the following Python examples in ``/usr/bin/nnstreamer-demo/`` to demonstrate how to create an NNStreamer pipeline with different ``tensor_filter`` configurations for different use cases and implementation options.
These examples are adapted from `NNStreamer-Example `_.
.. csv-table:: Table Features of NNStreamer Examples
:class: longtable
:file: /_asset/tables/ml-nnstreamer-demo.csv
:width: 65%
:widths: 500 100
To run these examples, you will need a v4l2-compatible device. You can use a USB webcam as a v4l2 video device and operate it through GStreamer.
To find the USB camera node, you can refer to the command in :ref:`USB Camera `, for example:
.. prompt:: bash # auto
# ls -l /sys/class/video4linux
...
lrwxrwxrwx 1 root root 0 Oct 8 01:29 video5 -> ../../devices/platform/soc/11201000.usb/11200000.xhci/usb1/1-1/1-1.3/1-1.3:1.0/video4linux/video5
...
From the above output, we can find that ``/dev/video5`` is the camera node.
For each example app, you can check the implementation in the scripts listed in the table above.
In the remainder of this section, we use ``run_nnstreamer_example.py`` to go through the demo process.
You can use ``--help`` to list all of its options.
.. prompt:: bash # auto
# python3 run_nnstreamer_example.py --help
usage: run_nnstreamer_example.py [-h] [--app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement}]
[--engine {neuronsdk,tflite,armnn}] [--img IMG] [--cam CAM] --cam_type {uvc,yuvsensor,rawsensor} [--width WIDTH] [--height HEIGHT] [--performance {0,1}]
[--fullscreen {0,1}] [--throughput {0,1}] [--rot ROT]
options:
-h, --help show this help message and exit
--app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement}
Choose a demo app to run. Default: image_classification
--engine {neuronsdk,tflite,armnn}
Choose a backends to inference. Default: neuronsdk
--img IMG Input a image file path.
Example: /usr/bin/nnstreamer-demo/original.png
Note: This paramater is dedicated to low light enhancement app
--cam CAM Input a camera node id, ex: 130 .
Use 'v4l2-ctl --list-devices' query camera node id.
Example:
$ v4l2-ctl --list-devices
...
C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
/dev/video130
/dev/video131
...
Note: This paramater is for all the apps except low light enhancement app.
--cam_type {uvc,yuvsensor,rawsensor}
Choose correct type of camera being used for the demo, ex: yuvsensor
Note: This paramater is for all the apps except low light enhancement app.
--width WIDTH Input video display width, ex: 640
--height HEIGHT Input video display height, ex: 480
--performance {0,1} Enable to make CPU/GPU/APU run under performance mode, ex: 1
--fullscreen {0,1} Fullscreen preview.
1: Enable
0: Disable
Note: This paramater is for all the apps except low light enhancement app.
--throughput {0,1} Print throughput information.
1: Enable
0: Disable
--rot ROT Rotate the camera image by degrees, ex: 90
Note: This paramater is for all the apps except low light enhancement app.
Below are the main options:
- ``--engine``:
Choose one of the backends supported by the platform. It can be ``neuronsdk``, ``tflite``, ``armnn`` or ``nnapi``.
You can find the function ``build_pipeline`` in the Python script. This function creates the ``tensor_filter`` with a different framework and properties based on the backend you choose.
Take ``run_nnstreamer_example.py`` as an example:
- ``--engine tflite`` :
.. prompt:: text # auto
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8
- ``--engine armnn`` :
.. prompt:: text # auto
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:External,ExtDelegateLib:/usr/lib/libarmnnDelegate.so.28.0,ExtDelegateKeyVal:backends#GpuAcc
- ``--engine neuronsdk`` :
.. prompt:: text # auto
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1
:ref:`As mentioned earlier `, unlike the Tensorflow-Lite framework, when using neuronsdk you have to specify the model meta information,
such as **in/out type and dimension**, because ``tensor_filter_neuronsdk`` cannot obtain these properties from the dla file; the dla file format does not provide interfaces to query this information.
You can refer to ``build_pipeline`` in the Python script to see how these properties are set:
- `inputtype `_: Type of each element of the input tensor.
- `inputlayout `_: Set channel first (NCHW) or channel last layout (NHWC) or None for input data.
- `input `_: Input tensor dimension from inner array, up to 4 dimensions.
- `outputtype `_: Type of each element of the output tensor.
- `outputlayout `_: Set channel first (NCHW) or channel last layout (NHWC) or None for output data.
- `output `_: Output tensor dimension from inner array, up to 4 dimensions.
- ``--engine nnapi`` :
.. prompt:: text # auto
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:External,ExtDelegateLib:/usr/lib/nnapi_external_delegate.so
.. note::
``--engine nnapi`` is only available on Genio-350.
- ``--cam``: Input a camera node **id**.
- ``--performance``:
Set the performance mode for your platform:
- ``--performance 0`` : Performance mode off
- ``--performance 1`` : Performance mode on
Performance mode makes the CPU, GPU, and APU run at their highest frequencies and disables thermal throttling, as the sketch below illustrates.
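As a rough illustration of what such a mode typically involves, the following generic cpufreq sketch pins every CPU cluster's governor to ``performance`` under assumed standard sysfs paths; the demo script's actual implementation may differ and may also raise GPU/APU frequencies:
.. prompt:: bash # auto
# for policy in /sys/devices/system/cpu/cpufreq/policy*; do echo performance > $policy/scaling_governor; done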
.. note::
For Image Classification, Object Detection, Pose Estimation, and Face Detection, we use the camera node ``/dev/video5`` and run in Performance Mode on the Genio-700 platform as an example.
So the options used to run the examples are ``--cam 5 --performance 1``.
Before running the examples, set global variables for the camera node:
.. prompt:: bash # auto
# CAM_TYPE=uvc
# CAMERA_NODE_ID=5
# MODE=1
Image Classification
====================
.. image:: /_asset/tools_nnstreamer_examples_image_classification.png
:width: 400
- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_image_classification.py``
- Model: `mobilenet_v1_1.0_224_quant.tflite `_
- Run example:
Before running the example, set the global variable for the Image Classification application:
.. prompt:: bash # auto
# APP=image_classification
- **Execute on MDLA by neuronsdk**:
.. prompt:: bash # auto
# ENGINE=neuronsdk
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on CPU**:
.. prompt:: bash # auto
# ENGINE=tflite
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on GPU by ArmNN delegate**:
.. prompt:: bash # auto
# ENGINE=armnn
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on VPU by nnapi**:
.. note::
``--engine nnapi`` is only available on Genio-350.
.. prompt:: bash # auto
# ENGINE=nnapi
# python3 /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification.py --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- Average inference time
.. csv-table:: Average inference time of nnstreamer_example_image_classification(UVC)
:class: longtable
:file: /_asset/tables/ml-nnstreamer-image-classification-latest-v23_1_0.csv
:width: 65%
:widths: 200 150 150 150 150
- Pipeline graph:
Below is the gstreamer command and pipeline graph constructed in the example: ``nnstreamer_example_image_classification.py`` using ``--engine neuronsdk``.
The pipeline graph is generated through the ``gst-report`` command of ``gst-instruments``. Detailed command can be found in :ref:`Pipeline Profiling `:
.. prompt:: text # auto
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue ! textoverlay name=tensor_res font-desc=Sans,24 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
tensor_sink name=tensor_sink
.. image:: /_asset/tools_nnstreamer_examples_pipeline_image_classification.svg
:width: 1000
Object Detection
================
ssd_mobilenet_v2_coco
~~~~~~~~~~~~~~~~~~~~~
.. image:: /_asset/tools_nnstreamer_examples_object_detection.png
:width: 400
- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_object_detection.py``
- Model: `ssd_mobilenet_v2_coco.tflite `_
- Run example:
Before running the example, set the global variable for the Object Detection application:
.. prompt:: bash # auto
# APP=object_detection
- **Execute on MDLA by neuronsdk**:
.. prompt:: bash # auto
# ENGINE=neuronsdk
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on CPU**:
.. prompt:: bash # auto
# ENGINE=tflite
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on GPU by ArmNN delegate**:
.. prompt:: bash # auto
# ENGINE=armnn
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- Average inference time
.. csv-table:: Average inference time of nnstreamer_example_object_detection(UVC)
:class: longtable
:file: /_asset/tables/ml-nnstreamer-object-detection-latest-v23_1_0.csv
:width: 65%
:widths: 200 150 150 150 150
- Pipeline graph:
Below is the gstreamer command and pipeline graph constructed in the example: ``nnstreamer_example_object_detection.py`` using ``--engine neuronsdk``.
The pipeline graph is generated through the ``gst-report`` command of ``gst-instruments``. Detailed command can be found in :ref:`Pipeline Profiling `:
.. prompt:: text # auto
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! v4l2convert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/ssd_mobilenet_v2_coco.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,91:1917:1 ! \
tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/usr/bin/nnstreamer-demo/coco_labels_list.txt option3=/usr/bin/nnstreamer-demo/box_priors.txt option4=640:480 option5=300:300 ! queue leaky=2 max-size-buffers=2 ! mix.
.. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection.svg
:width: 1000
yolov5
~~~~~~
.. image:: /_asset/tools_nnstreamer_examples_object_detection_yolov5.png
:width: 400
- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5.py``
- Model: `yolov5s-int8.tflite `_
- Run example:
Before running the example, set the global variable for the Object Detection Yolov5 application:
.. prompt:: bash # auto
# APP=object_detection_yolov5
- **Execute on MDLA by neuronsdk**:
.. prompt:: bash # auto
# ENGINE=neuronsdk
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
.. note::
The yolov5 model is only supported on Genio-700 with MDLA 3.0; it is not supported on Genio-1200 with MDLA 2.0.
On Genio-1200, the model cannot be compiled into a dla file by ncc-tflite due to unsupported operations.
.. prompt:: bash # auto
# ncc-tflite --arch mdla2.0 yolov5s-int8.tflite -o yolov5s-int8.dla --int8-to-uint8
OP[123]: RESIZE_NEAREST_NEIGHBOR
├ MDLA: HalfPixelCenters is unsupported.
├ EDMA: unsupported operation
OP[145]: RESIZE_NEAREST_NEIGHBOR
├ MDLA: HalfPixelCenters is unsupported.
├ EDMA: unsupported operation
ERROR: Cannot find an execution plan because of unsupported operations
ERROR: Fail to compile yolov5s-int8.tflite
As a result, running ``nnstreamer-demo/run_nnstreamer_example.py --app object_detection_yolov5`` on Genio-1200 fails.
.. prompt:: bash # auto
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app object_detection_yolov5 --cam_type uvc --cam 5 --engine neuronsdk --performance 1
...
ERROR: Cannot open the file: /usr/bin/nnstreamer-demo/yolov5s-int8.dla
ERROR: Cannot set a nullptr compiled network.
ERROR: Cannot set compiled network.
ERROR: Runtime loadNetworkFromFile fails.
ERROR: Cannot initialize runtime pool.
...
- **Execute on CPU**:
.. prompt:: bash # auto
# ENGINE=tflite
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on GPU by ArmNN delegate**:
.. prompt:: bash # auto
# ENGINE=armnn
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- Average inference time
.. csv-table:: Average inference time of nnstreamer_example_object_detection_yolov5(UVC)
:class: longtable
:file: /_asset/tables/ml-nnstreamer-object-detection_yolov5-latest-v23_1_0.csv
:width: 65%
:widths: 200 150 150 150 150
- Pipeline graph:
Below is the gstreamer command and pipeline graph constructed in the example: ``nnstreamer_example_object_detection_yolov5.py`` using ``--engine neuronsdk``.
The pipeline graph is generated through the ``gst-report`` command of ``gst-instruments``. Detailed command can be found in :ref:`Pipeline Profiling `:
.. prompt:: text # auto
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=320,height=320,format=RGB ! tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/yolov5s-int8.dla inputtype=uint8 input=3:320:320:1 outputtype=uint8 output=85:6300:1 ! \
other/tensors,num_tensors=1,types=uint8,dimensions=85:6300:1:1,format=static ! \
tensor_transform mode=arithmetic option=typecast:float32,add:-4.0,mul:0.0051498096 ! \
tensor_decoder mode=bounding_boxes option1=yolov5 option2=/usr/bin/nnstreamer-demo/coco.txt option3=0 option4=640:480 option5=320:320 ! queue leaky=2 max-size-buffers=2 ! mix.
.. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection_yolov5.svg
:width: 1000
Pose Estimation
===============
.. image:: /_asset/tools_nnstreamer_examples_pose_estimation.png
:width: 400
- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation.py``
- Model: `posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite `_
- Run example:
Before running the example, set the global variable for the Pose Estimation application:
.. prompt:: bash # auto
# APP=pose_estimation
- **Execute on MDLA by neuronsdk**:
.. prompt:: bash # auto
# ENGINE=neuronsdk
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on CPU**:
.. prompt:: bash # auto
# ENGINE=tflite
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on GPU by ArmNN delegate**:
.. prompt:: bash # auto
# ENGINE=armnn
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- Average inference time
.. csv-table:: Average inference time of nnstreamer_example_pose_estimation(UVC)
:class: longtable
:file: /_asset/tables/ml-nnstreamer-pose-estimation-latest-v23_1_0.csv
:width: 65%
:widths: 200 150 150 150 150
- Pipeline graph:
Below is the gstreamer command and pipeline graph constructed in the example: ``nnstreamer_example_pose_estimation.py`` using ``--engine neuronsdk``.
The pipeline graph is generated through the ``gst-report`` command of ``gst-instruments``. Detailed command can be found in :ref:`Pipeline Profiling `:
.. prompt:: text # auto
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=257,height=257,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.dla inputtype=float32 input=3:257:257:1 outputtype=float32,float32,float32,float32 output=17:9:9:1,34:9:9:1,32:9:9:1,32:9:9:1 ! queue ! \
tensor_decoder mode=pose_estimation option1=640:480 option2=257:257 option3=/usr/bin/nnstreamer-demo/point_labels.txt option4=heatmap-offset ! queue leaky=2 max-size-buffers=2 ! mix.
.. image:: /_asset/tools_nnstreamer_examples_pipeline_pose_estimation.svg
:width: 1000
Face Detection
==============
.. image:: /_asset/tools_nnstreamer_examples_face_detection.png
:width: 400
- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_face_detection.py``
- Model: `detect_face.tflite `_
- Run example:
Before running the example, set the global variable for the Face Detection application:
.. prompt:: bash # auto
# APP=face_detection
- **Execute on MDLA by neuronsdk**:
.. prompt:: bash # auto
# ENGINE=neuronsdk
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on CPU**:
.. prompt:: bash # auto
# ENGINE=tflite
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- **Execute on GPU by ArmNN delegate**:
.. prompt:: bash # auto
# ENGINE=armnn
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
- Average inference time
.. csv-table:: Average inference time of nnstreamer_example_face_detection(UVC)
:class: longtable
:file: /_asset/tables/ml-nnstreamer-face-detection-latest-v23_1_0.csv
:width: 65%
:widths: 200 150 150 150 150
- Pipeline graph:
Below is the gstreamer command and pipeline graph constructed in the example: ``nnstreamer_example_face_detection.py`` using ``--engine neuronsdk``.
The pipeline graph is generated through the ``gst-report`` command of ``gst-instruments``. Detailed command can be found in :ref:`Pipeline Profiling `:
.. prompt:: text # auto
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! videoconvert ! cairooverlay name=tensor_res ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/detect_face.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,2:1917:1 ! \
tensor_sink name=res_face
.. image:: /_asset/tools_nnstreamer_examples_pipeline_face_detection.svg
:width: 1000
Low Light Image Enhancement
===========================
.. image:: /_asset/tools_nnstreamer_examples_low_light_image_enhancement.svg
:width: 800
- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py``
- Model: `lite-model_zero-dce_1.tflite `_
- Run example:
This example does not read frames from a USB camera; instead, it takes a PNG image and its width and height through the options ``--img``, ``--width`` and ``--height``.
We have prepared a low-light photo (``/usr/bin/nnstreamer-demo/original.png``) as an example, which was downloaded from this `link: `_.
The enhanced image will be stored in ``/usr/bin/nnstreamer-demo`` and named ``low_light_enhancement_${backend}.png``; you can also use the ``--export`` option to name the enhanced image.
Before running the example, set global variables for the input image:
.. prompt:: bash # auto
# IMAGE=/usr/bin/nnstreamer-demo/original.png
# IMAGE_WIDTH=600
# IMAGE_HEIGHT=400
# APP=low_light_image_enhancement
# MODE=1
- **Execute on MDLA by neuronsdk**:
.. prompt:: bash # auto
# ENGINE=neuronsdk
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --img $IMAGE --engine $ENGINE --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --performance $MODE
- **Execute on CPU**:
.. prompt:: bash # auto
# ENGINE=tflite
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --img $IMAGE --engine $ENGINE --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --performance $MODE
- **Execute on GPU by ArmNN delegate**:
.. prompt:: bash # auto
# ENGINE=armnn
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --img $IMAGE --engine $ENGINE --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --performance $MODE
.. note::
Running ``nnstreamer-demo/run_nnstreamer_example.py --app low_light_image_enhancement`` with ``--engine armnn`` fails because the operator ``SQUARE`` is not supported by the ArmNN delegate.
.. prompt:: bash # auto
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app low_light_image_enhancement --img /usr/bin/nnstreamer-demo/original.png --engine armnn --width 600 --height 400 --performance $MODE
...
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
ERROR: Operator SQUARE [92] is not supported by armnn_delegate.
...
- Average inference time
.. csv-table:: Average inference time of nnstreamer_example_low_light_image_enhancement
:class: longtable
:file: /_asset/tables/ml-nnstreamer-low-light-image-enhancement-latest-v23_1_0.csv
:width: 65%
:widths: 200 150 150 150 150
- Pipeline graph:
Below is the gstreamer command and pipeline graph constructed in the example with ``--app low_light_image_enhancement`` and ``--engine neuronsdk``.
The pipeline graph is generated through the ``gst-report`` command of ``gst-instruments``. Detailed command can be found in :ref:`Pipeline Profiling `:
.. prompt:: text # auto
gst-launch-1.0 \
filesrc location=/usr/bin/nnstreamer-demo/original.png ! pngdec ! videoscale ! videoconvert ! video/x-raw,width=600,height=400,format=RGB ! \
tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:0,div:255.0 ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/lite-model_zero-dce_1.dla inputtype=float32 input=3:600:400:1 outputtype=float32 output=3:600:400:1 ! \
tensor_sink name=tensor_sink
.. image:: /_asset/tools_nnstreamer_examples_pipeline_low_light_image_enhacement.svg
:width: 1000
Performance
-----------
NNStreamer::tensor_filter Invoke Time
=====================================
By default, NNStreamer does not show the ``tensor_filter`` invoke time (inference time) on the screen, but you can obtain this information by enabling the ``tensor_filter`` property ``latency``.
According to the source code of ``tensor_filter``, the definition of the `latency `_ property is:
.. prompt:: text # auto
Turn on performance profiling for the average latency over the recent 10 inferences in microseconds.
Currently, this accepts either 0 (OFF) or 1 (ON).
To enable ``latency``, you currently have to modify the Python script directly to add the property ``latency=1`` to the ``tensor_filter``.
Take ``nnstreamer_example_image_classification.py`` as an example:
- Step 1: Open the Python script: ``nnstreamer_example_image_classification.py``
- Step 2: Search for ``tensor_filter`` and add ``latency=1`` after it.
.. prompt:: text # auto
if engine == 'neuronsdk':
tensor = dla_converter(self.tflite_model, self.dla)
cmd += f'tensor_filter latency=1 framework=neuronsdk model={self.dla} {tensor} ! '
elif engine == 'tflite':
cpu_cores = find_cpu_cores()
cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=NumThreads:{cpu_cores} ! '
elif engine == 'armnn':
library = find_armnn_delegate_library()
cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=Delegate:External,ExtDelegateLib:{library},ExtDelegateKeyVal:backends#GpuAcc ! '
elif engine == 'nnapi':
cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=Delegate:External,ExtDelegateLib:/usr/lib/nnapi_external_delegate.so ! '
- Step 3: Save the Python script.
- Step 4: Enable the GLib log by setting an environment variable:
.. prompt:: bash # auto
export G_MESSAGES_DEBUG=all
- Step 5: Run the example. You can then find a log such as ``Invoke took 2.537 ms``, which is the inference time.
.. prompt:: bash # auto
# CAM_TYPE=uvc
# CAMERA_NODE_ID=5
# MODE=1
# ENGINE=neuronsdk
# python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app image_classification --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --engine $ENGINE --performance $MODE
...
...
** INFO: 03:16:01.589: [/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla] Invoke took 2.537 ms
...
...
.. _pipeline_profiling:
Pipeline Profiling
==================
In the `NNstreamer online document: Profiling `_, nnstreamer recommends that users use `NNShark `_ or
`gst-instruments `_ for performance analysis of the pipeline.
For now, ``NNShark`` is not available on |IOT-YOCTO|, but ``gst-instruments`` is already included in the |IOT-YOCTO| ``rity-demo-image``.
``gst-instruments`` is a set of performance profiling and data flow inspection tools for GStreamer pipelines. It provides:
- ``gst-top-1.0``:
Displays a performance report for each element in the pipeline.
.. prompt:: bash # auto
# gst-top-1.0 \
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw t_raw. ! queue leaky=2 max-size-buffers=10 ! \
...
Got EOS from element "pipeline0".
Execution ended after 0:00:10.221403924
Setting pipeline to NULL ...
Freeing pipeline ...
ELEMENT %CPU %TIME TIME
videoconvert0 13.8 55.3 1.41 s
videoscale0 3.7 14.9 379 ms
tensortransform0 2.2 9.0 228 ms
fps-display-text-overlay 2.0 8.1 207 ms
tensordecoder0 0.7 2.8 71.9 ms
tensorfilter0 0.6 2.3 59.5 ms
...
It also saves the performance data to a file called ``gst-top.gsttrace``:
.. prompt:: bash # auto
# ls -al *.gsttrace
-rw-r--r-- 1 root root 11653120 Jan 4 05:23 gst-top.gsttrace
- ``gst-report``:
Generates a performance graph in DOT format:
.. prompt:: bash # auto
# gst-report-1.0 --dot gst-top.gsttrace | dot -Tsvg > perf.svg
Below is the performance graph of ``nnstreamer_example_object_detection.py``. It shows the CPU usage, time share, and execution time of each element,
so we can easily find which elements consume the most CPU resources and which take the most time to execute.
For example, as shown in the following figure, ``tensor_transform`` consumed 56.9% of the total execution time because ``tensor_transform`` performs the buffer data conversion on the CPU.
.. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection.svg
:width: 1000