.. include:: /keyword.rst

==========
NNStreamer
==========

.. contents:: Sections
    :local:
    :depth: 3

Overview
--------

`NNStreamer <https://nnstreamer.ai/>`_ is a set of `GStreamer plugins <https://nnstreamer.github.io/component-description.html#gstreamer-elements-plugins>`_ that allow 
GStreamer developers to adopt neural network models, and neural network developers to manage neural network pipelines with their filters in a easy and efficient way.

It provides the `new GStreamer stream data type and a set of GStreamer elements (plugins) <https://nnstreamer.github.io/component-description.html>`_ to construct media 
stream pipeline with neural network models. It supports various well-known neural network frameworks including Tensorflow, Tensorflow-lite, Caffe2, PyTorch, OpenVINO and ArmNN.
All the details are well-documented on the `NNStreamer Official Documentation <https://nnstreamer.github.io/index.html>`_.


Users may include custom C functions, C++ objects, or Python objects as well as such frameworks as neural network filters of a pipeline in run-time and also add and integrate 
support for such frameworks or hardware AI accelerators in run-time, which may exist as independent plugin binaries.

.. important::
	
	|IOT-YOCTO| only have demo examples for Tensorflow-lite backend and MTK Neuron backend due to the machine learning software stack limitation.
	
Here comes the illustration of software stack for the NNStreamer on |IOT-YOCTO|.

.. image:: /_asset/tools_nnstreamer_software-stack.png
    :width: 1000


NNStreamer on IOT Yocto
-----------------------

The software stack for machine learning on |IOT-YOCTO| provides various backend-accelerator approach for the developer.
User will be able to run the inference with online Neuron Stable Delegate on MTK's powerful APU.

.. csv-table:: Table 2. Software Stack on |IOT-YOCTO|
    :class: longtable
    :file: /_asset/tables/ml-platform-sw-stack.csv
    :width: 65%
    :widths: 140 100 100 100 100 100

NNStreamer::tensor_filter
^^^^^^^^^^^^^^^^^^^^^^^^^

The NNStramer plugin - `tensor_filter <https://nnstreamer.github.io/gst/nnstreamer/tensor_filter/README.html>`_ plays a important role on the whole NNStreamer project. 
It acts as a bridge between GStreamer data stream and neural network frameworks such as 
`Tensorflow-lite <https://github.com/nnstreamer/nnstreamer/blob/main/ext/nnstreamer/tensor_filter/tensor_filter_tensorflow_lite.cc>`_,
which process the data stream to be the neural-network-accepted format and also run the model inference. 
Like a typical GStreamer plugin, the command ``gst-inspect-1.0`` will show the details of plugin information for ``tensor_filter``:


.. prompt:: bash # auto

    # gst-inpsect-1.0 tensor_filter
    ...
            Pad Templates:
        SINK template: 'sink'
            Availability: Always
            Capabilities:
            other/tensor
                    framerate: [ 0/1, 2147483647/1 ]
            other/tensors
                        format: { (string)static, (string)flexible }
                    framerate: [ 0/1, 2147483647/1 ]

        SRC template: 'src'
            Availability: Always
            Capabilities:
            other/tensor
                    framerate: [ 0/1, 2147483647/1 ]
            other/tensors
                        format: { (string)static, (string)flexible }
                    framerate: [ 0/1, 2147483647/1 ]

        Element has no clocking capabilities.
        Element has no URI handling capabilities.

        Pads:
        SINK: 'sink'
            Pad Template: 'sink'
        SRC: 'src'
            Pad Template: 'src'

        Element Properties:
        accelerator         : Set accelerator for the subplugin with format (true/false):(comma separated ACCELERATOR(s)). true/false determines if accelerator is to be used. list of accelerators determines the backend (ignored with false). Example, if GPU, NPU can be used but not CPU - true:npu,gpu,!cpu. The full list of accelerators can be found in nnstreamer_plugin_api_filter.h. Note that only a few subplugins support this property.
                                flags: readable, writable
                                String. Default: ""
        custom              : Custom properties for subplugins ?
                                flags: readable, writable
                                String. Default: ""
        framework           : Neural network framework
                                flags: readable, writable
                                String. Default: "auto"
        input               : Input tensor dimension from inner array, up to 4 dimensions ?
                                flags: readable, writable
                                String. Default: ""
        input-combination   : Select the input tensor(s) to invoke the models
                                flags: readable, writable
                                String. Default: ""
        inputlayout         : Set channel first (NCHW) or channel last layout (NHWC) or None for input data. Layout of the data can be any or NHWC or NCHW or none for now.
                                flags: readable, writable
                                String. Default: ""
        inputname           : The Name of Input Tensor
                                flags: readable, writable
                                String. Default: ""
        inputranks          : The Rank of the Input Tensor, which is separated with ',' in case of multiple Tensors
                                flags: readable
                                String. Default: ""
        inputtype           : Type of each element of the input tensor ?

    ...


Tensorflow-Lite Framework
^^^^^^^^^^^^^^^^^^^^^^^^^
  
Users can be able to construct GStreamer streaming pipeline by using the existing `tensor_filter_tensorflow_lite <https://github.com/nnstreamer/nnstreamer/blob/main/ext/nnstreamer/tensor_filter/tensor_filter_tensorflow_lite.cc>`_. 
Examples of using the Tensorflow-Lite framework can be found in `NNStreamer-Example <https://github.com/nnstreamer/nnstreamer-example>`_.
    
Some properties like `neural network framework` and `model path` are required by user input when you are like to use ``tensor_filter_tensorflow_lite``.
But there is no need to pass the model meta information such as **input/output type and input/output dimension** because these properties in the model are
properly handled by the ``tensor_filter_tensorflow_lite``.

Here is an pipeline snippets for `tensor_filter` using Tensorflow-Lite framework. Please visit `NNStreamer-Example <https://github.com/nnstreamer/nnstreamer-example>`_ to get full examples.

.. prompt:: bash # auto

    ... tensor_converter ! \
    tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8 ! \
    ...

.. _neuron_framework:

Neuron Framework
^^^^^^^^^^^^^^^^
  
|IOT-YOCTO| designed a ``tensor_filter`` which supports :ref:`Neuron SDK <Neuron SDK>`. Users can be able to use ``tensor_filter_neuronsdk`` to create GStreamer streaming pipeline
that leverage Genio platform's powerful AI hardware accelerator. The source implementation of the ``tensor_filter_neuronsdk`` in |IOT-YOCTO| NNStreamer repository 
(``$BUILD_DIR/tmp/work/armv8a-poky-linux/nnstreamer/$PV/git/ext/nnstreamer/tensor_filter/tensor_filter_neuronsdk.cc``).

.. _tensor_filter_neuronsdk:

Different from Tensorflow-Lite framework, all the model properties `neural network framework`, `model path` also **input/output type and input/output dimension** are required by user input
when you are like to use ``tensor_filter_neuronsdk``. All the model information are hidden in the DLA file for the security concern, it's important that user should have
comprehensive realization for their own model.

Here is an pipeline snippets for `tensor_filter` using the Neuron SDK:

.. prompt:: bash # auto

    ...  tensor_converter ! \
    tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
    ...

.. note::

    The ``tensor_filter`` properties related to input/output type and dimension are as follows:

    - `inputtype <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L934>`_: Type of each element of the input tensor.

    - `inputlayout <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L938>`_: Set channel first (NCHW) or channel last layout (NHWC) or None for input data.

    - `input <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L926>`_: Input tensor dimension from inner array, up to 4 dimensions.

    - `outputtype <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L955>`_: Type of each element of the output tensor.

    - `outputlayout <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L959>`_: Set channel first (NCHW) or channel last layout (NHWC) or None for output data.

    - `output <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L951>`_: Output tensor dimension from inner array, up to 4 dimensions.


    Get more details for tensor_filter from the `NNstreamer online document <https://nnstreamer.github.io/gst/nnstreamer/tensor_filter/README.html>`_ and the 
    `source code <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L934>`_.


NNStreamer Unit Test
--------------------

NNStreamer provides `gtest based test suite <https://github.com/nnstreamer/nnstreamer/tree/main/tests>`_ for common library and NNStreamer plugins. 
Run the unit tests by following command to gain an insight into the integration status of NNStreamer on |IOT-YOCTO|.

.. prompt:: bash # auto

    # cd /usr/bin/unittest-nnstreamer/
    # ssat
    ...
    ==================================================

    [PASSED] transform_typecast (37 passed among 39 cases)
    [PASSED] nnstreamer_filter_neuronsdk (8 passed among 8 cases)
    [PASSED] transform_dimchg (13 passed among 13 cases)
    [PASSED] nnstreamer_decoder_pose (3 passed among 3 cases)
    [PASSED] nnstreamer_decoder_boundingbox (15 passed among 15 cases)
    [PASSED] transform_clamp (10 passed among 10 cases)
    [PASSED] transform_stand (9 passed among 9 cases)
    [PASSED] transform_arithmetic (36 passed among 36 cases)
    [PASSED] nnstreamer_decoder (17 passed among 17 cases)
    [PASSED] nnstreamer_filter_custom (23 passed among 23 cases)
    [PASSED] transform_transpose (16 passed among 16 cases)
    [PASSED] nnstreamer_filter_tensorflow2_lite (31 passed among 31 cases)
    [PASSED] nnstreamer_repo_rnn (2 passed among 2 cases)
    [PASSED] nnstreamer_converter (32 passed among 32 cases)
    [PASSED] nnstreamer_repo_dynamicity (10 passed among 10 cases)
    [PASSED] nnstreamer_mux (84 passed among 84 cases)
    [PASSED] nnstreamer_split (21 passed among 21 cases)
    [PASSED] nnstreamer_repo (77 passed among 77 cases)
    [PASSED] nnstreamer_demux (43 passed among 43 cases)
    [PASSED] nnstreamer_filter_python3 (0 passed among 0 cases)
    [PASSED] nnstreamer_rate (17 passed among 17 cases)
    [PASSED] nnstreamer_repo_lstm (2 passed among 2 cases)
    ==================================================
    [PASSED] All Test Groups (23) Passed!
            TC Passed: 595 / Failed: 0 / Ignored: 2

Some test cases which marked with "Ignored" are not invoked because they didn't implement ``runTest.sh`` in its test directory that is required by ``ssat``.
Though the integration status can be acquired by it's own unit test execution binary.

Here takes ArmNN unit test as an example. 

.. prompt:: bash # auto

    # cd /usr/bin/unittest-nnstreamer/tests/
    # export NNSTREAMER_SOURCE_ROOT_PATH=/usr/bin/unittest-nnstreamer/
    # ./unittest_filter_armnn
    ...
    [==========] 13 tests from 1 test suite ran. (141 ms total)
    [  PASSED  ] 13 tests.


NNStreamer Pipeline Examples
----------------------------

|IOT-YOCTO| provides some python examples in ``/usr/bin/nnstreamer-demo/`` to demonstrate how to create a NNStreamer pipeline with different configuration on ``tensor_filters``
for various use cases.
The examples are modified from `NNStreamer-Example <https://github.com/nnstreamer/nnstreamer-example>`_.

.. csv-table:: Table Features of NNStreamer Examples
    :class: longtable
    :file: /_asset/tables/ml-nnstreamer-demo.csv
    :width: 65%
    :widths: 250 100 400

Each application could be run separately with it's own python script but here we strongly suggest that 
users run the application via Demo Runner ``run_nnstreamer_example.py``. Users can easily change the target
application by simply modifying argument rather than constructing complicated command.

The remaining part of this section, we are going to use ``run_nnstreamer_example.py`` to go through the demo process.
Use ``--help`` to  list all available options of it. 

.. prompt:: bash # auto

    # python3 run_nnstreamer_example.py --help
    usage: run_nnstreamer_example.py [-h] [--app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement,monocular_depth_estimation}]
                                 [--engine {neuronsdk,tflite,armnn}] [--img IMG] [--cam CAM] --cam_type {uvc,yuvsensor,rawsensor} [--width WIDTH] [--height HEIGHT] [--performance {0,1}]
                                 [--fullscreen {0,1}] [--throughput {0,1}] [--rot ROT]

    options:
    -h, --help            show this help message and exit
    --app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement,monocular_depth_estimation}
                        Choose a demo app to run. Default: image_classification
    --framework {neuronsdk,tflite}
                        Choose a framework to run the pipeline. Default: neuronsdk
    --engine {armnn,neuron_stable}
                        Choose a engine for tflite framework to run the pipeline.
                        If no engine is specified, the inference will run on CPU.
                        Note 2: neuron_stable is NOT available on Genio-350
				
    --img IMG           Input a image file path.
                        Example: /usr/bin/nnstreamer-demo/original.png
                        Note: This paramater is dedicated to low light enhancement app
    --cam CAM           Input a camera node id, ex: 130 .
                        Use 'v4l2-ctl --list-devices' query camera node id.
                        Example:
                        $ v4l2-ctl --list-devices
                            ...
                            C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
                            /dev/video130
                            /dev/video131
                            ...
                        Note: This paramater was designed for all the apps except low light enhancement app.
    --cam_type {uvc,yuvsensor,rawsensor}
                        Choose correct type of camera being used for the demo, ex: yuvsensor
                        Note: This paramater was designed for all the apps except low light enhancement app.
    --width WIDTH       Width for showing on display, ex: 640
    --height HEIGHT     Height for showing on display, ex: 480
    --performance {0,1} Enable to make CPU/GPU/APU run under performance mode, ex: 1
    --fullscreen {0,1}  Fullscreen preview.
                        1: Enable
                        0: Disable
                        Note: This paramater is for all the apps except low light enhancement app.
    --throughput {0,1}  Print throughput information.
                        1: Enable
                        0: Disable
    --rot ROT           Rotate the camera image by degrees, ex: 90
                        Note: This paramater is for all the apps except low light enhancement app.


Here goes some fundamental options:

- ``--framework``:

    |IOT-Yocto| supports ``tflite`` and ``neuronsdk``, the former is online inference path and the other is offline inference path.
    Offline inference path is **NOT** supported on Genio-350.
    Please find the details in :doc:`ML section </sw/yocto/ml-guide/ml-common>` 

- ``--engine``:

    Choose a engine for ``tflite`` framework to run the pipeline. It could be  ``armnn``, ``neuron_stable``.

    For every python demo script, there exists ``build_pipeline`` function that will create ``tensor_filter`` with different framework, engine
    and properties based on user setting. 
    
    .. important::
        
        The ``--engine`` flag is designed only for online inference path, ``tflite`` framework, which supports different hardware accelerator for model inference
        also has fallback mechanism. The offline inference path ``neuronsdk`` can only run the model inference on APU.
    
    Here are some examples for ``run_nnstreamer_example.py``:
    
    - ``--framework tflite`` or ``--framework tflite --engine cpu``:
    
        .. prompt:: text # auto
            
            # If no engine is specified, the inference will run on CPU
            tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8

        - ``--framework tflite --engine armnn`` :
        
            .. prompt:: text # auto
                
                tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:External,ExtDelegateLib:/usr/lib/libarmnnDelegate.so.29.0,ExtDelegateKeyVal:backends#GpuAcc 
    
        - ``--framework tflite --engine stable_delegate`` :
        
            .. prompt:: text # auto
                
                tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:Stable,StaDelegateSettingFile:/usr/share/label_image/stable_delegate_settings.json,ExtDelegateKeyVal:backends#GpuAcc 
                
    - ``--framework neuronsdk`` :
            
            The details for the framework were mentioned :ref:`here <neuron_framework>`.
            
            .. prompt:: text # auto
                
                tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1
    
        
- ``--cam``:  Input a camera node index.
- ``--performance``:

    Set performance mode for your platform. Select your current platform and set the performance mode for it. It could be 

    - ``--performance 0`` : Set the performance mode off
    - ``--performance 1`` : Set the performance mode on

    Performance mode will make the CPU, GPU, and APU running at the highest frequency and disable thermal throttling.


Camera-Input Application
^^^^^^^^^^^^^^^^^^^^^^^^

A v4l2-compatible device is required for acting as input source for the following demonstrations.

General Configuration
~~~~~~~~~~~~~~~~~~~~~

The examples in this section share some common configurations.
It means that users can only change the application option without modifying the shared settings. 

Here takes USB webcam as an example.
To get correct **camera node ID**, some methods are provided in the :ref:`Camera Section <v4l2-ctl>`, such as

.. prompt:: bash # auto

    # v4l2-ctl --list-devices
        ...
        C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
        /dev/video130
        /dev/video131
        ...

In this case, the camera node ID is ``/dev/video130``.

The common settings for the UVC-camera with enabling the Performance Mode are shown below:

.. prompt:: bash # auto
        
    # CAM_TYPE=uvc
    # CAMERA_NODE_ID=130
    # MODE=1
    
    
.. note::
   
   Users can choose raw sensor or YUV sensor as a input source by assigning the ``CAM_TYPE``, e.g. ``CAM_TYPE=rawsensor``, ``CAM_TYPE=yuvsensor``.


Image Classification
~~~~~~~~~~~~~~~~~~~~

.. image:: /_asset/tools_nnstreamer_examples_image_classification.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_image_classification.py``
- Model: `mobilenet_v1_1.0_224_quant.tflite <https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_1.0_224_quant_and_labels.zip>`_
- Run example:

    1. Set the variable ``APP`` to Image Classification application:

		.. prompt:: bash # auto
        
			# APP=image_classification

    2. Choose the framework you like to leverage on:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # FRAMEWORK=tflite

        - **Offline inference on neuronsdk**
        
            .. prompt:: bash # auto
        
                # FRAMEWORK=neuronsdk
                
    3. Choose hardware engine for framework ``tensorflow-lite``:
    
        Please **skip** this step for framework ``neuronsdk``.
        
        - **Execute on CPU**: 
        
            The process will also run on CPU if no engine is specified.
            
            .. prompt:: bash # auto
    
                # ENGINE=cpu
        
        - **Execute on GPU through ArmNN delegate**: 

            .. prompt:: bash # auto
    
                # ENGINE=armnn

        - **Execute on MDLA through Stable delegate**: 
           
            .. prompt:: bash # auto
    
                # ENGINE=stable_delegate        

    4. Run the command:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --engine $ENGINE --performance $MODE

        - **Offline inference on neuronsdk**
        
            Actually this is as same as the above one but eliminating the ``--engine`` argument.
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --performance $MODE

    
- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_image_classification` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-image-classification-latest-v25_0.csv
        :width: 65%
        :widths: 200 150 150 150 150 150

- Pipeline graph

    Here is the GStreamer pipeline defined in the example ``nnstreamer_example_image_classification.py`` with ``--cam uvc`` and ``--framework neuronsdk``.
    The pipeline graph is generated through the ``gst-report`` command from ``gst-instruments`` tool. The details can be found in :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue ! textoverlay name=tensor_res font-desc=Sans,24 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
        tensor_sink name=tensor_sink

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_image_classification.svg
        :width: 1000


Object Detection
~~~~~~~~~~~~~~~~

`ssd_mobilenet_v2_coco`
***********************
.. image:: /_asset/tools_nnstreamer_examples_object_detection.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_object_detection.py``
- Model: `ssd_mobilenet_v2_coco.tflite <https://github.com/nnsuite/testcases/raw/master/DeepLearningModels/tensorflow-lite/ssd_mobilenet_v2_coco>`_
- Run example:

    1. Set the variable ``APP`` to Object Detection application:

		.. prompt:: bash # auto
        
			# APP=object_detection

    2. Choose the framework you like to leverage on:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # FRAMEWORK=tflite

        - **Offline inference on neuronsdk** 
        
            .. prompt:: bash # auto
        
                # FRAMEWORK=neuronsdk
                
    3. Choose hardware engine for framework ``tensorflow-lite``:
    
        Please **skip** this step for framework ``neuronsdk``.
        
        - **Execute on CPU**: 
        
            The process will also run on CPU if no engine is specified.
            
            .. prompt:: bash # auto
    
                # ENGINE=cpu
        
        - **Execute on GPU through ArmNN delegate**: 

            .. prompt:: bash # auto
    
                # ENGINE=armnn
					
        - **Execute on MDLA through Stable delegate**: 
		
            .. prompt:: bash # auto
    
                # ENGINE=stable_delegate        

    4. Run the command:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --engine $ENGINE --performance $MODE

        - **Offline inference on neuronsdk**
        
            Actually this is as same as the above one but eliminating the ``--engine`` argument.
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --performance $MODE


- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_object_detection` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-object-detection-latest-v25_0.csv
        :width: 65%
        :widths: 200 150 150 150 150 150

- Pipeline graph

    Here is the GStreamer pipeline defined in the example ``nnstreamer_example_object_detections.py`` with ``--cam uvc`` and ``--framework neuronsdk``.
    The pipeline graph is generated through the ``gst-report`` command from ``gst-instruments`` tool. The details can be found in :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! v4l2convert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/ssd_mobilenet_v2_coco.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,91:1917:1 ! \
        tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/usr/bin/nnstreamer-demo/coco_labels_list.txt option3=/usr/bin/nnstreamer-demo/box_priors.txt option4=640:480 option5=300:300 ! queue leaky=2 max-size-buffers=2 ! mix.

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection.svg
        :width: 1000

YOLOv5
******

.. image:: /_asset/tools_nnstreamer_examples_object_detection_yolov5.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5.py``
- Model: `yolov5s-int8.tflite <https://github.com/ultralytics/yolov5>`_
- Run example:

    1. Set the variable ``APP`` to Object Detection(YOLOv5s) application:

		.. prompt:: bash # auto
        
			# APP=object_detection_yolov5

    2. Choose the framework you like to leverage on:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # FRAMEWORK=tflite

        - **Offline inference on neuronsdk**
        
            .. prompt:: bash # auto
        
                # FRAMEWORK=neuronsdk
          
        .. note::
        
            For offline inference, the YOLOv5 model is only supported on MDLA3.0 (Genio-700/510) , 
            which will got model-converting error on MDLA2.0 (Genio-1200) due to the unsupported operations.

            .. prompt:: bash # auto

                # ncc-tflite --arch mdla2.0 yolov5s-int8.tflite -o yolov5s-int8.dla --int8-to-uint8
                OP[123]: RESIZE_NEAREST_NEIGHBOR
                ├ MDLA: HalfPixelCenters is unsupported.
                ├ EDMA: unsupported operation
                OP[145]: RESIZE_NEAREST_NEIGHBOR
                ├ MDLA: HalfPixelCenters is unsupported.
                ├ EDMA: unsupported operation
                ERROR: Cannot find an execution plan because of unsupported operations
                ERROR: Fail to compile yolov5s-int8.tflite

            That's the reason why users get failure when running ``run_nnstreamer_example.py --app object_detection_yolov5`` on Genio-1200.
                
            .. prompt:: bash # auto
        
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app object_detection_yolov5 --cam_type uvc --cam 130 --framework neuronsdk --performance 1
                ...
                ERROR: Cannot open the file: /usr/bin/nnstreamer-demo/yolov5s-int8.dla
                ERROR: Cannot set a nullptr compiled network.
                ERROR: Cannot set compiled network.
                ERROR: Runtime loadNetworkFromFile fails.
                ERROR: Cannot initialize runtime pool.
                ...
            
    3. Choose hardware engine for framework ``tensorflow-lite``:
    
        Please **skip** this step for framework ``neuronsdk``.
        
        - **Execute on CPU**: 
        
            The process will also run on CPU if no engine is specified.
            
            .. prompt:: bash # auto
    
                # ENGINE=cpu
        
        - **Execute on GPU through ArmNN delegate**: 

            .. prompt:: bash # auto
    
                # ENGINE=armnn
					
        - **Execute on MDLA through Stable delegate**: 
		
            .. prompt:: bash # auto
    
                # ENGINE=stable_delegate        

    4. Run the command:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --engine $ENGINE --performance $MODE

        - **Offline inference on neuronsdk**
        
            Actually this is as same as the above one but eliminating the ``--engine`` argument.
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --performance $MODE


- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_object_detection_yolov5` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-object-detection_yolov5-latest-v25_0.csv
        :width: 65%
        :widths: 200 150 150 150 150 150

- Pipeline graph

    Here is the GStreamer pipeline defined in the example ``nnstreamer_example_object_detection_yolov5.py`` with ``--cam uvc`` and ``--framework neuronsdk``.
    The pipeline graph is generated through the ``gst-report`` command from ``gst-instruments`` tool. The details can be found in :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=320,height=320,format=RGB ! tensor_converter ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/yolov5s-int8.dla inputtype=uint8 input=3:320:320:1 outputtype=uint8 output=85:6300:1 ! \
        other/tensors,num_tensors=1,types=uint8,dimensions=85:6300:1:1,format=static ! \
        tensor_transform mode=arithmetic option=typecast:float32,add:-4.0,mul:0.0051498096 ! \
        tensor_decoder mode=bounding_boxes option1=yolov5 option2=/usr/bin/nnstreamer-demo/coco.txt option3=0 option4=640:480 option5=320:320 ! queue leaky=2 max-size-buffers=2 ! mix.
       
    .. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection_yolov5.svg
        :width: 1000

Pose Estimation
~~~~~~~~~~~~~~~

.. image:: /_asset/tools_nnstreamer_examples_pose_estimation.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation.py``
- Model: `posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite <https://github.com/nnsuite/testcases/raw/master/DeepLearningModels/tensorflow-lite/pose_estimation>`_
- Run example:


    1. Set the variable ``APP`` to Pose Estimation application:

		.. prompt:: bash # auto
        
			# APP=pose_estimation

    2. Choose the framework you like to leverage on:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # FRAMEWORK=tflite

        - **Offline inference on neuronsdk**
        
            .. prompt:: bash # auto
        
                # FRAMEWORK=neuronsdk
				
    3. Choose hardware engine for framework ``tensorflow-lite``:
    
        Please **skip** this step for framework ``neuronsdk``.
        
        - **Execute on CPU**: 
        
            The process will also run on CPU if no engine is specified.
            
            .. prompt:: bash # auto
    
                # ENGINE=cpu
        
        - **Execute on GPU through ArmNN delegate**: 

            .. prompt:: bash # auto
    
                # ENGINE=armnn
					
        - **Execute on MDLA through Stable delegate**: 
		
            .. prompt:: bash # auto
    
                # ENGINE=stable_delegate        

    4. Run the command:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --engine $ENGINE --performance $MODE

        - **Offline inference on neuronsdk**
        
            Actually this is as same as the above one but eliminating the ``--engine`` argument.
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --performance $MODE


- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_pose_estimation` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-pose-estimation-latest-v25_0.csv
        :width: 65%
        :widths: 200 150 150 150 150 150

- Pipeline graph:

    Here is the GStreamer pipeline defined in the example ``nnstreamer_example_pose_estimation.py`` with ``--cam uvc`` and ``--framework neuronsdk``.
    The pipeline graph is generated through the ``gst-report`` command from ``gst-instruments`` tool. The details can be found in :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=257,height=257,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.dla inputtype=float32 input=3:257:257:1 outputtype=float32,float32,float32,float32 output=17:9:9:1,34:9:9:1,32:9:9:1,32:9:9:1 ! queue ! \
        tensor_decoder mode=pose_estimation option1=640:480 option2=257:257 option3=/usr/bin/nnstreamer-demo/point_labels.txt option4=heatmap-offset ! queue leaky=2 max-size-buffers=2 ! mix.

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_pose_estimation.svg
        :width: 1000

Face Detection
~~~~~~~~~~~~~~

.. image:: /_asset/tools_nnstreamer_examples_face_detection.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_face_detection.py``
- Model: `detect_face.tflite <http://ci.nnstreamer.ai/warehouse/nnmodels/>`_ 
- Run example:

    1. Set the variable ``APP`` to Face Detection application:

		.. prompt:: bash # auto
        
			# APP=face_detection

    2. Choose the framework you like to leverage on:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # FRAMEWORK=tflite

        - **Offline inference on neuronsdk**
        
            .. prompt:: bash # auto
        
                # FRAMEWORK=neuronsdk
				
    3. Choose hardware engine for framework ``tensorflow-lite``:
    
        Please **skip** this step for framework ``neuronsdk``.
        
        - **Execute on CPU**: 
        
            The process will also run on CPU if no engine is specified.
            
            .. prompt:: bash # auto
    
                # ENGINE=cpu
        
        - **Execute on GPU through ArmNN delegate**: 

            .. prompt:: bash # auto
    
                # ENGINE=armnn
					
        - **Execute on MDLA through Stable delegate**: 
		
            .. prompt:: bash # auto
    
                # ENGINE=stable_delegate        

    4. Run the command:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --engine $ENGINE --performance $MODE

        - **Offline inference on neuronsdk**
        
            Actually this is as same as the above one but eliminating the ``--engine`` argument.
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_face_detection` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-face-detection-latest-v25_0.csv
        :width: 65%
        :widths: 200 150 150 150 150 150

- Pipeline graph:

    Here is the GStreamer pipeline defined in the example ``nnstreamer_example_face_detection.py`` with ``--cam uvc`` and ``--framework neuronsdk``.
    The pipeline graph is generated through the ``gst-report`` command from ``gst-instruments`` tool. The details can be found in :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
        t_raw. ! queue leaky=2 max-size-buffers=10 ! videoconvert ! cairooverlay name=tensor_res ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
        t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/detect_face.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,2:1917:1 ! \
        tensor_sink name=res_face

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_face_detection.svg
        :width: 1000
        

Monocular Depth Estimation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: /_asset/tools_nnstreamer_examples_monocular_depth_estimation.png
    :width: 400

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_monocular_depth_estimation.py``
- Model: `midas.tflite <https://github.com/isl-org/MiDaS/releases/tag/v2_1>`_ 
- Run example:

    1. Set the variable ``APP`` to Face Detection application:

		.. prompt:: bash # auto
        
			# APP=monocular_depth_estimation

    2. Choose the framework you like to leverage on:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # FRAMEWORK=tflite

        - **Offline inference on neuronsdk**
        
            .. prompt:: bash # auto
        
                # FRAMEWORK=neuronsdk
				
    3. Choose hardware engine for framework ``tensorflow-lite``:
    
        Please **skip** this step for framework ``neuronsdk``.
        
        - **Execute on CPU**: 
        
            The process will also run on CPU if no engine is specified.
            
            .. prompt:: bash # auto
    
                # ENGINE=cpu
        
        - **Execute on GPU through ArmNN delegate**: 

            .. prompt:: bash # auto
    
                # ENGINE=armnn
	
        - **Execute on MDLA through Stable delegate**: 
		
            .. prompt:: bash # auto
    
                # ENGINE=stable_delegate
                
    4. Run the command:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --engine $ENGINE --performance $MODE

        - **Offline inference on neuronsdk**
        
            Actually this is as same as the above one but eliminating the ``--engine`` argument.
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --performance $MODE

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_monocular_depth_estimation` (UVC)
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-monocular-depth-estimation-latest-v25_0.csv
        :width: 65%
        :widths: 200 150 150 150 150 150

- Pipeline graph:

    Here is the GStreamer pipeline defined in the example ``nnstreamer_example_monocular_depth_estimation.py`` with ``--cam uvc`` and ``--framework neuronsdk``.
    The pipeline graph is generated through the ``gst-report`` command from ``gst-instruments`` tool. The details can be found in :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        v4l2src name=src device=/dev/video5 ! video/x-raw,format=YUY2,width=640,height=480 num-buffers=300 ! videoconvert ! videoscale ! \
        video/x-raw,format=RGB,width=256,height=256 ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
        tensor_filter latency=1 framework=neuronsdk throughput=0 model=/usr/bin/nnstreamer-demo/midas.dla inputtype=float32 input=3:256:256:1 outputtype=float32 output=1:256:256:1 ! \
        appsink name=sink emit-signals=True max-buffers=1 drop=True sync=False

    .. image:: /_asset/tools_nnstreamer_examples_monocular_depth_estimation.svg
        :width: 1000

Image-Input Application
^^^^^^^^^^^^^^^^^^^^^^^

A Portable Network Graphics(.png) file is required for acting as input source for the following demonstrations.

General Configuration
~~~~~~~~~~~~~~~~~~~~~

The examples in this section share some common configurations.
It means that users can only change the application option without modifying the shared settings. 

The common settings for the input image with enabling the Performance Mode are shown below:

.. prompt:: bash # auto
        
    # IMAGE_PATH=/usr/bin/nnstreamer-demo/original.png
    # IMAGE_WIDTH=600
    # IMAGE_HEIGHT=400
    # MODE=1

Low Light Image Enhancement
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: /_asset/tools_nnstreamer_examples_low_light_image_enhancement.svg
    :width: 800

- Python script: ``/usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py``
- Model: `lite-model_zero-dce_1.tflite <https://github.com/nnsuite/testcases/raw/master/DeepLearningModels/tensorflow-lite/zero_dce_tflite>`_ 
- Run example:

    The example image (``/usr/bin/nnstreamer-demo/original.png``) was downloaded from `paperswithcode: <https://paperswithcode.com/dataset/lol>`_.

    1. Set the variable ``APP`` to Low Light Image Enhancement application:

		.. prompt:: bash # auto
        
			# APP=low_light_image_enhancement
            

    2. Choose the framework you like to leverage on:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # FRAMEWORK=tflite

        - **Offline inference on neuronsdk**
        
            .. prompt:: bash # auto
        
                # FRAMEWORK=neuronsdk
				
    3. Choose hardware engine for framework ``tensorflow-lite``:
    
        Please **skip** this step for framework ``neuronsdk``.
        
        - **Execute on CPU**: 
        
            The process will also run on CPU if no engine is specified.
            
            .. prompt:: bash # auto
    
                # ENGINE=cpu
					
        - **Execute on MDLA through Stable delegate**: 
		
            .. prompt:: bash # auto
    
                # ENGINE=stable_delegate        

    4. Run the command:
    
        - **Online inference on tensorflow-lite**
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --img $IMAGE_PATH --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --framework $FRAMEWORK --engine $ENGINE --performance $MODE

        - **Offline inference on neuronsdk**
        
            Actually this is as same as the above one but eliminating the ``--engine`` argument.
        
            .. prompt:: bash # auto
    
                # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app $APP --img $IMAGE_PATH --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --framework $FRAMEWORK --performance $MODE
                
        The light-enhancing image will be saved in the ``/usr/bin/nnstreamer-demo`` and named as ``low_light_enhancement_${FRAMEWORK}_${ENGINE}.png``. Users can be able to use the option: ``--export`` to name the output image. 


    .. note::
    
        You will fail to run ``nnstreamer-demo/run_nnstreamer_example.py --app low_light_image_enhancement`` with ``--engine armnn`` because operator ``SQUARE`` is not supported by Arm NN.

        .. prompt:: bash # auto

            # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app low_light_image_enhancement --img $IMAGE_PATH --framework tflite --engine armnn --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --performance $MODE
            ...
            INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
            ERROR: Operator SQUARE [92] is not supported by armnn_delegate.
            ...

- Average inference time

    .. csv-table:: Average inference time of `nnstreamer_example_low_light_image_enhancement`
        :class: longtable
        :file: /_asset/tables/ml-nnstreamer-low-light-image-enhancement-latest-v25_0.csv
        :width: 65%
        :widths: 200 150 150 150 150 150

- Pipeline graph:

    Here is the GStreamer pipeline defined in the example ``nnstreamer_example_low_light_image_enhancement.py`` with ``--framework neuronsdk``.
    The pipeline graph is generated through the ``gst-report`` command from ``gst-instruments`` tool. The details can be found in :ref:`Pipeline Profiling <pipeline_profiling>`:

    .. prompt:: text # auto

        gst-launch-1.0 \
        filesrc location=/usr/bin/nnstreamer-demo/original.png ! pngdec ! videoscale ! videoconvert ! video/x-raw,width=600,height=400,format=RGB ! \
        tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:0,div:255.0 ! \
        tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/lite-model_zero-dce_1.dla inputtype=float32 input=3:600:400:1 outputtype=float32 output=3:600:400:1 ! \
        tensor_sink name=tensor_sink

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_low_light_image_enhacement.svg
        :width: 1000


Performance
-----------

Inference Time - NNStreamer::tensor_filter Invoke Time
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The inference time for each example was measured by the property ``latency`` provided by ``tensor_filter``.
`Here <https://github.com/nnstreamer/nnstreamer/blob/main/gst/nnstreamer/tensor_filter/tensor_filter_common.c#L982>`_ is the source code of property definition:

.. prompt:: c # auto

    Turn on performance profiling for the average latency over the recent 10 inferences in microseconds.
    Currently, this accepts either 0 (OFF) or 1 (ON). By default, it's set to 0 (OFF).

To enable the ``latency`` profiling for each example, users should modify the python script individually
with adding ``latency=1`` to the ``tensor_filter``'s property setting.

Take ``nnstreamer_example_image_classification.py`` as example: 

1. Edit python script: ``nnstreamer_example_image_classification.py``.

2. Search for ``tensor_filter`` and add ``latency=1`` after it.

    .. prompt:: text # auto

        if engine == 'neuronsdk':
            tensor = dla_converter(self.tflite_model, self.dla)
            cmd += f'tensor_filter latency=1 framework=neuronsdk model={self.dla} {tensor} ! '
        elif engine == 'tflite':
            cpu_cores = find_cpu_cores()
            cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=NumThreads:{cpu_cores} ! '
        elif engine == 'armnn':
            library = find_armnn_delegate_library()
            cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=Delegate:External,ExtDelegateLib:{library},ExtDelegateKeyVal:backends#GpuAcc ! '

3. Save the python script.

4. Enable glib log level to ``all`` to show the debug messages:

    .. prompt:: bash # auto
    
        export G_MESSAGES_DEBUG=all


5. Run the example. You can be able to find the log similar to: ``Invoke took 2.537 ms``, which is regarded as the inference time.
    
    .. prompt:: bash # auto

        # CAM_TYPE=uvc
        # CAMERA_NODE_ID=130
        # MODE=1
        # FRAMEWORK=neuronsdk
    
        # python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app image_classification --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID --framework $FRAMEWORK --performance $MODE
        ...
        ...

        ** INFO: 03:16:01.589: [/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla] Invoke took 2.537 ms
        ...
        ...


NNStreamer Advanced Pipeline Examples
-------------------------------------

.. _pipeline_profiling:

Pipeline Profiling
^^^^^^^^^^^^^^^^^^

|IOT-YOCTO| provide `gst-instrument <https://github.com/kirushyk/gst-instruments>`_ as profiling tool 
for performance analysis and data flow inspection of the GStreamer pipeline.

Here goes two fundamental options:

- ``gst-top-1.0``: 

    It will show the performance report for each element in pipeline.

    .. prompt:: bash # auto
    
        # gst-top-1.0 \
          gst-launch-1.0 \
          v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw t_raw. ! queue leaky=2 max-size-buffers=10 ! \
        ...

        Got EOS from element "pipeline0".
        Execution ended after 0:00:10.221403924
        Setting pipeline to NULL ...
        Freeing pipeline ...
        ELEMENT                    %CPU   %TIME   TIME
        videoconvert0               13.8   55.3    1.41 s
        videoscale0                  3.7   14.9    379 ms
        tensortransform0             2.2    9.0    228 ms
        fps-display-text-overlay     2.0    8.1    207 ms
        tensordecoder0               0.7    2.8   71.9 ms
        tensorfilter0                0.6    2.3   59.5 ms
        ...

    Also save the statistics as a GstTrace file named ``gst-top.gsttrace``

    .. prompt:: bash # auto

        # ls -al *.gsttrace
        -rw-r--r-- 1 root root 11653120 Jan  4 05:23 gst-top.gsttrace

- ``gst-report``: 

    It will convert GstTrace file to performance graph in DOT format:

    .. prompt:: bash # auto

        # gst-report-1.0 --dot gst-top.gsttrace | dot -Tsvg > perf.svg


    The performance graph of ``nnstreamer_example_object_detection.py`` is shown as below. It shows CPU usage, time usage, and execution time among the elements. 
    Users can easily figure out the portion of occupied CPU resource of each element, also of the execution time.

    In this case, the ``tensor_transform`` consumed 56.9% of the total execution time because it processes the buffer data conversion with the CPU computation.

    .. image:: /_asset/tools_nnstreamer_examples_pipeline_object_detection.svg
        :width: 1000

.. note::

    Please refer to `NNstreamer online document: Profiling <https://nnstreamer.github.io/tools/profiling/README.html>`_ for details.