Demo App: NNStreamer
Overview
NNStreamer is an open-source collection of GStreamer plugins that simplifies the integration of neural networks into multimedia pipelines. Samsung initially developed the project before transferring it to the LF AI & Data Foundation.
NNStreamer allows developers to:
Integrate neural network models into GStreamer pipelines efficiently.
Manage neural network filters and data streams within a unified framework.
Incorporate custom C/C++ or Python objects and various AI frameworks at runtime.
For comprehensive details, refer to the NNStreamer Official Documentation.
IoT Yocto includes a specialized tensor_filter subplugin designed for the Neuron SDK. Developers use tensor_filter_neuronsdk to build pipelines that leverage Genio hardware accelerators, such as the MDLA. The source implementation is located in the IoT Yocto NNStreamer tree at ext/nnstreamer/tensor_filter/tensor_filter_neuronsdk.cc.
The following figure shows the software stack for NNStreamer on IoT Yocto.
NNStreamer on IoT Yocto
The machine learning software stack on IoT Yocto provides multiple backend and accelerator options. Developers can run inference with the online Neuron Stable Delegate on MediaTek’s AI Processing Unit (NPU).
Software Stack | Backend | Genio 350 | Genio 510 | Genio 700 | Genio 1200 | Genio 520 | Genio 720
Tensorflow-Lite CPU | CPU | V | V | V | V | V | V
Tensorflow-Lite + GPU delegate | GPU | V | V | V | V | V | V
Tensorflow-Lite + Neuron Stable Delegate | MDLA | X | V | V | V | V | V
Neuron SDK | MDLA | X | V | V | V | V | V
Onnxruntime + CPU Execution Provider | CPU | V | V | V | V | V | V
Onnxruntime + NPU Execution Provider | MDLA | X | X | X | X | V | V
NNStreamer::tensor_filter
The tensor_filter element plays a central role in NNStreamer. It acts as a bridge between GStreamer data streams and neural network frameworks, such as TensorFlow Lite. It converts GStreamer buffers to the format expected by neural networks and executes model inference.
Like a typical GStreamer plugin, the gst-inspect-1.0 command shows the details of the tensor_filter element:
gst-inspect-1.0 tensor_filter
...
Pad Templates:
SINK template: 'sink'
Availability: Always
Capabilities:
other/tensor
framerate: [ 0/1, 2147483647/1 ]
other/tensors
format: { (string)static, (string)flexible }
framerate: [ 0/1, 2147483647/1 ]
SRC template: 'src'
Availability: Always
Capabilities:
other/tensor
framerate: [ 0/1, 2147483647/1 ]
other/tensors
format: { (string)static, (string)flexible }
framerate: [ 0/1, 2147483647/1 ]
Element has no clocking capabilities.
Element has no URI handling capabilities.
Pads:
SINK: 'sink'
Pad Template: 'sink'
SRC: 'src'
Pad Template: 'src'
Element Properties:
accelerator : Set accelerator for the subplugin with format (true/false):(comma separated ACCELERATOR(s)). true/false determines if accelerator is to be used. list of accelerators determines the backend (ignored with false). Example, if GPU, NPU can be used but not CPU - true:npu,gpu,!cpu. The full list of accelerators can be found in nnstreamer_plugin_api_filter.h. Note that only a few subplugins support this property.
flags: readable, writable
String. Default: ""
custom : Custom properties for subplugins ?
flags: readable, writable
String. Default: ""
framework : Neural network framework
flags: readable, writable
String. Default: "auto"
input : Input tensor dimension from inner array, up to 4 dimensions ?
flags: readable, writable
String. Default: ""
input-combination : Select the input tensor(s) to invoke the models
flags: readable, writable
String. Default: ""
inputlayout : Set channel first (NCHW) or channel last layout (NHWC) or None for input data. Layout of the data can be any or NHWC or NCHW or none for now.
flags: readable, writable
String. Default: ""
inputname : The Name of Input Tensor
flags: readable, writable
String. Default: ""
inputranks : The Rank of the Input Tensor, which is separated with ',' in case of multiple Tensors
flags: readable
String. Default: ""
inputtype : Type of each element of the input tensor ?
...
TensorFlow Lite Framework
Developers can construct GStreamer pipelines by using the existing tensor_filter_tensorflow_lite subplugin. Examples using the TensorFlow Lite framework are available in NNStreamer-Example.
When tensor_filter_tensorflow_lite is used, properties such as the framework (neural network framework) and model (model path) must be set.
However, developers do not need to specify model metadata such as input/output type and input/output dimension, because tensor_filter_tensorflow_lite reads this information directly from the TFLite model file.
The following snippet shows a tensor_filter configured to use the TensorFlow Lite framework.
For full pipeline examples, refer to NNStreamer-Example.
... tensor_converter ! \
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8 ! \
...
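The demo scripts described later in this section assemble pipeline strings like the snippet above and hand them to GStreamer through the Python bindings. The following is a minimal sketch of that pattern, assuming a videotestsrc stand-in for the camera source; only the tensor_filter portion mirrors the snippet above.
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# Sketch: a tensor_filter pipeline launched from Python. The source element and
# buffer count are placeholders; the filter settings mirror the snippet above.
pipeline_str = (
    'videotestsrc num-buffers=100 ! videoconvert ! videoscale ! '
    'video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! '
    'tensor_filter framework=tensorflow-lite '
    'model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite '
    'custom=NumThreads:8 ! '
    'tensor_sink name=tensor_sink'
)

pipeline = Gst.parse_launch(pipeline_str)
pipeline.set_state(Gst.State.PLAYING)

# Block until the stream ends or an error occurs, then release the pipeline.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)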
Neuron Framework
IoT Yocto provides a tensor_filter subplugin that supports Neuron SDK.
Developers can use tensor_filter_neuronsdk to create GStreamer pipelines that leverage the Genio platform AI accelerators.
The source implementation is located in the IoT Yocto NNStreamer repository:
$BUILD_DIR/tmp/work/armv8a-poky-linux/nnstreamer/$PV/git/ext/nnstreamer/tensor_filter/tensor_filter_neuronsdk.cc
In contrast to the TensorFlow Lite framework, all model-related properties, including the neural network framework, model path, input/output type, and input/output dimension, must be provided explicitly when using tensor_filter_neuronsdk.
For security reasons, the model information is embedded in the DLA file and is not exposed by the runtime.
Therefore, it is important that developers fully understand the input and output specifications of their models.
The following snippet shows a tensor_filter configured to use Neuron SDK:
... tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
...
Note
The main tensor_filter properties related to tensor type and dimension are:
inputtype: Type of each element of the input tensor.
inputlayout: Channel-first (NCHW), channel-last (NHWC), or none for input data.
input: Input tensor dimension, up to 4 dimensions.
outputtype: Type of each element of the output tensor.
outputlayout: Channel-first (NCHW), channel-last (NHWC), or none for output data.
output: Output tensor dimension, up to 4 dimensions.
For more details, refer to the NNStreamer online documentation and the tensor_filter common source code.
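Because the compiled DLA file does not expose this metadata, one practical way to obtain the values for the neuronsdk subplugin is to inspect the original TFLite model before conversion. Below is a hedged sketch using the TensorFlow Lite interpreter; the model path is illustrative, and the dimension strings are written innermost-dimension first, as in input=3:224:224:1 above.
# Sketch: derive tensor_filter properties (inputtype/input/outputtype/output)
# from the source .tflite model. Requires TensorFlow or tflite-runtime.
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path='/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite')
interpreter.allocate_tensors()

def to_nns_dim(shape):
    # NNStreamer writes dimensions innermost-first: NHWC [1, 224, 224, 3] -> "3:224:224:1"
    return ':'.join(str(d) for d in reversed(list(shape)))

for detail in interpreter.get_input_details():
    print('input :', to_nns_dim(detail['shape']), detail['dtype'].__name__)
for detail in interpreter.get_output_details():
    print('output:', to_nns_dim(detail['shape']), detail['dtype'].__name__)
# For this model the printed values correspond to input=3:224:224:1 inputtype=uint8
# and output=1001:1 outputtype=uint8, matching the snippet above.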
NNStreamer Unit Test
NNStreamer provides a gtest-based test suite for the common library and NNStreamer plugins. Running these unit tests helps verify the integration status of NNStreamer on IoT Yocto.
cd /usr/bin/unittest-nnstreamer/
ssat
...
==================================================
[PASSED] transform_typecast (37 passed among 39 cases)
[PASSED] nnstreamer_filter_neuronsdk (8 passed among 8 cases)
[PASSED] transform_dimchg (13 passed among 13 cases)
[PASSED] nnstreamer_decoder_pose (3 passed among 3 cases)
[PASSED] nnstreamer_decoder_boundingbox (15 passed among 15 cases)
[PASSED] transform_clamp (10 passed among 10 cases)
[PASSED] transform_stand (9 passed among 9 cases)
[PASSED] transform_arithmetic (36 passed among 36 cases)
[PASSED] nnstreamer_decoder (17 passed among 17 cases)
[PASSED] nnstreamer_filter_custom (23 passed among 23 cases)
[PASSED] transform_transpose (16 passed among 16 cases)
[PASSED] nnstreamer_filter_tensorflow2_lite (31 passed among 31 cases)
[PASSED] nnstreamer_repo_rnn (2 passed among 2 cases)
[PASSED] nnstreamer_converter (32 passed among 32 cases)
[PASSED] nnstreamer_repo_dynamicity (10 passed among 10 cases)
[PASSED] nnstreamer_mux (84 passed among 84 cases)
[PASSED] nnstreamer_split (21 passed among 21 cases)
[PASSED] nnstreamer_repo (77 passed among 77 cases)
[PASSED] nnstreamer_demux (43 passed among 43 cases)
[PASSED] nnstreamer_filter_python3 (0 passed among 0 cases)
[PASSED] nnstreamer_rate (17 passed among 17 cases)
[PASSED] nnstreamer_repo_lstm (2 passed among 2 cases)
==================================================
[PASSED] All Test Groups (23) Passed!
TC Passed: 595 / Failed: 0 / Ignored: 2
Some test cases are marked as “Ignored” because they do not implement the runTest.sh script in their test directory, which is required by ssat.
Even when ssat ignores a test group, the integration status can still be checked by running the individual unit test binary.
The following example shows how to run the Arm NN unit test (for reference):
cd /usr/bin/unittest-nnstreamer/tests/
export NNSTREAMER_SOURCE_ROOT_PATH=/usr/bin/unittest-nnstreamer/
./unittest_filter_armnn
...
[==========] 13 tests from 1 test suite ran. (141 ms total)
[ PASSED ] 13 tests.
NNStreamer Pipeline Examples
IoT Yocto provides several Python examples in /usr/bin/nnstreamer-demo/ to demonstrate how to build NNStreamer pipelines with different tensor_filter configurations for various use cases.
These examples are adapted from NNStreamer-Example.
Category | Input Source | Python script
Demo Runner | – | run_nnstreamer_example.py
Image Classification | Camera | nnstreamer_example_image_classification.py
Object Detection | Camera | nnstreamer_example_object_detection.py
Object Detection | Camera | nnstreamer_example_object_detection_yolov5.py
Pose Estimation | Camera | nnstreamer_example_pose_estimation.py
Face Detection | Camera | nnstreamer_example_face_detection.py
Monocular Depth Estimation | Camera | nnstreamer_example_monocular_depth_estimation.py
Image Enhancement | Image | nnstreamer_example_low_light_image_enhancement.py
Each application can be run directly via its own Python script.
However, IoT Yocto strongly recommends launching them through the demo runner run_nnstreamer_example.py.
The demo runner allows developers to switch between applications and frameworks by changing command-line arguments instead of manually constructing GStreamer commands.
The remainder of this section uses run_nnstreamer_example.py to walk through the demo flow.
Use --help to list all available options:
python3 run_nnstreamer_example.py --help
usage: run_nnstreamer_example.py [-h] [--app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement,monocular_depth_estimation}]
[--engine {neuronsdk,neuron_stable}] [--img IMG] [--cam CAM] --cam_type {uvc,yuvsensor,rawsensor} [--width WIDTH] [--height HEIGHT] [--performance {0,1}]
[--fullscreen {0,1}] [--throughput {0,1}] [--rot ROT]
options:
-h, --help show this help message and exit
--app {image_classification,object_detection,object_detection_yolov5,face_detection,pose_estimation,low_light_image_enhancement,monocular_depth_estimation}
Choose a demo app to run. Default: image_classification
--engine {neuronsdk,neuron_stable}
Choose a runtime engine to run the pipeline.
If no engine is specified, the inference will run on CPU by default.
Note: neuron_stable is NOT available on Genio-350
--img IMG Input image file path.
Example: /usr/bin/nnstreamer-demo/original.png
Note: This parameter is dedicated to the low light enhancement app.
--cam CAM Input camera node ID, for example: 130.
Use 'v4l2-ctl --list-devices' to query the camera node ID.
Example:
$ v4l2-ctl --list-devices
...
C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
/dev/video130
/dev/video131
...
Note: This parameter applies to all apps except the low light enhancement app.
--cam_type {uvc,yuvsensor,rawsensor}
Choose the camera type for the demo, for example: yuvsensor.
Note: This parameter applies to all apps except the low light enhancement app.
--width WIDTH Width of the preview window, for example: 640
--height HEIGHT Height of the preview window, for example: 480
--performance {0,1} Enable performance mode for CPU/GPU/APU, for example: 1
--fullscreen {0,1} Fullscreen preview.
1: Enable
0: Disable
Note: This parameter applies to all apps except the low light enhancement app.
--throughput {0,1} Print throughput information.
1: Enable
0: Disable
--rot ROT Rotate the camera image by degrees, for example: 90
Note: This parameter applies to all apps except the low light enhancement app.
Here are some key options:
--engine: Select the runtime engine used for inference. It can be:
neuronsdk: Offline inference on the APU with compiled DLA models.
neuron_stable: Online inference path using the Neuron Stable Delegate.
For each Python demo script, a build_pipeline function constructs a tensor_filter element with the appropriate framework, engine, and properties based on the selected options (a simplified sketch follows this list).
Important
The Neuron Stable Delegate provides online inference path support and can route inference to different hardware accelerators, with a fallback mechanism. The offline inference path using neuronsdk runs compiled models directly on the APU.
The following examples show typical pipelines constructed by the demos:
--engine cpu (implicit default when no engine is specified): if no hardware engine is specified, the inference runs on the CPU.
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8
--engine neuron_stable (Neuron Stable Delegate):
tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:Stable,StaDelegateSettingFile:/usr/share/label_image/stable_delegate_settings.json,ExtDelegateKeyVal:backends#GpuAcc
--engine neuronsdk (offline Neuron SDK): the details of the framework are described in Neuron Framework.
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1
--cam: Specifies the camera node index used as input.
--performance: Sets the performance mode for the platform:
--performance 0: Disable performance mode.
--performance 1: Enable performance mode.
Performance mode drives the CPU, GPU, and APU to their highest operating frequencies and disables thermal throttling.
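A simplified sketch of such a per-engine selection is shown below. It only assembles the tensor_filter description from the pipelines listed above; the helper name and model paths are illustrative and do not reproduce the demo scripts' exact API.
# Sketch: choose the tensor_filter description from the selected engine,
# modelled on the example pipelines above. Not the demo's actual build_pipeline.
MODEL_TFLITE = '/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite'
MODEL_DLA = '/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla'

def build_tensor_filter(engine=None, num_threads=8):
    if engine == 'neuronsdk':
        # Offline path: compiled DLA model with explicit tensor metadata.
        return ('tensor_filter framework=neuronsdk model=%s '
                'inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1'
                % MODEL_DLA)
    if engine == 'neuron_stable':
        # Online path: TFLite model routed through the Neuron Stable Delegate.
        return ('tensor_filter framework=tensorflow-lite model=%s '
                'custom=Delegate:Stable,StaDelegateSettingFile:'
                '/usr/share/label_image/stable_delegate_settings.json' % MODEL_TFLITE)
    # Default: CPU inference with the TensorFlow Lite framework.
    return ('tensor_filter framework=tensorflow-lite model=%s custom=NumThreads:%d'
            % (MODEL_TFLITE, num_threads))

print(build_tensor_filter('neuronsdk'))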
Camera-Input Application
A v4l2-compatible device is required as an input source for the following demonstrations.
General Configuration
The camera-based examples share common configuration parameters. Developers can switch applications by changing only the application option while keeping the shared settings.
The following example uses a USB webcam. Use v4l2-ctl to obtain the camera node ID.
v4l2-ctl --list-devices
...
C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
/dev/video130
/dev/video131
...
In this case, the camera node ID is /dev/video130.
The common settings for a UVC camera with Performance Mode enabled are:
CAM_TYPE=uvc
CAMERA_NODE_ID=130
MODE=1
Note
Developers can also use a raw sensor or YUV sensor as the input source by assigning CAM_TYPE, for example CAM_TYPE=rawsensor or CAM_TYPE=yuvsensor.
Image Classification
Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification.py
Run example:
Set the variable APP to the Image Classification application:
APP=image_classification
Choose the runtime engine:
Online inference with Neuron Stable Delegate (if supported on the platform):
ENGINE=neuron_stable
Offline inference with Neuron SDK:
ENGINE=neuronsdk
CPU-only inference:
If no engine is set, the demo falls back to CPU execution.
unset ENGINE # or ENGINE=cpu
Run the command:
Online inference with Neuron Stable Delegate:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuron_stable --performance $MODE
Offline inference with Neuron SDK:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuronsdk --performance $MODE
Average inference time
Average inference time (ms) of nnstreamer_example_image_classification (UVC)
Platform | CPU | ARMNN GPU | Neuron Stable | NeuronSDK
Genio-350 | 46.3 | 40 | Not supported | Not supported
Genio-510 | 8.6 | 16.5 | 1.6 | 2.3
Genio-700 | 9.4 | 13 | 1.3 | 2.3
Genio-1200 | 7.3 | 9 | 1.8 | 2.5
Pipeline graph
The following GStreamer pipeline is defined in nnstreamer_example_image_classification.py when --cam uvc and --engine neuronsdk are used. The pipeline graph is generated using the gst-report command from the gst-instruments tool. For more information, see Pipeline Profiling:
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue ! textoverlay name=tensor_res font-desc=Sans,24 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
tensor_sink name=tensor_sink
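The classification result arrives at the tensor_sink element as 1001 uint8 label scores. The demo script consumes it through the element's new-data signal; a hedged sketch of such a handler is shown below (the signal wiring and label handling are assumptions, not the demo's exact code).
import numpy as np

# Sketch: handle the "new-data" signal emitted by tensor_sink for every result.
def on_new_data(sink, buffer):
    data = buffer.extract_dup(0, buffer.get_size())   # raw tensor bytes
    scores = np.frombuffer(data, dtype=np.uint8)      # 1001 label scores
    top = int(np.argmax(scores))
    print('top-1 label index: %d (score %d)' % (top, int(scores[top])))

# Wiring (pipeline as launched elsewhere):
# tensor_sink = pipeline.get_by_name('tensor_sink')
# tensor_sink.connect('new-data', on_new_data)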
Object Detection
ssd_mobilenet_v2_coco
Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection.py
Model: ssd_mobilenet_v2_coco.tflite
Run example:
Set the variable APP to the Object Detection application:
APP=object_detection
Choose the runtime engine:
Online inference with Neuron Stable Delegate:
ENGINE=neuron_stable
Offline inference with Neuron SDK:
ENGINE=neuronsdk
CPU-only inference:
unset ENGINE # or ENGINE=cpu
Run the command:
Online inference with Neuron Stable Delegate:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuron_stable --performance $MODE
Offline inference with Neuron SDK:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuronsdk --performance $MODE
Average inference time
Average inference time (ms) of nnstreamer_example_object_detection (UVC)
Platform | CPU | ARMNN GPU | Neuron Stable Delegate | NeuronSDK
Genio-350 | 579 | 194.5 | Not supported | Not supported
Genio-510 | 175 | 79 | 21.5 | 22.5
Genio-700 | 164 | 60 | 15.5 | 16.7
Genio-1200 | 121 | 39 | 12.3 | 13
Pipeline graph
The following GStreamer pipeline is defined in nnstreamer_example_object_detection.py with --cam uvc and --engine neuronsdk. The pipeline graph is generated using the gst-report command from gst-instruments. For more details, see Pipeline Profiling:
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! v4l2convert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/ssd_mobilenet_v2_coco.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,91:1917:1 ! \
tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/usr/bin/nnstreamer-demo/coco_labels_list.txt option3=/usr/bin/nnstreamer-demo/box_priors.txt option4=640:480 option5=300:300 ! queue leaky=2 max-size-buffers=2 ! mix.
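In this pipeline, the tensor_transform stage with option=typecast:float32,add:-127.5,div:127.5 rescales each uint8 pixel x to (x - 127.5) / 127.5, i.e. into the [-1, 1] range that the float32 ssd_mobilenet_v2_coco input expects. A one-line NumPy equivalent, for illustration only:
import numpy as np

# Equivalent of: tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5
rgb_u8 = np.random.randint(0, 256, size=(1, 300, 300, 3), dtype=np.uint8)  # stand-in 300x300 RGB frame
normalized = (rgb_u8.astype(np.float32) - 127.5) / 127.5                   # values now in [-1, 1]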
YOLOv5
Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5.py
Model: yolov5s-int8.tflite
Run example:
Set the variable APP to the Object Detection (YOLOv5s) application:
APP=object_detection_yolov5
Choose the runtime engine:
Online inference with Neuron Stable Delegate:
ENGINE=neuron_stable
Offline inference with Neuron SDK:
ENGINE=neuronsdk
CPU-only inference:
unset ENGINE # or ENGINE=cpu
Note
For offline inference, the YOLOv5 model is only supported on MDLA3.0 (Genio-700/510). On MDLA2.0 (Genio-1200), model conversion fails because certain operations are not supported.
ncc-tflite --arch mdla2.0 yolov5s-int8.tflite -o yolov5s-int8.dla --int8-to-uint8
OP[123]: RESIZE_NEAREST_NEIGHBOR
├ MDLA: HalfPixelCenters is unsupported.
├ EDMA: unsupported operation
OP[145]: RESIZE_NEAREST_NEIGHBOR
├ MDLA: HalfPixelCenters is unsupported.
├ EDMA: unsupported operation
ERROR: Cannot find an execution plan because of unsupported operations
ERROR: Fail to compile yolov5s-int8.tflite
As a result, running run_nnstreamer_example.py --app object_detection_yolov5 with --engine neuronsdk fails on Genio-1200:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py --app object_detection_yolov5 --cam_type uvc --cam 130 --engine neuronsdk --performance 1
...
ERROR: Cannot open the file: /usr/bin/nnstreamer-demo/yolov5s-int8.dla
ERROR: Cannot set a nullptr compiled network.
ERROR: Cannot set compiled network.
ERROR: Runtime loadNetworkFromFile fails.
ERROR: Cannot initialize runtime pool.
...
Run the command:
Online inference with Neuron Stable Delegate:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuron_stable --performance $MODE
Offline inference with Neuron SDK:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuronsdk --performance $MODE
Average inference time
Average inference time (ms) of nnstreamer_example_object_detection_yolov5 (UVC)
Platform | CPU | ARMNN GPU | Neuron Stable | NeuronSDK
Genio-350 | 295 | 140 | Not supported | Not supported
Genio-510 | 55 | 46.5 | 5.2 | 5.9
Genio-700 | 57 | 37.5 | 3.6 | 4.9
Genio-1200 | 41 | 24.5 | 27.9 | Not supported
Pipeline graph
The following GStreamer pipeline is defined in nnstreamer_example_object_detection_yolov5.py when --cam uvc and --engine neuronsdk are used. The pipeline graph is generated using gst-report from gst-instruments. For more details, see Pipeline Profiling:
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=320,height=320,format=RGB ! tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/yolov5s-int8.dla inputtype=uint8 input=3:320:320:1 outputtype=uint8 output=85:6300:1 ! \
other/tensors,num_tensors=1,types=uint8,dimensions=85:6300:1:1,format=static ! \
tensor_transform mode=arithmetic option=typecast:float32,add:-4.0,mul:0.0051498096 ! \
tensor_decoder mode=bounding_boxes option1=yolov5 option2=/usr/bin/nnstreamer-demo/coco.txt option3=0 option4=640:480 option5=320:320 ! queue leaky=2 max-size-buffers=2 ! mix.
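Here the arithmetic transform runs after the filter: option=typecast:float32,add:-4.0,mul:0.0051498096 dequantizes the uint8 output as (q - 4) * 0.0051498096, which corresponds to a quantization zero point of 4 and a scale of roughly 0.00515 for this yolov5s-int8 model, so tensor_decoder receives float box coordinates and scores. A NumPy illustration:
import numpy as np

# Equivalent of: tensor_transform mode=arithmetic option=typecast:float32,add:-4.0,mul:0.0051498096
q = np.random.randint(0, 256, size=(1, 6300, 85), dtype=np.uint8)   # stand-in raw YOLOv5 output
dequantized = (q.astype(np.float32) - 4.0) * 0.0051498096           # (q - zero_point) * scale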
Pose Estimation
Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation.py
Model: posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite
Run example:
Set the variable APP to the Pose Estimation application:
APP=pose_estimation
Choose the runtime engine:
Online inference with Neuron Stable Delegate:
ENGINE=neuron_stable
Offline inference with Neuron SDK:
ENGINE=neuronsdk
CPU-only inference:
unset ENGINE # or ENGINE=cpu
Run the command:
Online inference with Neuron Stable Delegate:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuron_stable --performance $MODE
Offline inference with Neuron SDK:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuronsdk --performance $MODE
Average inference time
Average inference time (ms) of nnstreamer_example_pose_estimation (UVC)
Platform | CPU | ARMNN GPU | Neuron Stable | NeuronSDK
Genio-350 | 180 | 115.3 | Not supported | Not supported
Genio-510 | 45 | 34.5 | 6.9 | 8.3
Genio-700 | 50 | 25 | 5.2 | 6.5
Genio-1200 | 42 | 16.5 | 5.7 | 6.5
Pipeline graph
The following GStreamer pipeline is defined in nnstreamer_example_pose_estimation.py with --cam uvc and --engine neuronsdk. The pipeline graph is generated using gst-report from gst-instruments. For details, see Pipeline Profiling:
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=257,height=257,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.dla inputtype=float32 input=3:257:257:1 outputtype=float32,float32,float32,float32 output=17:9:9:1,34:9:9:1,32:9:9:1,32:9:9:1 ! queue ! \
tensor_decoder mode=pose_estimation option1=640:480 option2=257:257 option3=/usr/bin/nnstreamer-demo/point_labels.txt option4=heatmap-offset ! queue leaky=2 max-size-buffers=2 ! mix.
Face Detection
Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_face_detection.py
Model: detect_face.tflite
Run example:
Set the variable APP to the Face Detection application:
APP=face_detection
Choose the runtime engine:
Online inference with Neuron Stable Delegate:
ENGINE=neuron_stable
Offline inference with Neuron SDK:
ENGINE=neuronsdk
CPU-only inference:
unset ENGINE # or ENGINE=cpu
Run the command:
Online inference with Neuron Stable Delegate:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuron_stable --performance $MODE
Offline inference with Neuron SDK:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuronsdk --performance $MODE
Average inference time
Average inference time (ms) of nnstreamer_example_face_detection (UVC)
Platform | CPU | ARMNN GPU | Neuron Stable | NeuronSDK
Genio-350 | 237 | 113.2 | Not supported | Not supported
Genio-510 | 83 | 41.8 | 11.2 | 12.5
Genio-700 | 60 | 31 | 7.8 | 9.1
Genio-1200 | 52 | 23.4 | 5.9 | 6.8
Pipeline graph
The following GStreamer pipeline is defined in nnstreamer_example_face_detection.py with --cam uvc and --engine neuronsdk. The pipeline graph is generated using gst-report from gst-instruments. For details, see Pipeline Profiling:
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! videoconvert ! cairooverlay name=tensor_res ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/detect_face.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,2:1917:1 ! \
tensor_sink name=res_face
Monocular Depth Estimation
Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_monocular_depth_estimation.py
Model: midas.tflite
Run example:
Set the variable APP to the Monocular Depth Estimation application:
APP=monocular_depth_estimation
Choose the runtime engine:
Online inference with Neuron Stable Delegate:
ENGINE=neuron_stable
Offline inference with Neuron SDK:
ENGINE=neuronsdk
CPU-only inference:
unset ENGINE # or ENGINE=cpu
Run the command:
Online inference with Neuron Stable Delegate:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuron_stable --performance $MODE
Offline inference with Neuron SDK:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine neuronsdk --performance $MODE
Average inference time
Average inference time (ms) of nnstreamer_example_monocular_depth_estimation (UVC)
Platform | CPU | ARMNN GPU | Neuron Stable | NeuronSDK
Genio-350 | 701 | 350 | Not supported | Not supported
Genio-510 | 240 | 120 | 22.7 | 23.2
Genio-700 | 158 | 87 | 16.3 | 16.5
Genio-1200 | 144 | 62 | 33.3 | Not supported
Pipeline graph
The following GStreamer pipeline is defined in nnstreamer_example_monocular_depth_estimation.py when --cam uvc and --engine neuronsdk are used. The pipeline graph is generated using gst-report from gst-instruments. For more information, see Pipeline Profiling:
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 ! video/x-raw,format=YUY2,width=640,height=480 num-buffers=300 ! videoconvert ! videoscale ! \
video/x-raw,format=RGB,width=256,height=256 ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
tensor_filter latency=1 framework=neuronsdk throughput=0 model=/usr/bin/nnstreamer-demo/midas.dla inputtype=float32 input=3:256:256:1 outputtype=float32 output=1:256:256:1 ! \
appsink name=sink emit-signals=True max-buffers=1 drop=True sync=False
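Unlike the camera demos above, this pipeline ends in an appsink, so the script pulls each sample itself and reshapes the float32 tensor into a 256x256 depth map. A hedged sketch of that consumption step follows; the normalization for display is an assumption and not part of the pipeline.
import numpy as np
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

# Sketch: pull one MiDaS output sample from the appsink named "sink" above and
# convert it into an 8-bit image for visualization.
def pull_depth_map(appsink):
    sample = appsink.emit('pull-sample')            # blocks until a sample is ready
    if sample is None:
        return None
    buf = sample.get_buffer()
    data = buf.extract_dup(0, buf.get_size())
    depth = np.frombuffer(data, dtype=np.float32).reshape(256, 256).copy()
    depth -= depth.min()                            # display normalization (assumption)
    if depth.max() > 0:
        depth = depth / depth.max() * 255.0
    return depth.astype(np.uint8)

# Usage (pipeline as launched elsewhere):
# appsink = pipeline.get_by_name('sink')
# depth_img = pull_depth_map(appsink)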
Image-Input Application
A Portable Network Graphics (PNG) file is required as the input source for the following demonstrations.
General Configuration
The image-based examples share a common configuration pattern. Developers can switch the application while keeping the base configuration unchanged.
The following settings enable Performance Mode and configure the input image:
IMAGE_PATH=/usr/bin/nnstreamer-demo/original.png
IMAGE_WIDTH=600
IMAGE_HEIGHT=400
MODE=1
Low Light Image Enhancement
Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py
Model: lite-model_zero-dce_1.tflite
Run example:
The example image (/usr/bin/nnstreamer-demo/original.png) is downloaded from paperswithcode (LOL dataset).
Set the variable APP to the Low Light Image Enhancement application:
APP=low_light_image_enhancement
Choose the runtime engine:
Online inference with Neuron Stable Delegate:
ENGINE=neuron_stable
Offline inference with Neuron SDK:
ENGINE=neuronsdk
CPU-only inference:
unset ENGINE # or ENGINE=cpu
Run the command:
Online inference with Neuron Stable Delegate:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --img $IMAGE_PATH --width $IMAGE_WIDTH --height $IMAGE_HEIGHT \
  --engine neuron_stable --performance $MODE
Offline inference with Neuron SDK:
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app $APP --img $IMAGE_PATH --width $IMAGE_WIDTH --height $IMAGE_HEIGHT \
  --engine neuronsdk --performance $MODE
The enhanced image is saved under /usr/bin/nnstreamer-demo and named low_light_enhancement_${ENGINE}.png. Developers can also use the --export option in the script to customize the output filename.
Average inference time
Average inference time (ms) of nnstreamer_example_low_light_image_enhancement
Platform | CPU | ARMNN GPU | Neuron Stable | NeuronSDK
Genio-350 | 3636 | Not supported | Not supported | Not supported
Genio-510 | 1215 | Not supported | 229 | 101
Genio-700 | 765 | Not supported | 147 | 74
Genio-1200 | 644 | Not supported | 144 | 79
Pipeline graph
The following GStreamer pipeline is defined in nnstreamer_example_low_light_image_enhancement.py when --engine neuronsdk is used. The pipeline graph is generated using gst-report from gst-instruments. For more details, see Pipeline Profiling:
gst-launch-1.0 \
filesrc location=/usr/bin/nnstreamer-demo/original.png ! pngdec ! videoscale ! videoconvert ! video/x-raw,width=600,height=400,format=RGB ! \
tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:0,div:255.0 ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/lite-model_zero-dce_1.dla inputtype=float32 input=3:600:400:1 outputtype=float32 output=3:600:400:1 ! \
tensor_sink name=tensor_sink
Performance
Inference Time – tensor_filter Invoke Time
The inference time for each example is measured using the latency property of tensor_filter.
The property is defined in the tensor_filter_common.c source code:
Turn on performance profiling for the average latency over the recent 10 inferences in microseconds.
Currently, this accepts either 0 (OFF) or 1 (ON). By default, it's set to 0 (OFF).
To enable latency profiling, modify each Python example and add latency=1 to the tensor_filter properties.
The following example uses nnstreamer_example_image_classification.py:
Edit the script nnstreamer_example_image_classification.py.
Locate tensor_filter and add latency=1:
if engine == 'neuronsdk':
    tensor = dla_converter(self.tflite_model, self.dla)
    cmd += f'tensor_filter latency=1 framework=neuronsdk model={self.dla} {tensor} ! '
elif engine == 'neuron_stable':
    cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=Delegate:Stable,StaDelegateSettingFile:/usr/share/label_image/stable_delegate_settings.json ! '
else:
    # CPU-only fallback
    cpu_cores = find_cpu_cores()
    cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=NumThreads:{cpu_cores} ! '
Save the script.
Set the glib log level to all to print the timing messages:
export G_MESSAGES_DEBUG=all
Run the example. The log output includes entries similar to "Invoke took 2.537 ms", which indicate the measured inference time.
CAM_TYPE=uvc
CAMERA_NODE_ID=130
MODE=1
ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py \
  --app image_classification --cam_type $CAM_TYPE --cam $CAMERA_NODE_ID \
  --engine $ENGINE --performance $MODE
...
** INFO: 03:16:01.589: [/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla] Invoke took 2.537 ms
...
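To summarize many such measurements, the log lines can be captured and averaged offline. A small hedged helper, assuming the output was saved to a file (the log format follows the sample above; the file name is illustrative):
# Sketch: average the "Invoke took X ms" lines produced with latency=1 and
# G_MESSAGES_DEBUG=all. Capture the demo output first, for example:
#   python3 /usr/bin/nnstreamer-demo/run_nnstreamer_example.py ... 2>&1 | tee demo.log
import re
import sys

pattern = re.compile(r'Invoke took ([0-9.]+) ms')
times = []
with open(sys.argv[1] if len(sys.argv) > 1 else 'demo.log') as log:
    for line in log:
        match = pattern.search(line)
        if match:
            times.append(float(match.group(1)))

if times:
    print('samples: %d, average inference time: %.3f ms'
          % (len(times), sum(times) / len(times)))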
NNStreamer Advanced Pipeline Examples
Pipeline Profiling
IoT Yocto includes gst-instruments as a profiling tool for performance analysis and data flow inspection of GStreamer pipelines.
The two main utilities are:
gst-top-1.0: Shows a performance report for each element in a pipeline.
gst-top-1.0 \
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw t_raw. ! queue leaky=2 max-size-buffers=10 ! \
...
Got EOS from element "pipeline0".
Execution ended after 0:00:10.221403924
Setting pipeline to NULL ...
Freeing pipeline ...
ELEMENT                    %CPU  %TIME  TIME
videoconvert0              13.8  55.3   1.41 s
videoscale0                 3.7  14.9   379 ms
tensortransform0            2.2   9.0   228 ms
fps-display-text-overlay    2.0   8.1   207 ms
tensordecoder0              0.7   2.8   71.9 ms
tensorfilter0               0.6   2.3   59.5 ms
...
The tool also saves the statistics to a GstTrace file named gst-top.gsttrace:
ls -al *.gsttrace
-rw-r--r-- 1 root root 11653120 Jan 4 05:23 gst-top.gsttrace
gst-report: Converts a GstTrace file into a performance graph in DOT format:
gst-report-1.0 --dot gst-top.gsttrace | dot -Tsvg > perf.svg
The figure below shows the performance graph for nnstreamer_example_object_detection.py. It displays CPU usage, time usage, and execution time for each element. This makes it easy to identify the elements that consume most of the CPU or execution time.
In this example, tensor_transform consumes 56.9% of the total execution time because it performs buffer data conversion on the CPU.
Note
For more information, refer to NNStreamer online documentation: Profiling.