NNStreamer

Overview 

NNStreamer is a set of GStreamer plugins that lets GStreamer developers adopt neural network models easily and efficiently, and lets neural network developers manage neural network pipelines and their filters easily and efficiently.

NNStreamer provides a new GStreamer stream data type and a set of GStreamer elements (plugins) for constructing media stream pipelines with neural network models. It is documented on its online document site and supports well-known neural network frameworks including Tensorflow, Tensorflow-lite, Caffe2, PyTorch, OpenVINO and ARMNN.

Users may include custom C functions, C++ objects, or Python objects, as well as such frameworks, as neural network filters of a pipeline at run time. Support for additional frameworks or hardware AI accelerators can also be added and integrated at run time, and may exist as independent plugin binaries.

NNStreamer::tensor_filter 

tensor_filter is the main element of the whole NNStreamer project. It connects a GStreamer data stream with neural network frameworks such as Tensorflow-lite. As with a typical GStreamer plugin, you can use gst-inspect-1.0 to view all plugin information of tensor_filter:

gst-inspect-1.0 tensor_filter
...
        Pad Templates:
    SINK template: 'sink'
        Availability: Always
        Capabilities:
        other/tensor
                framerate: [ 0/1, 2147483647/1 ]
        other/tensors
                    format: { (string)static, (string)flexible }
                framerate: [ 0/1, 2147483647/1 ]

    SRC template: 'src'
        Availability: Always
        Capabilities:
        other/tensor
                framerate: [ 0/1, 2147483647/1 ]
        other/tensors
                    format: { (string)static, (string)flexible }
                framerate: [ 0/1, 2147483647/1 ]

    Element has no clocking capabilities.
    Element has no URI handling capabilities.

    Pads:
    SINK: 'sink'
        Pad Template: 'sink'
    SRC: 'src'
        Pad Template: 'src'

    Element Properties:
    accelerator         : Set accelerator for the subplugin with format (true/false):(comma separated ACCELERATOR(s)). true/false determines if accelerator is to be used. list of accelerators determines the backend (ignored with false). Example, if GPU, NPU can be used but not CPU - true:npu,gpu,!cpu. The full list of accelerators can be found in nnstreamer_plugin_api_filter.h. Note that only a few subplugins support this property.
                            flags: readable, writable
                            String. Default: ""
    custom              : Custom properties for subplugins ?
                            flags: readable, writable
                            String. Default: ""
    framework           : Neural network framework
                            flags: readable, writable
                            String. Default: "auto"
    input               : Input tensor dimension from inner array, up to 4 dimensions ?
                            flags: readable, writable
                            String. Default: ""
    input-combination   : Select the input tensor(s) to invoke the models
                            flags: readable, writable
                            String. Default: ""
    inputlayout         : Set channel first (NCHW) or channel last layout (NHWC) or None for input data. Layout of the data can be any or NHWC or NCHW or none for now.
                            flags: readable, writable
                            String. Default: ""
    inputname           : The Name of Input Tensor
                            flags: readable, writable
                            String. Default: ""
    inputranks          : The Rank of the Input Tensor, which is separated with ',' in case of multiple Tensors
                            flags: readable
                            String. Default: ""
    inputtype           : Type of each element of the input tensor ?

...

On IoT Yocto, Genio platforms provide different machine learning software stacks for the developer:

Table 2. Software Stack on Board
Software Stack	Backend	Genio 350-EVK	Genio 1200-EVK	Genio 700-EVK
Tensorflow-Lite	CPU	V	V	V
Tensorflow-Lite + GPU delegate	GPU	V	V	V
Tensorflow-Lite + ARMNN Delegate	GPU, CPU	V	V	V
Tensorflow-Lite + NNAPI Delegate	VPU	V	X	X
Neuron SDK	MDLA, VPU	X	V	V
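The table above can be expressed as a small lookup, which is convenient when a script has to choose a backend per board. The following is only an illustrative sketch: the names SUPPORTED_STACKS and stacks_for are hypothetical and are not part of IoT Yocto or NNStreamer. The data is copied from Table 2.

```python
# Support matrix copied from Table 2 (V = listed here, X = omitted).
# SUPPORTED_STACKS and stacks_for are hypothetical names used for illustration.
ALL_BOARDS = {"Genio 350-EVK", "Genio 700-EVK", "Genio 1200-EVK"}

SUPPORTED_STACKS = {
    "Tensorflow-Lite": ALL_BOARDS,
    "Tensorflow-Lite + GPU delegate": ALL_BOARDS,
    "Tensorflow-Lite + ARMNN Delegate": ALL_BOARDS,
    "Tensorflow-Lite + NNAPI Delegate": {"Genio 350-EVK"},
    "Neuron SDK": {"Genio 700-EVK", "Genio 1200-EVK"},
}


def stacks_for(board):
    """Return the software stacks that Table 2 marks as available on a board."""
    return [name for name, boards in SUPPORTED_STACKS.items() if board in boards]


if __name__ == "__main__":
    for board in sorted(ALL_BOARDS):
        print(board, "->", ", ".join(stacks_for(board)))
```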

For the Tensorflow-Lite framework:
Users can directly construct gstreamer media stream pipeline using the existing tensor_filter_tensorflow_lite. You can find many examples of using the Tensorflow-Lite framework in NNStreamer-Example.

When using tensor_filter_tensorflow_lite, you must specify the neural network framework and the model path. You do not need to specify model meta information such as in/out type and dimension, because tensor_filter_tensorflow_lite obtains these properties automatically from the tensorflow-lite model.

Here is an example of the launch line using the Tensorflow-Lite framework. More launch line examples here: NNStreamer-Example.
... tensor_converter ! \
    tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8 ! \
    ...

Neuron SDK:

IoT Yocto provides a new tensor_filter for Neuron SDK. Users can use tensor_filter_neuronsdk to create a GStreamer media stream pipeline and leverage the Genio platform’s powerful AI hardware accelerators, such as MDLA. You can find the implementation of tensor_filter_neuronsdk in the IoT Yocto NNStreamer source ($BUILD_DIR/tmp/work/armv8a-poky-linux/nnstreamer/$PV/git/ext/nnstreamer/tensor_filter/tensor_filter_neuronsdk.cc).

When using tensor_filter_neuronsdk, you must specify the neural network framework and the model path. You must also specify model meta information such as in/out type and dimension, because the dla file does not provide an interface for tensor_filter_neuronsdk to obtain it.

Here is an example of the launch line using the Neuron SDK:
...  tensor_converter ! \
    tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
    ...
Note

The tensor_filter properties related to in/out type and dimension are as follows:

inputtype: Type of each element of the input tensor.

inputlayout: Set channel first (NCHW) or channel last layout (NHWC) or None for input data.

input: Input tensor dimension from inner array, up to 4 dimensions.

outputtype: Type of each element of the output tensor.

outputlayout: Set channel first (NCHW) or channel last layout (NHWC) or None for output data.

output: Output tensor dimension from inner array, up to 4 dimensions.
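The phrase "from inner array" means that the dimension string lists the innermost axis first, which is the reverse of the shape order usually printed by model tools. As a rough sketch, the helpers below (hypothetical names nns_dimension and neuronsdk_properties) derive the property string used in the launch line above. The shapes (1, 224, 224, 3) and (1, 1001) are an assumption about the MobileNet model, chosen because they reproduce the values shown in this document.

```python
def nns_dimension(shape):
    """Convert an outermost-first shape, e.g. (1, 224, 224, 3), to 'inner:...:outer'."""
    return ":".join(str(dim) for dim in reversed(shape))


def neuronsdk_properties(in_type, in_shape, out_type, out_shape):
    """Build the type and dimension properties that tensor_filter_neuronsdk needs."""
    return (
        f"inputtype={in_type} input={nns_dimension(in_shape)} "
        f"outputtype={out_type} output={nns_dimension(out_shape)}"
    )


if __name__ == "__main__":
    # Assumed shapes: NHWC image input and a 1001-class score vector.
    print(neuronsdk_properties("uint8", (1, 224, 224, 3), "uint8", (1, 1001)))
```

With these assumed shapes the helper yields inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1, matching the launch line.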

You can also find more detailed descriptions of tensor_filter in the NNStreamer online document and the source code.

NNStreamer Unit Test 

NNStreamer provides gtest based test cases for common library and nnstreamer plugins. You can run the unit tests using the following command to get insights into the integration status of nnstreamer on Yocto.

cd /usr/bin/unittest-nnstreamer/
ssat
...
==================================================

[PASSED] transform_typecast (37 passed among 39 cases)
[PASSED] nnstreamer_filter_neuronsdk (8 passed among 8 cases)
[PASSED] transform_dimchg (13 passed among 13 cases)
[PASSED] nnstreamer_decoder_pose (3 passed among 3 cases)
[PASSED] nnstreamer_decoder_boundingbox (15 passed among 15 cases)
[PASSED] transform_clamp (10 passed among 10 cases)
[PASSED] transform_stand (9 passed among 9 cases)
[PASSED] transform_arithmetic (36 passed among 36 cases)
[PASSED] nnstreamer_decoder (17 passed among 17 cases)
[PASSED] nnstreamer_filter_custom (23 passed among 23 cases)
[PASSED] transform_transpose (16 passed among 16 cases)
[PASSED] nnstreamer_filter_tensorflow2_lite (31 passed among 31 cases)
[PASSED] nnstreamer_repo_rnn (2 passed among 2 cases)
[PASSED] nnstreamer_converter (32 passed among 32 cases)
[PASSED] nnstreamer_repo_dynamicity (10 passed among 10 cases)
[PASSED] nnstreamer_mux (84 passed among 84 cases)
[PASSED] nnstreamer_split (21 passed among 21 cases)
[PASSED] nnstreamer_repo (77 passed among 77 cases)
[PASSED] nnstreamer_demux (43 passed among 43 cases)
[PASSED] nnstreamer_filter_python3 (0 passed among 0 cases)
[PASSED] nnstreamer_rate (17 passed among 17 cases)
[PASSED] nnstreamer_repo_lstm (2 passed among 2 cases)
==================================================
[PASSED] All Test Groups (23) Passed!
        TC Passed: 595 / Failed: 0 / Ignored: 2
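When the summary is long, it can help to pick out the groups whose counts do not match, such as transform_typecast (37 of 39) or nnstreamer_filter_python3 (0 of 0) above. The following sketch parses the per-group lines in the format shown in this document. parse_ssat and needs_attention are hypothetical helper names, and the pattern is inferred from the sample output only, not from the ssat source.

```python
import re

# Pattern inferred from the sample output above, e.g.
# "[PASSED] transform_typecast (37 passed among 39 cases)".
GROUP_LINE = re.compile(r"\[(\w+)\] (\S+) \((\d+) passed among (\d+) cases\)")


def parse_ssat(text):
    """Return (group, status, passed, total) tuples for each test group line."""
    rows = []
    for match in GROUP_LINE.finditer(text):
        status, group, passed, total = match.groups()
        rows.append((group, status, int(passed), int(total)))
    return rows


def needs_attention(rows):
    """Keep groups that did not pass, skipped some cases, or ran no cases at all."""
    return [
        row for row in rows
        if row[1] != "PASSED" or row[2] != row[3] or row[3] == 0
    ]


if __name__ == "__main__":
    import sys
    for group, status, passed, total in needs_attention(parse_ssat(sys.stdin.read())):
        print(f"{group}: {status}, {passed}/{total}")
```

A possible use is to pipe the ssat output into the script and review only the groups it prints.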

Some test cases are not invoked via command: ssat because they lack the implementation of runTest.sh, with ArmNN unit tests being one such example. However, you can confirm the integration status of ArmNN with NNStreamer by directly running /usr/bin/unittest-nnstreamer/tests/unittest_filter_armnn.

cd /usr/bin/unittest-nnstreamer/tests/
export NNSTREAMER_SOURCE_ROOT_PATH=/usr/bin/unittest-nnstreamer/
./unittest_filter_armnn
...
[==========] 13 tests from 1 test suite ran. (141 ms total)
[  PASSED  ] 13 tests.

NNStreamer Pipeline Examples 

IoT Yocto provides the following Python examples in /usr/bin/nnstreamer-demo/ to demonstrate how to create an NNStreamer pipeline with different tensor_filters for different use cases and implementation options. These examples are adapted from NNStreamer-Example.

Table Features of NNStreamer Examples
Python script	Category
nnstreamer_example_image_classification_uvc.py	Image classification
nnstreamer_example_object_detection_uvc.py	Object detection
nnstreamer_example_object_detection_yolov5_uvc.py	Object detection
nnstreamer_example_pose_estimation_uvc.py	Pose estimation
nnstreamer_example_face_detection_uvc.py	Face detection
nnstreamer_example_low_light_image_enhancement.py	Image enhancement

To run these examples, you need a USB Video Class (UVC) camera. A USB webcam can be used as a v4l2 video device and operated through GStreamer. To find the USB camera node, refer to the commands in USB Camera, such as:

ls -l /sys/class/video4linux
...
lrwxrwxrwx 1 root root 0 Oct  8 01:29 video5 -> ../../devices/platform/soc/11201000.usb/11200000.xhci/usb1/1-1/1-1.3/1-1.3:1.0/video4linux/video5
...

From the above command, we can find that /dev/video5 is the camera node.
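The same lookup can be scripted. The sketch below resolves each entry under /sys/class/video4linux and keeps those whose device path passes through a USB bus directory (usb1 in the listing above). The helper names is_usb_path and usb_video_nodes are hypothetical, and the "usbN" heuristic is an assumption based on the sample path only.

```python
import os
import re

# A sysfs path is treated as USB-backed when it passes through a "usbN" bus
# directory, as in the listing above (.../11200000.xhci/usb1/1-1/...).
USB_BUS = re.compile(r"/usb\d+/")


def is_usb_path(resolved_path):
    """Return True if a resolved sysfs device path goes through a USB bus."""
    return bool(USB_BUS.search(resolved_path))


def usb_video_nodes(sysfs_dir="/sys/class/video4linux"):
    """Return /dev/videoN paths whose sysfs entry resolves to a USB device."""
    if not os.path.isdir(sysfs_dir):
        return []
    nodes = []
    for name in sorted(os.listdir(sysfs_dir)):
        resolved = os.path.realpath(os.path.join(sysfs_dir, name))
        if is_usb_path(resolved):
            nodes.append(f"/dev/{name}")
    return nodes


if __name__ == "__main__":
    print(usb_video_nodes())
```

A single webcam may expose more than one node, so the list can contain several entries for the same camera.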

Each Python example lists all of its options with --help.

python3 nnstreamer_example_image_classification_uvc.py --help
usage: nnstreamer_example_image_classification_uvc.py [-h] [--engine {nnapi,tflite,armnn}] [--cam CAM] [--width WIDTH] [--height HEIGHT] [--performance {NA,G1200,G700,G350}] [--fullscreen {0,1}]
options:
  -h, --help            show this help message and exit
  --engine {neuronsdk,tflite,armnn}
                    Choose a backends to inference. Default: tflite
  --cam CAM         Input a camera node id, ex: 130 .
                    Use 'v4l2-ctl --list-devices' query camera node id.
                    Example:
                    $ v4l2-ctl --list-devices
                      ...
                       C922 Pro Stream Webcam (usb-11290000.xhci-1.2):
                       /dev/video130
                       /dev/video131
                      ...
  --width WIDTH         Input video display width, ex: 640
  --height HEIGHT       Input video display height, ex: 480
  --performance {NA,G1200,G700,G350}
                        Select platform and make CPU/GPU/APU run under performance mode, ex: G1200
  --fullscreen {0,1}    Fullscreen preview.
                        1: Enable
                        0: Disable

Below are the main options:

--engine:
Choose one of the backends supported by the platform. It can be neuronsdk, tflite, armnn, or nnapi.
The build_pipeline function in the python script creates the tensor_filter with a different framework and properties depending on the backend you choose.

Take nnstreamer_example_image_classification_uvc.py as an example:

--engine tflite :

tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=NumThreads:8

--engine armnn :

tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:External,ExtDelegateLib:/usr/lib/libarmnnDelegate.so.28.0,ExtDelegateKeyVal:backends#GpuAcc

--engine neuronsdk :

tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1
As mentioned earlier, unlike the Tensorflow-Lite framework, neuronsdk requires you to specify model meta information such as in/out type and dimension, because the dla file does not provide an interface for tensor_filter_neuronsdk to obtain it.

Refer to build_pipeline in the python script to see how these properties are set:

inputtype: Type of each element of the input tensor.

inputlayout: Set channel first (NCHW) or channel last layout (NHWC) or None for input data.

input: Input tensor dimension from inner array, up to 4 dimensions.

outputtype: Type of each element of the output tensor.

outputlayout: Set channel first (NCHW) or channel last layout (NHWC) or None for output data.

output: Output tensor dimension from inner array, up to 4 dimensions.

--engine nnapi :

tensor_filter framework=tensorflow-lite model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite custom=Delegate:External,ExtDelegateLib:/usr/lib/nnapi_external_delegate.so

Note

--engine nnapi is only available on Genio-350.
--cam: Input a camera node id.
--performance:
Set the performance mode for your platform. Select your current platform:
- --performance G1200 : Set the performance mode for Genio-1200
- --performance G700 : Set the performance mode for Genio-700
- --performance G350 : Set the performance mode for Genio-350
Performance mode makes the CPU, GPU, and APU run at their highest frequency and disables thermal throttling.
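To make the --engine mapping described above concrete, the following sketch returns the tensor_filter description for each backend of the image classification example. It is a simplified illustration and not the build_pipeline code from the demo script: tensor_filter_for is a hypothetical name, and the thread count, library paths, and dimensions are hard-coded to the values shown in this document.

```python
MODEL_DIR = "/usr/bin/nnstreamer-demo"
MODEL = "mobilenet_v1_1.0_224_quant"


def tensor_filter_for(engine):
    """Return the tensor_filter description shown above for one --engine value."""
    tflite = f"{MODEL_DIR}/{MODEL}.tflite"
    dla = f"{MODEL_DIR}/{MODEL}.dla"
    if engine == "tflite":
        # CPU execution through the Tensorflow-Lite framework.
        return f"tensor_filter framework=tensorflow-lite model={tflite} custom=NumThreads:8"
    if engine == "armnn":
        # GPU execution through the ArmNN external delegate.
        return (
            f"tensor_filter framework=tensorflow-lite model={tflite} "
            "custom=Delegate:External,"
            "ExtDelegateLib:/usr/lib/libarmnnDelegate.so.28.0,"
            "ExtDelegateKeyVal:backends#GpuAcc"
        )
    if engine == "nnapi":
        # NNAPI external delegate (Genio-350 only, per this document).
        return (
            f"tensor_filter framework=tensorflow-lite model={tflite} "
            "custom=Delegate:External,"
            "ExtDelegateLib:/usr/lib/nnapi_external_delegate.so"
        )
    if engine == "neuronsdk":
        # The dla file carries no meta information, so types and dimensions are explicit.
        return (
            f"tensor_filter framework=neuronsdk model={dla} "
            "inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1"
        )
    raise ValueError(f"unknown engine: {engine}")


if __name__ == "__main__":
    for name in ("tflite", "armnn", "nnapi", "neuronsdk"):
        print(tensor_filter_for(name))
```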

Note

The following examples all use the camera node /dev/video5 and run on the Genio-700 platform, so the options used to run them are --cam 5 --performance G700.

Before running the examples, set shell variables for the camera node and platform:

CAMERA_NODE_ID=5
PLATFORM=G700

Image Classification 

Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification_uvc.py
Model: mobilenet_v1_1.0_224_quant.tflite

Run example:

Execute on MDLA by neuronsdk:

ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on CPU:

ENGINE=tflite
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on GPU by ArmNN delegate:

ENGINE=armnn
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on VPU by nnapi:

Note

--engine nnapi is only available on Genio-350.

ENGINE=nnapi
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Average inference time

Average inference time (ms) of nnstreamer_example_image_classification_uvc
Platform	CPU	ARMNN GPU	MDLA	NNAPI (VPU)
Genio-1200	7.3	9	2.5	Not supported
Genio-700	9.4	13	2.3	Not supported
Genio-350	46.3	40	Not supported	508

Pipeline graph:

Below is the gstreamer command and pipeline graph constructed in the example: nnstreamer_example_image_classification_uvc.py using --engine neuronsdk. The pipeline graph is generated through the gst-report command of gst-instruments. Detailed command can be found in Pipeline Profiling:
gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue ! textoverlay name=tensor_res font-desc=Sans,24 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla inputtype=uint8 input=3:224:224:1 outputtype=uint8 output=1001:1 ! \
tensor_sink name=tensor_sink
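A pipeline of this kind can also be driven from Python, where the application receives each inference result from tensor_sink. The following is a minimal sketch and not the code of nnstreamer_example_image_classification_uvc.py: build_description and run are hypothetical helper names, the display branch (tee, textoverlay, fpsdisplaysink) is omitted, and the default tensor_filter uses the tflite backend so that no dla file is required. It assumes PyGObject with the GStreamer bindings is installed on the board.

```python
DEFAULT_FILTER = (
    "tensor_filter framework=tensorflow-lite "
    "model=/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.tflite"
)


def build_description(device="/dev/video5", tensor_filter=DEFAULT_FILTER):
    """Build a reduced launch description: camera -> 224x224 RGB -> filter -> sink."""
    return (
        f"v4l2src name=src device={device} ! "
        "video/x-raw,width=640,height=480,format=YUY2 ! "
        "videoconvert ! videoscale ! "
        "video/x-raw,width=224,height=224,format=RGB ! "
        f"tensor_converter ! {tensor_filter} ! "
        "tensor_sink name=tensor_sink"
    )


def run(description):
    """Run the pipeline until EOS or error, printing the size of each result."""
    # Imported lazily so build_description works without PyGObject installed.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    sink = pipeline.get_by_name("tensor_sink")
    # tensor_sink emits "new-data" with the output buffer of every inference.
    sink.connect("new-data", lambda _sink, buffer: print("tensor bytes:", buffer.get_size()))

    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.ERROR | Gst.MessageType.EOS
    )
    pipeline.set_state(Gst.State.NULL)


if __name__ == "__main__":
    run(build_description())
```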

Object Detection 

ssd_mobilenet_v2_coco 

Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_uvc.py
Model: ssd_mobilenet_v2_coco.tflite

Run example:

Execute on MDLA by neuronsdk:

ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on CPU:

ENGINE=tflite
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on GPU by ArmNN delegate:

ENGINE=armnn
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Average inference time

Average inference time (ms) of nnstreamer_example_object_detection_uvc
Platform	CPU	ARMNN GPU	MDLA	NNAPI (VPU)
Genio-1200	121	39	13	Not supported
Genio-700	164	60	16.7	Not supported
Genio-350	579	194.5	Not supported	Not supported

Pipeline graph:

Below is the gstreamer command and pipeline graph constructed in the example: nnstreamer_example_object_detection_uvc.py using --engine neuronsdk. The pipeline graph is generated through the gst-report command of gst-instruments. Detailed command can be found in Pipeline Profiling:

gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! v4l2convert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/ssd_mobilenet_v2_coco.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,91:1917:1 ! \
tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/usr/bin/nnstreamer-demo/coco_labels_list.txt option3=/usr/bin/nnstreamer-demo/box_priors.txt option4=640:480 option5=300:300 ! queue leaky=2 max-size-buffers=2 ! mix.

../_images/tools_nnstreamer_examples_pipeline_object_detection.svg
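The tensor_transform element in this pipeline (option=typecast:float32,add:-127.5,div:127.5) applies its operations in the listed order, which maps uint8 pixel values onto the range [-1, 1] that the float model expects. A rough per-value equivalent is sketched below. normalize is a hypothetical helper, and it uses Python floats rather than float32, so very small rounding differences from the element are possible.

```python
def normalize(pixel):
    """Mimic typecast:float32, add:-127.5, div:127.5 for a single uint8 value."""
    value = float(pixel)      # typecast:float32 (Python float used here)
    value = value + -127.5    # add:-127.5
    return value / 127.5      # div:127.5


if __name__ == "__main__":
    for pixel in (0, 64, 128, 255):
        print(pixel, "->", normalize(pixel))
```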

yolov5 

Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5_uvc.py
Model: yolov5s-int8.tflite

Run example:

Execute on MDLA by neuronsdk:

ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Note

With neuronsdk, the yolov5 model is supported only on Genio-700 (MDLA 3.0). It is not supported on Genio-1200 (MDLA 2.0).

On Genio-1200, ncc-tflite cannot compile the model into a dla file because of unsupported operations:

ncc-tflite --arch mdla2.0,tflite_cpu yolov5s-int8.tflite -o yolov5s-int8.dla --int8-to-uint8
OP[123]: RESIZE_NEAREST_NEIGHBOR
├ MDLA: HalfPixelCenters is unsupported.
├ TFLITE_CPU: Only support ResizeBilinear
├ EDMA: unsupported operation
OP[145]: RESIZE_NEAREST_NEIGHBOR
├ MDLA: HalfPixelCenters is unsupported.
├ TFLITE_CPU: Only support ResizeBilinear
├ EDMA: unsupported operation
ERROR: Cannot find an execution plan because of unsupported operations
ERROR: Fail to compile yolov5s-int8.tflite

As a result, running nnstreamer-demo/nnstreamer_example_object_detection_yolov5_uvc.py with --engine neuronsdk fails on Genio-1200:

python3 /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5_uvc.py --cam 5 --engine neuronsdk --performance G1200
...
ERROR: Cannot open the file: /usr/bin/nnstreamer-demo/yolov5s-int8.dla
ERROR: Cannot set a nullptr compiled network.
ERROR: Cannot set compiled network.
ERROR: Runtime loadNetworkFromFile fails.
ERROR: Cannot initialize runtime pool.
...
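A wrapper script could avoid this failure by checking for the dla file before selecting the backend. The sketch below is an illustration only and does not describe the behavior of the demo script. pick_engine is a hypothetical helper, and falling back to tflite is an assumed policy.

```python
import os


def pick_engine(requested, dla_path, fallback="tflite"):
    """Return the requested engine, or a fallback if neuronsdk has no dla file."""
    if requested == "neuronsdk" and not os.path.isfile(dla_path):
        # Without a compiled dla file the Neuron runtime cannot load the network.
        return fallback
    return requested


if __name__ == "__main__":
    engine = pick_engine("neuronsdk", "/usr/bin/nnstreamer-demo/yolov5s-int8.dla")
    print("engine:", engine)
```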

Execute on CPU:

ENGINE=tflite
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on GPU by ArmNN delegate:

ENGINE=armnn
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_object_detection_yolov5_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Average inference time

Average inference time (ms) of nnstreamer_example_object_detection_yolov5_uvc
Platform	CPU	ARMNN GPU	MDLA	NNAPI (VPU)
Genio-1200	41	24.5	Not supported	Not supported
Genio-700	57	37.5	4.9	Not supported
Genio-350	295	140	Not supported	Not supported

Pipeline graph:

Below is the gstreamer command and pipeline graph constructed in the example: nnstreamer_example_object_detection_yolov5_uvc.py using --engine neuronsdk. The pipeline graph is generated through the gst-report command of gst-instruments. Detailed command can be found in Pipeline Profiling:

gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=320,height=320,format=RGB ! tensor_converter ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/yolov5s-int8.dla inputtype=uint8 input=3:320:320:1 outputtype=uint8 output=85:6300:1 ! \
other/tensors,num_tensors=1,types=uint8,dimensions=85:6300:1:1,format=static ! \
tensor_transform mode=arithmetic option=typecast:float32,add:-4.0,mul:0.0051498096 ! \
tensor_decoder mode=bounding_boxes option1=yolov5 option2=/usr/bin/nnstreamer-demo/coco.txt option3=0 option4=640:480 option5=320:320 ! queue leaky=2 max-size-buffers=2 ! mix.

../_images/tools_nnstreamer_examples_pipeline_object_detection_yolov5.svg
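Two numbers in this pipeline are worth unpacking. The tensor_transform after the filter (typecast:float32,add:-4.0,mul:0.0051498096) dequantizes the uint8 output, which reads as a zero point of 4 and a scale of 0.0051498096. The output dimension 85:6300:1 is consistent with a standard YOLOv5 head on a 320x320 input. The sketch below works through both. The strides (8, 16, 32), three anchors per cell, and the split of 85 into 4 box values, 1 objectness score, and 80 class scores are assumptions about the model, not something stated in this document.

```python
SCALE = 0.0051498096   # from mul:0.0051498096
ZERO_POINT = 4         # from add:-4.0


def dequantize(quantized):
    """Mimic typecast:float32, add:-4.0, mul:0.0051498096 for one uint8 value."""
    return (float(quantized) - ZERO_POINT) * SCALE


def candidate_count(input_size=320, strides=(8, 16, 32), anchors_per_cell=3):
    """Count box candidates for an assumed YOLOv5 head with three output scales."""
    return sum(anchors_per_cell * (input_size // stride) ** 2 for stride in strides)


if __name__ == "__main__":
    print("candidates:", candidate_count())   # 3 * (40*40 + 20*20 + 10*10) = 6300
    print("values per candidate:", 4 + 1 + 80)
    print("dequantized 4 ->", dequantize(4))
```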

Pose Estimation 

Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation_uvc.py
Model: posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite

Run example:

Execute on MDLA by neuronsdk:

ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on CPU:

ENGINE=tflite
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on GPU by ArmNN delegate:

ENGINE=armnn
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_pose_estimation_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Average inference time

Average inference time (ms) of nnstreamer_example_pose_estimation_uvc
Platform	CPU	ARMNN GPU	MDLA	NNAPI (VPU)
Genio-1200	42	16.5	6.5	Not supported
Genio-700	50	25	6.5	Not supported
Genio-350	180	115.3	Not supported	Not supported

Pipeline graph:

Below is the gstreamer command and pipeline graph constructed in the example: nnstreamer_example_pose_estimation_uvc.py using --engine neuronsdk. The pipeline graph is generated through the gst-report command of gst-instruments. Detailed command can be found in Pipeline Profiling:

gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=257,height=257,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! queue ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.dla inputtype=float32 input=3:257:257:1 outputtype=float32,float32,float32,float32 output=17:9:9:1,34:9:9:1,32:9:9:1,32:9:9:1 ! queue ! \
tensor_decoder mode=pose_estimation option1=640:480 option2=257:257 option3=/usr/bin/nnstreamer-demo/point_labels.txt option4=heatmap-offset ! queue leaky=2 max-size-buffers=2 ! mix.

../_images/tools_nnstreamer_examples_pipeline_pose_estimation.svg

Face Detection 

Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_face_detection_uvc.py
Model: detect_face.tflite

Run example:

Execute on MDLA by neuronsdk:

ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_face_detection_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on CPU:

ENGINE=tflite
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_face_detection_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Execute on GPU by ArmNN delegate:

ENGINE=armnn
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_face_detection_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM

Average inference time

Average inference time (ms) of nnstreamer_example_face_detection_uvc
Platform	CPU	ARMNN GPU	MDLA	NNAPI (VPU)
Genio-1200	52	23.4	6.8	Not supported
Genio-700	60	31	9.1	Not supported
Genio-350	237	113.2	Not supported	Not supported

Pipeline graph:

gst-launch-1.0 \
v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw \
t_raw. ! queue leaky=2 max-size-buffers=10 ! videoconvert ! cairooverlay name=tensor_res ! fpsdisplaysink sync=false video-sink="waylandsink sync=false fullscreen=0" \
t_raw. ! queue leaky=2 max-size-buffers=2 ! videoconvert ! videoscale ! video/x-raw,width=300,height=300,format=RGB ! tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/detect_face.dla inputtype=float32 input=3:300:300:1 outputtype=float32,float32 output=4:1:1917:1,2:1917:1 ! \
tensor_sink name=res_face

../_images/tools_nnstreamer_examples_pipeline_face_detection.svg

Low Light Image Enhancement 

Python script: /usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py
Model: lite-model_zero-dce_1.tflite

Run example:

This example does not read frames from a USB camera. Instead, it takes a PNG image together with its width and height through the options --img, --width, and --height. A low-light photo (/usr/bin/nnstreamer-demo/original.png), which was downloaded from this link, is provided as an example.

The enhanced image is stored in /usr/bin/nnstreamer-demo and named low_light_enhancement_${backend}.png. You can use the --export option to give the enhanced image a different name.

Below are all the options of this example:

python3 /usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py --help
usage: nnstreamer_example_low_light_image_enhancement.py [-h] [--engine {neuronsdk,tflite,armnn}] [--img IMG] [--export EXPORT] [--width WIDTH] [--height HEIGHT] [--performance {NA,G1200,G700,G350}]

options:
-h, --help            show this help message and exit
--engine {neuronsdk,tflite,armnn}
                        Choose a backends to inference. Default: neuronsdk
--img IMG             Input a image file path .
                        Example: /usr/bin/nnstreamer-demo/original.png
--export EXPORT       Input a filename for the saved png image
                        Example: low_light_enhancement
--width WIDTH         Input image file width, ex: 600
--height HEIGHT       Input image file height, ex: 400
--performance {NA,G1200,G700,G350}
                        Select platform and make CPU/GPU/APU run under performance mode, ex: G1200

Before running the example, set shell variables for the input image:

IMAGE=/usr/bin/nnstreamer-demo/original.png
IMAGE_WIDTH=600
IMAGE_HEIGHT=400

Execute on MDLA by neuronsdk:

ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py --img $IMAGE --engine $ENGINE --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --performance $PLATFORM

Execute on CPU:

ENGINE=tflite
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py --img $IMAGE --engine $ENGINE --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --performance $PLATFORM

Execute on GPU by ArmNN delegate:

ENGINE=armnn
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py --img $IMAGE --engine $ENGINE --width $IMAGE_WIDTH --height $IMAGE_HEIGHT --performance $PLATFORM

Note

You will fail to run nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py with --engine armnn because operator SQUARE is not supported by Arm NN.

python3 /usr/bin/nnstreamer-demo/nnstreamer_example_low_light_image_enhancement.py --img /usr/bin/nnstreamer-demo/original.png --engine armnn --width 600 --height 400 --performance G700
...
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
ERROR: Operator SQUARE [92] is not supported by armnn_delegate.
...

Average inference time

Average inference time (ms) of nnstreamer_example_low_light_image_enhancement
Platform	CPU	ARMNN GPU	MDLA	NNAPI
Genio-1200	644	Not supported	79	Not supported
Genio-700	765	Not supported	74	Not supported
Genio-350	3636	Not supported	Not supported	Not supported

Pipeline graph:

Below is the gstreamer command and pipeline graph constructed in the example: nnstreamer_example_low_light_image_enhancement.py using --engine neuronsdk. The pipeline graph is generated through the gst-report command of gst-instruments. Detailed command can be found in Pipeline Profiling:
gst-launch-1.0 \
filesrc location=/usr/bin/nnstreamer-demo/original.png ! pngdec ! videoscale ! videoconvert ! video/x-raw,width=600,height=400,format=RGB ! \
tensor_converter ! tensor_transform mode=arithmetic option=typecast:float32,add:0,div:255.0 ! \
tensor_filter framework=neuronsdk model=/usr/bin/nnstreamer-demo/lite-model_zero-dce_1.dla inputtype=float32 input=3:600:400:1 outputtype=float32 output=3:600:400:1 ! \
tensor_sink name=tensor_sink
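Because this model works on full-resolution float32 frames, each tensor is much larger than in the camera examples, which is one plausible contributor to the longer inference times. As a quick check, the sketch below computes the buffer size implied by a dimension string and an element type. tensor_bytes and TYPE_SIZES are hypothetical names, and the size table lists only common NNStreamer element types.

```python
# Bytes per element for common tensor element types.
TYPE_SIZES = {
    "int8": 1, "uint8": 1,
    "int16": 2, "uint16": 2,
    "int32": 4, "uint32": 4, "float32": 4,
    "int64": 8, "uint64": 8, "float64": 8,
}


def tensor_bytes(dimension, element_type):
    """Return the buffer size in bytes for a dimension string such as '3:600:400:1'."""
    count = 1
    for dim in dimension.split(":"):
        count *= int(dim)
    return count * TYPE_SIZES[element_type]


if __name__ == "__main__":
    print(tensor_bytes("3:600:400:1", "float32"))  # low-light enhancement input
    print(tensor_bytes("3:224:224:1", "uint8"))    # image classification input
```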

Performance 

NNStreamer::tensor_filter Invoke Time 

By default, NNStreamer does not show the tensor_filter invoke time (inference time) on the screen, but you can obtain it by enabling the tensor_filter property latency.

According to the source code of tensor_filter, the property latency is defined as:

Turn on performance profiling for the average latency over the recent 10 inferences in microseconds.
Currently, this accepts either 0 (OFF) or 1 (ON).
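The definition describes a moving average over a fixed window of recent inferences. A minimal sketch of that idea, assuming a window of 10 samples as stated above, is shown here. RecentLatency is a hypothetical class for illustration and is not the tensor_filter implementation.

```python
from collections import deque


class RecentLatency:
    """Track the average of the most recent latency samples (in microseconds)."""

    def __init__(self, window=10):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, microseconds):
        self.samples.append(microseconds)

    def average(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


if __name__ == "__main__":
    tracker = RecentLatency()
    for latency_us in (2500, 2540, 2610, 2480):
        tracker.add(latency_us)
    print(f"average over recent inferences: {tracker.average():.1f} us")
```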

To enable latency, you currently have to modify the python script directly and add the property latency=1 to the tensor_filter. Take nnstreamer_example_image_classification_uvc.py as an example:

Step.1: Open python script: nnstreamer_example_image_classification_uvc.py

Step.2: Search for tensor_filter and add latency=1 after it.

if engine == 'neuronsdk':
    tensor = dla_converter(self.tflite_model, self.dla)
    cmd += f'tensor_filter latency=1 framework=neuronsdk model={self.dla} {tensor} ! '
elif engine == 'tflite':
    cpu_cores = find_cpu_cores()
    cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=NumThreads:{cpu_cores} ! '
elif engine == 'armnn':
    library = find_armnn_delegate_library()
    cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=Delegate:External,ExtDelegateLib:{library},ExtDelegateKeyVal:backends#GpuAcc ! '
elif engine == 'nnapi':
    library = find_armnn_delegate_library()
    cmd += f'tensor_filter latency=1 framework=tensorflow-lite model={self.tflite_model} custom=Delegate:External,ExtDelegateLib:/usr/lib/nnapi_external_delegate.so ! '

Step.3: Save the python script.
Step.4: Enable the GLib log by setting the environment variable:
export G_MESSAGES_DEBUG=all

Step.5: Run the example. The log then contains a line such as Invoke took 2.537 ms, which is the inference time.

CAMERA_NODE_ID=5
PLATFORM=G700
ENGINE=neuronsdk
python3 /usr/bin/nnstreamer-demo/nnstreamer_example_image_classification_uvc.py --cam $CAMERA_NODE_ID --engine $ENGINE --performance $PLATFORM
...
...

** INFO: 03:16:01.589: [/usr/bin/nnstreamer-demo/mobilenet_v1_1.0_224_quant.dla] Invoke took 2.537 ms
...
...
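To turn these log lines into a single number, the output can be filtered for the "Invoke took" messages and averaged. The sketch below assumes the log format shown above. invoke_times and mean_invoke_ms are hypothetical helper names.

```python
import re

# Matches lines such as:
# ** INFO: 03:16:01.589: [/path/to/model.dla] Invoke took 2.537 ms
INVOKE = re.compile(r"\[(?P<model>[^\]]+)\] Invoke took (?P<ms>\d+(?:\.\d+)?) ms")


def invoke_times(log_text):
    """Return every reported invoke time in milliseconds, in log order."""
    return [float(match.group("ms")) for match in INVOKE.finditer(log_text)]


def mean_invoke_ms(log_text):
    """Return the mean of the reported invoke times, or None if there are none."""
    times = invoke_times(log_text)
    if not times:
        return None
    return sum(times) / len(times)


if __name__ == "__main__":
    import sys
    print(mean_invoke_ms(sys.stdin.read()))
```

Since each reported value is already an average over recent inferences, the mean of these values is only a rough summary.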

Pipeline Profiling 

The Profiling page of the NNStreamer online document recommends NNShark or gst-instruments for performance analysis of a pipeline. NNShark is not yet available on IoT Yocto, but gst-instruments is already included in the IoT Yocto rity-demo-image.

gst-instruments is a set of performance profiling and data flow inspection tools for GStreamer pipelines. It provides:

gst-top-1.0:

Displays a performance report for each element in the pipeline.

gst-top-1.0 \
  gst-launch-1.0 \
  v4l2src name=src device=/dev/video5 io-mode=mmap num-buffers=300 ! video/x-raw,width=640,height=480,format=YUY2 ! tee name=t_raw t_raw. ! queue leaky=2 max-size-buffers=10 ! \
...

Got EOS from element "pipeline0".
Execution ended after 0:00:10.221403924
Setting pipeline to NULL ...
Freeing pipeline ...
ELEMENT                    %CPU   %TIME   TIME
videoconvert0               13.8   55.3    1.41 s
videoscale0                  3.7   14.9    379 ms
tensortransform0             2.2    9.0    228 ms
fps-display-text-overlay     2.0    8.1    207 ms
tensordecoder0               0.7    2.8   71.9 ms
tensorfilter0                0.6    2.3   59.5 ms
...

It also saves the performance data to a file called gst-top.gsttrace:

ls -al *.gsttrace
-rw-r--r-- 1 root root 11653120 Jan  4 05:23 gst-top.gsttrace
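The textual gst-top report is easy to post-process when you want to compare runs. The sketch below parses the element rows in the format shown above and sorts them by share of time. parse_gst_top is a hypothetical helper, and the us and ns units are an assumption, since only s and ms appear in the sample output.

```python
import re

# Row format taken from the sample report: ELEMENT  %CPU  %TIME  TIME
ROW = re.compile(r"^(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*(s|ms|us|ns)$")
TO_MS = {"s": 1000.0, "ms": 1.0, "us": 0.001, "ns": 0.000001}


def parse_gst_top(text):
    """Return one record per element, sorted by share of time (highest first)."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, log lines, and "..." are skipped
        element, cpu, share, value, unit = match.groups()
        rows.append({
            "element": element,
            "cpu_pct": float(cpu),
            "time_pct": float(share),
            "time_ms": float(value) * TO_MS[unit],
        })
    return sorted(rows, key=lambda row: row["time_pct"], reverse=True)


if __name__ == "__main__":
    import sys
    for row in parse_gst_top(sys.stdin.read()):
        print(f'{row["element"]:28s} {row["time_pct"]:5.1f}% {row["time_ms"]:10.1f} ms')
```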

gst-report-1.0:
Generates a performance graph in DOT format:
gst-report-1.0 --dot gst-top.gsttrace | dot -Tsvg > perf.svg
Below is the performance graph of nnstreamer_example_object_detection_uvc.py. It shows CPU usage, time usage, and execution time for each element, which makes it easy to see which elements consume the most CPU and which take the longest to execute.

For example, as shown in the following figure, tensor_transform consumed 56.9% of the total execution time because tensor_transform processes the conversion of buffer data using the CPU.