Video Codec

Note

Cmd operations and test results presented in this chapter are based on the IoT Yocto v22.0 and Genio 350-EVK.

Video Processing Overview

On IoT Yocto, video encoder, decoder, and format conversion hardware provide the V4L2 interface to userspace programs. GStreamer is integrated to provide wrapper plugins over the V4L2 interface and to assist in setting up video processing pipelines.

Example: Video Playback Using GStreamer

The following examples use the GStreamer v4l2h264dec plug-in for hardware-accelerated video decoding. The v4l2convert plug-in is mandatory; the reason is explained in the sections below.

gst-launch-1.0 -v filesrc location=<your-video-path> ! parsebin ! v4l2h264dec ! v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! waylandsink

Note

The V4L2 video decoder assumes that each bitstream buffer contains the data of one complete frame. The default GStreamer input bitstream buffer size is 2MB, so playback issues may occur with high-bitrate video (e.g. 4K 60Mbps) if 2MB is not enough to hold a whole frame. Applications should handle the buffer allocation themselves.
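To judge whether the default 2MB buffer is sufficient, you can compare it against a rough worst-case frame size implied by the stream bitrate. The sketch below is illustrative only; the peak factor of 10 is an assumption about how much larger an I-frame can be than the average frame, not a codec guarantee.

```python
def max_frame_bytes(bitrate_bps: int, framerate: int, peak_factor: float = 10.0) -> int:
    """Rough worst-case size of one compressed frame, in bytes.

    A large I-frame can far exceed the average frame size, so the
    average (bitrate / framerate / 8) is scaled by an assumed
    peak_factor. Tune peak_factor for your content.
    """
    avg_bytes = bitrate_bps / framerate / 8
    return int(avg_bytes * peak_factor)

# 4K 60 Mbps @ 30 fps: the average frame is 250 KB, but a large
# I-frame can exceed GStreamer's default 2 MB input buffer.
default_buffer = 2 * 1024 * 1024
need = max_frame_bytes(60_000_000, 30)
print(need, need > default_buffer)  # 2500000 True
```

If the estimate exceeds the default buffer size, the application should allocate larger input buffers itself, as the note above advises.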

Example: Multi-Display Video Playback Using GStreamer

In dual- and triple-display systems, the displays act as one contiguous video plane. You can use glvideomixer to play multiple videos on different sections of this “video plane”.

For example, if we use HDMI + DP, we can play two videos on the two displays as follows:

  1. Check supported resolutions:

For HDMI:

cat /sys/class/drm/card0-HDMI-A-1/modes

For DP:

cat /sys/class/drm/card0-DP-1/modes

If we are connected to two 4k monitors, the first value in supported modes will be 3840x2160. For this case, we can use glvideomixer as follows:

gst-launch-1.0 -v \
  glvideomixer name=mix background=0 \
          sink_1::xpos=0 sink_1::ypos=0 sink_1::width=3840 sink_1::height=2160 \
          sink_2::xpos=3840 sink_2::ypos=0 sink_2::width=3840 sink_2::height=2160 \
      ! queue ! fpsdisplaysink "video-sink=glimagesink rotate-method=0 render-rectangle=<0,0,7680,2160>" text-overlay=false \
  filesrc location=4k30_1.mp4 \
      ! queue ! parsebin ! queue ! v4l2h264dec capture-io-mode=dmabuf ! queue ! v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! video/x-raw,width=3840,height=2160,format=BGRA,pixel-aspect-ratio=1 \
      ! queue ! mix.sink_1 \
  filesrc location=4k30_2.mp4 \
      ! queue ! parsebin ! queue ! v4l2h264dec capture-io-mode=dmabuf ! queue ! v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! video/x-raw,width=3840,height=2160,format=BGRA,pixel-aspect-ratio=1 \
      ! queue ! mix.sink_2

Here, sink_1::xpos and sink_1::ypos are the starting coordinates of the first video. These would always be (0,0) unless we want the first video to be displayed at an offset.

sink_1::width and sink_1::height are the window sizes for the first video. If we want to display a 4k video on the first 4k monitor, we can set these as 3840x2160.

sink_2::xpos and sink_2::ypos are the starting coordinates for the second video. We set an xpos offset of 3840 for the second video to stack it horizontally next to the first one.

sink_2::width and sink_2::height are the window sizes for the second video.

glimagesink renders video frames to a drawable on a local or remote display using OpenGL, which supports the memory::GLMemory memory type. rotate-method and render-rectangle are the rotation and resizing options of glimagesink.

The size of the entire window is 7680x2160 for two 4K videos, which is the input to glimagesink. We then provide two 4K mp4 video files as sources for the sinks.

We can similarly stack a third sink for a triple-display configuration.
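The per-sink geometry above follows a simple pattern: each display's xpos is the sum of the widths of the displays to its left. The following sketch (illustrative only; the helper name is not GStreamer API) generates these sink properties for any number of horizontally stacked displays:

```python
def mixer_layout(display_sizes):
    """Return glvideomixer sink property strings for horizontally
    stacked displays, given a list of (width, height) tuples."""
    props, xpos = [], 0
    for i, (w, h) in enumerate(display_sizes, start=1):
        props.append(f"sink_{i}::xpos={xpos} sink_{i}::ypos=0 "
                     f"sink_{i}::width={w} sink_{i}::height={h}")
        xpos += w  # the next display starts where this one ends
    total = (xpos, max(h for _, h in display_sizes))
    return props, total

props, total = mixer_layout([(3840, 2160), (3840, 2160)])
print(props[1])   # sink_2 starts at xpos=3840
print(total)      # overall render rectangle: (7680, 2160)
```

The returned total matches the render-rectangle=<0,0,7680,2160> passed to glimagesink in the command above.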

Note

The concept of displaying multiple videos using glvideomixer is not limited to multi-display systems. For a single display with a size of 1280x720, we can display two videos of 640x720 side by side.

Note

For improved performance, the input frames of glvideomixer should be converted to an RGB-based format to optimize GPU utilization. Additionally, do not change the memory type used between glvideomixer and glimagesink from memory::GLMemory; otherwise, extra video texture download and upload operations are involved.

Example: Video Encoding Using GStreamer

The following examples use GStreamer v4l2h264enc plug-in for hardware-accelerated video encoding.

gst-launch-1.0 -v videotestsrc num-buffers=300 ! queue ! video/x-raw,framerate=30/1,width=1920,height=1080,format=NV12 ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=out_1920x1080.mp4

Note

The V4L2 video encoder assumes that the output bitstream buffer is large enough to hold a complete frame. The encoder returns an error when the buffer is full. Applications should handle the buffer allocation themselves.

The GStreamer framework provides software-based or V4L2 hardware-accelerated video processing. To see the list of V4L2 video codecs available on GStreamer, use the following command:

gst-inspect-1.0 | grep v4l2.*
video4linux2:  v4l2src: Video (video4linux2) Source
video4linux2:  v4l2sink: Video (video4linux2) Sink
video4linux2:  v4l2radio: Radio (video4linux2) Tuner
video4linux2:  v4l2deviceprovider (GstDeviceProviderFactory)
video4linux2:  v4l2convert: V4L2 Video Converter
video4linux2:  v4l2mpeg4dec: V4L2 MPEG4 Decoder
video4linux2:  v4l2video0mpeg4dec: V4L2 MPEG4 Decoder
video4linux2:  v4l2h264dec: V4L2 H264 Decoder
video4linux2:  v4l2h265dec: V4L2 H265 Decoder
video4linux2:  v4l2vp8dec: V4L2 VP8 Decoder
video4linux2:  v4l2vp9dec: V4L2 VP9 Decoder
video4linux2:  v4l2h264enc: V4L2 H.264 Encoder

The Colorimetry Issue of v4l2convert

Sometimes, the GStreamer decoding pipeline fails because a “colorimetry” is not supported.

gst-launch-1.0 -v filesrc location=/mnt/out-320x240-nv12.avi ! parsebin ! v4l2h264dec ! v4l2convert ! waylandsink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
/GstPipeline:pipeline0/GstParseBin:parsebin0/GstTypeFindElement:typefind.GstPad:src: caps = video/x-msvideo
/GstPipeline:pipeline0/GstParseBin:parsebin0/GstTypeFindElement:typefind.GstPad:src: caps = NULL
/GstPipeline:pipeline0/GstParseBin:parsebin0/GstH264Parse:h264parse0.GstPad:sink: caps = video/x-h264, variant=(string)itu, framerate=(fraction)30/1, width=(int)320, height=(int)240
/GstPipeline:pipeline0/GstParseBin:parsebin0/GstH264Parse:h264parse0.GstPad:src: caps = video/x-h264, variant=(string)itu, framerate=(fraction)30/1, width=(int)320, height=(int)240, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, colorimetry=(string)2:4:16:3, parsed=(boolean)true, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)baseline, level=(string)1.3
ERROR: from element /GstPipeline:pipeline0/GstParseBin:parsebin0/GstAviDemux:avidemux0: Internal data stream error.
Additional debug info:
../gst-plugins-good-1.20.3/gst/avi/gstavidemux.c(5798): gst_avi_demux_loop (): /GstPipeline:pipeline0/GstParseBin:parsebin0/GstAviDemux:avidemux0:
streaming stopped, reason not-negotiated (-4)
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

This is a known issue with the GStreamer v4l2convert element regarding colorimetry. v4l2convert only negotiates a “well-known” subset of colorspaces; whenever the stream uses a colorimetry outside that subset, such as 2:4:16:3 (reduced range, BT601 matrix, BT601 transfer, BT470BG primaries) in this case, negotiation fails.
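The numeric colorimetry string follows GStreamer's GstVideoColorimetry format, range:matrix:transfer:primaries. The sketch below decodes it; only the enum values relevant to this chapter are mapped (see the GstVideoColorimetry enums for the full lists), and unmapped values are shown with a "?" prefix.

```python
# Partial decode of GStreamer's numeric colorimetry string
# "range:matrix:transfer:primaries". Only a subset of the
# GstVideoColorRange/Matrix/Transfer/Primaries enum values
# is mapped here.
RANGES = {1: "full", 2: "reduced (16-235)"}
MATRICES = {3: "BT709", 4: "BT601"}
TRANSFERS = {5: "BT709", 16: "BT601"}
PRIMARIES = {1: "BT709", 3: "BT470BG"}

def decode_colorimetry(s: str):
    r, m, t, p = (int(x) for x in s.split(":"))
    return (RANGES.get(r, f"?{r}"), MATRICES.get(m, f"?{m}"),
            TRANSFERS.get(t, f"?{t}"), PRIMARIES.get(p, f"?{p}"))

print(decode_colorimetry("2:4:16:3"))
# -> ('reduced (16-235)', 'BT601', 'BT601', 'BT470BG')
```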

You can use the capssetter element as a workaround.

gst-launch-1.0 -v filesrc location=/mnt/out-320x240-nv12.avi ! parsebin ! capssetter replace=true caps="video/x-h264, variant=(string)itu, framerate=(fraction)30/1, width=(int)320, height=(int)240, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, colorimetry=(string)bt601, parsed=(boolean)true, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)constrained-baseline, level=(string)1.3" ! v4l2h264dec ! v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! waylandsink

Sometimes, negotiation in a GStreamer pipeline fails because the upstream and downstream elements of v4l2convert require different colorimetries.

For example, in the camera case below, v4l2h264enc requires bt709 colorimetry, but v4l2src outputs video frames only with bt601 colorimetry. The connection fails because the GStreamer v4l2convert plugin cannot perform colorimetry conversion.

gst-launch-1.0 -v v4l2src device="/dev/video5" ! video/x-raw,width=3840,height=2160,format=UYVY ! v4l2convert output-io-mode=dmabuf-import ! \
v4l2h264enc ! queue ! video/x-h264 ! h264parse ! v4l2h264dec ! autovideosink
...
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Got context from element 'autovideosink0': gst.gl.GLDisplay=context, gst.gl.GLDisplay=(GstGLDisplay)"\(GstGLDisplayWayland\)\ gldisplaywayland0";
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
ERROR: from element /GstPipeline:pipeline0/v4l2convert:v4l2convert0: Device '/dev/video2' does not support bt709 colorimetry
Additional debug info:
../gst-plugins-good-1.20.5/sys/v4l2/gstv4l2object.c(4234): gst_v4l2_object_set_format_full (): /GstPipeline:pipeline0/v4l2convert:v4l2convert0:
Device wants 2:4:5:4 colorimetry
Execution ended after 0:00:05.204833693
Setting pipeline to NULL ...
ERROR: from element /GstPipeline:pipeline0/v4l2convert:v4l2convert0: Device '/dev/video2' does not support bt709 colorimetry
Additional debug info:
../gst-plugins-good-1.20.5/sys/v4l2/gstv4l2object.c(4234): gst_v4l2_object_set_format_full (): /GstPipeline:pipeline0/v4l2convert:v4l2convert0:
Device wants 2:4:5:4 colorimetry
Freeing pipeline ...

Here is a capssetter workaround to fix this issue:

gst-launch-1.0 -v v4l2src device="/dev/video5" ! video/x-raw,width=3840,height=2160,format=UYVY ! v4l2convert output-io-mode=dmabuf-import ! \
capssetter caps="video/x-raw,colorimetry=bt601" ! v4l2h264enc ! queue ! video/x-h264 ! h264parse ! v4l2h264dec ! autovideosink

Video Codec Devices and V4L2 Interface

The hardware video decoder and encoder support V4L2 API in IoT Yocto. To check V4L2 devices in the console, run the following commands:

ls -l /sys/class/video4linux/
lrwxrwxrwx 1 root root 0 Sep 20 10:43 video0 -> ../../devices/platform/soc/16000000.codec/video4linux/video0
lrwxrwxrwx 1 root root 0 Sep 20 10:43 video1 -> ../../devices/platform/soc/17020000.codec/video4linux/video1
lrwxrwxrwx 1 root root 0 Sep 20 10:43 video2 -> ../../devices/platform/soc/14004000.mdp_rdma0/video4linux/video2

Another utility to enumerate the v4l2 devices is v4l2-sysfs-path:

v4l2-sysfs-path
Video device: video2
Video device: video0
Video device: video1
Alsa playback device(s): hw:0,0 hw:0,1

You can also use v4l2-dbg -D -d <device#> to query information about each V4L2 video device, for example:

v4l2-dbg -D -d 0
Driver info:
        Driver name   : mtk-vcodec-dec
        Card type     : platform:mt8167
        Bus info      : platform:mt8167
        Driver version: 5.10.73
        Capabilities  : 0x84204000
                Video Memory-to-Memory Multiplanar
                Streaming
                Extended Pix Format
                Device Capabilities
v4l2-dbg -D -d 1
Driver info:
        Driver name   : mtk-vcodec-enc
        Card type     : platform:mt8167
        Bus info      : platform:mt8167
        Driver version: 5.10.73
        Capabilities  : 0x84204000
                Video Memory-to-Memory Multiplanar
                Streaming
                Extended Pix Format
                Device Capabilities
v4l2-dbg -D -d 2
Driver info:
        Driver name   : mtk-mdp
        Card type     : 14004000.mdp_rdma0
        Bus info      : platform:mt8173
        Driver version: 5.10.73
        Capabilities  : 0x84204000
                Video Memory-to-Memory Multiplanar
                Streaming
                Extended Pix Format
                Device Capabilities

As shown in the example above, there are 3 device nodes related to video codec:

  1. Video Decoder (/dev/video0 and /sys/devices/platform/soc/16000000.codec/video4linux/video0)

  2. Video Encoder (/dev/video1 and /sys/devices/platform/soc/17020000.codec/video4linux/video1)

  3. MDP (/dev/video2 and /sys/devices/platform/soc/14004000.mdp_rdma0/video4linux/video2)

All three devices are M2M (memory-to-memory) devices.
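The Capabilities value 0x84204000 reported by v4l2-dbg above is a bitmask of standard V4L2 capability flags. The sketch below decodes it; the helper function is illustrative, but the flag values come from the Linux UAPI header include/uapi/linux/videodev2.h.

```python
# V4L2 capability flag values from include/uapi/linux/videodev2.h.
V4L2_CAPS = {
    0x00004000: "Video Memory-to-Memory Multiplanar",  # V4L2_CAP_VIDEO_M2M_MPLANE
    0x00200000: "Extended Pix Format",                 # V4L2_CAP_EXT_PIX_FORMAT
    0x04000000: "Streaming",                           # V4L2_CAP_STREAMING
    0x80000000: "Device Capabilities",                 # V4L2_CAP_DEVICE_CAPS
}

def decode_caps(mask: int) -> list:
    """List the capability names set in a V4L2 capabilities bitmask."""
    return [name for bit, name in sorted(V4L2_CAPS.items()) if mask & bit]

print(decode_caps(0x84204000))
```

The result matches the four capability lines printed by v4l2-dbg for each of the three devices.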

Userspace clients should access these devices through the V4L2 userspace API. IoT Yocto integrates the GStreamer framework, which provides V4L2 plugins for evaluation and application development.

Note

The video decoder device cannot decode into YUYV or NV12 formats directly. It can only decode the bitstream into a proprietary format. Please refer to the sections below to convert the proprietary format to the buffer format you require.

Output Format of Video Decoder

One thing worth noticing is that the output buffer format of the video decoder device is a proprietary format. This can be observed with the following command:

v4l2-ctl --list-formats -d 0
ioctl: VIDIOC_ENUM_FMT
    Type: Video Capture Multiplanar

    [0]: 'MT21' (Mediatek Compressed Format, compressed)
    [1]: 'MM21' (Mediatek block Format, compressed)

To see other information, such as the accepted bitstream formats, add the --all parameter:

v4l2-ctl --all -d 0
Driver Info:
        Driver name      : mtk-vcodec-dec
        Card type        : platform:mt8167
        Bus info         : platform:mt8167
        Driver version   : 5.10.73
        Capabilities     : 0x84204000
                Video Memory-to-Memory Multiplanar
                Streaming
                Extended Pix Format
                Device Capabilities
        Device Caps      : 0x04204000
                Video Memory-to-Memory Multiplanar
                Streaming
                Extended Pix Format
Priority: 2
Format Video Capture Multiplanar:
        Width/Height      : 64/64
        Pixel Format      : 'MT21' (Mediatek Compressed Format)
        Field             : None
        Number of planes  : 2
        Flags             :
        Colorspace        : Rec. 709
        Transfer Function : Default
        YCbCr/HSV Encoding: Default
        Quantization      : Default
        Plane 0           :
        Bytes per Line : 64
        Size Image     : 4096
        Plane 1           :
        Bytes per Line : 64
        Size Image     : 2048
Format Video Output Multiplanar:
        Width/Height      : 64/64
        Pixel Format      : 'H264' (H.264)
        Field             : None
        Number of planes  : 1
        Flags             :
        Colorspace        : Rec. 709
        Transfer Function : Default
        YCbCr/HSV Encoding: Default
        Quantization      : Default
        Plane 0           :
        Bytes per Line : 0
        Size Image     : 1048576
Selection Video Capture: compose, Left 0, Top 0, Width 64, Height 64, Flags:
Selection Video Capture: compose_default, Left 0, Top 0, Width 64, Height 64, Flags:
Selection Video Capture: compose_bounds, Left 0, Top 0, Width 64, Height 64, Flags:

User Controls

min_number_of_capture_buffers 0x00980927 (int)    : min=0 max=32 step=1 default=1 value=0 flags=read-only, volatile

Note

Please note that the term Format Video Capture means the format of a capture device, which produces buffers. On the contrary, the term Format Video Output means the format of a video output device, which takes buffers as inputs.

Therefore, for an M2M device like the decoder,

  • the Video Output format is the input buffer format of the decoder device.

  • the Video Capture format is the output buffer format of the decoder device.
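The same OUTPUT/CAPTURE convention applies to the encoder, just with the directions of raw and compressed data swapped. A plain summary in code form (the dict is illustrative, not a V4L2 API):

```python
# Queue roles for V4L2 memory-to-memory codec devices: the
# application writes to the OUTPUT queue and reads back from the
# CAPTURE queue. (Illustrative helper, not a V4L2 API.)
M2M_QUEUE_ROLE = {
    ("decoder", "OUTPUT"):  "compressed bitstream in",
    ("decoder", "CAPTURE"): "decoded frames out",
    ("encoder", "OUTPUT"):  "raw frames in",
    ("encoder", "CAPTURE"): "compressed bitstream out",
}

print(M2M_QUEUE_ROLE[("decoder", "CAPTURE")])  # decoded frames out
```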

Interlaced Content Support

The video decoder mtk-vcodec-dec outputs de-interlaced (progressive) frames for interlaced content. De-interlacing is performed automatically, without any extra control.

The video encoder mtk-vcodec-enc does not support interlaced video encoding; it operates only on progressive input frames (no field mode).

MDP and Format Conversion

The proprietary MT21 or MM21 format cannot be handled by software converters and must be passed to the MDP device. Therefore, a video playback pipeline always consists of the video decoder hardware and the MDP hardware.

The MDP device is also capable of resizing video frames and converting buffer pixel formats. The supported formats can be listed with the v4l2-ctl command:

v4l2-ctl --list-formats -d 2
ioctl: VIDIOC_ENUM_FMT
    Type: Video Capture Multiplanar

    [0]: 'NM12' (Y/CbCr 4:2:0 (N-C))
    [1]: 'NV12' (Y/CbCr 4:2:0)
    [2]: 'NM21' (Y/CrCb 4:2:0 (N-C))
    [3]: 'NV21' (Y/CrCb 4:2:0)
    [4]: 'YM21' (Planar YVU 4:2:0 (N-C))
    [5]: 'YM12' (Planar YUV 4:2:0 (N-C))
    [6]: 'YV12' (Planar YVU 4:2:0)
    [7]: 'YU12' (Planar YUV 4:2:0)
    [8]: '422P' (Planar YUV 4:2:2)
    [9]: 'NV16' (Y/CbCr 4:2:2)
    [10]: 'NM16' (Y/CbCr 4:2:2 (N-C))
    [11]: 'YUYV' (YUYV 4:2:2)
    [12]: 'UYVY' (UYVY 4:2:2)
    [13]: 'YVYU' (YVYU 4:2:2)
    [14]: 'VYUY' (VYUY 4:2:2)
    [15]: 'BA24' (32-bit ARGB 8-8-8-8)
    [16]: 'AR24' (32-bit BGRA 8-8-8-8)
    [17]: 'BX24' (32-bit XRGB 8-8-8-8)
    [18]: 'XR24' (32-bit BGRX 8-8-8-8)
    [19]: 'RGBP' (16-bit RGB 5-6-5)
    [20]: 'RGB3' (24-bit RGB 8-8-8)
    [21]: 'BGR3' (24-bit BGR 8-8-8)

The v4l2convert plug-in in the GStreamer framework conveniently wraps the format conversion and resizing capabilities of MDP.

VPUD Daemon

Although the video devices are accessible through V4L2 interfaces, the kernel driver of the video processing hardware delegates most of the hardware configuration logic to a userspace daemon:

  • vpud serves the video encoder and decoder drivers.

The daemon does not provide interfaces to other userspace clients; it only works with the kernel driver. All video processing functionality should be accessed through the V4L2 interface on IoT Yocto.

Therefore, the video processing drivers stop working if the vpud process is not started or is stopped.

On IoT Yocto, the daemon is launched during the system boot process.

Video Encoder Extra-Controls

As a V4L2 video encoder, mtk-vcodec-enc also provides extra-controls to set encoder capabilities.

extra-controls of mtk-vcodec-enc:

  • V4L2_CID_MPEG_VIDEO_BITRATE (video_bitrate): value 1~20000000, default 20000000

  • V4L2_CID_MPEG_VIDEO_GOP_SIZE (video_gop_size): value 0~65535, default 0; a GOP size of 0 means I-VOP only

  • V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME (force_key_frame): value 0~0, default 0; forces an I-VOP on the next output frame

  • V4L2_CID_MPEG_VIDEO_HEADER_MODE (sequence_header_mode): value 0~1, default 1; 0: separate mode, 1: joined-with-1st-frame mode

  • V4L2_CID_MPEG_VIDEO_H264_PROFILE (h264_profile): value 0, 2, 4, default 4; 0: BASELINE, 2: MAIN, 4: HIGH

  • V4L2_CID_MPEG_VIDEO_H264_LEVEL (h264_level): value 0, 2~13, default 11; supports LEVEL_1_0~LEVEL_4_2, excluding LEVEL_1B

Note

GStreamer does not fully support the video header mode V4L2_MPEG_VIDEO_HEADER_MODE_SEPARATE.

For example, to compress an H.264 main-profile, level 4.1 video bitstream with a 512 kbps bitrate:

gst-launch-1.0 -v videotestsrc num-buffers=300 ! "video/x-raw,format=NV12, width=720, height=480, framerate=30/1"  ! v4l2h264enc extra-controls="cid,video_gop_size=30,video_bitrate=512000,sequence_header_mode=1" ! "video/x-h264,level=(string)4.1,profile=main" ! h264parse ! mp4mux ! filesink location=/tmp/test-h264.mp4
...
Execution ended after 0:00:01.554987154
Setting pipeline to NULL ...
Freeing pipeline ...
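The extra-controls value in the command above is a structure string named cid containing control=value pairs. A small helper (hypothetical; not part of GStreamer) that assembles such a string from the control names listed earlier:

```python
def extra_controls(**controls):
    """Build the extra-controls structure string for v4l2h264enc,
    e.g. extra-controls="cid,video_bitrate=512000".
    (Hypothetical helper for illustration.)"""
    fields = ",".join(f"{k}={v}" for k, v in controls.items())
    return f"cid,{fields}"

s = extra_controls(video_gop_size=30, video_bitrate=512_000,
                   sequence_header_mode=1)
print(s)  # cid,video_gop_size=30,video_bitrate=512000,sequence_header_mode=1
```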

Note

To modify the profile and level, please set them via gst-caps. If they are set by extra-controls directly, the profile and level will be overridden during gst caps negotiation.

Performance Measurement

Software vs. Hardware Decoder and Converter

IoT Yocto provides VCODEC and MDP, hardware components that accelerate the video pipeline. You can still use software components to process video, but performance may be poor due to limited CPU performance. This section provides software and hardware samples so you can compare the resulting framerates. The scenario is to decode a 720P/30FPS video, convert it to a 1080P/30FPS video, and then show it on the screen. fpsdisplaysink is used to calculate the framerate.

To use the software method:

gst-launch-1.0 -v filesrc location=<your-video-path> ! parsebin ! avdec_h264 ! \
videoscale ! video/x-raw,width=1920,height=1080 ! fpsdisplaysink video-sink=waylandsink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstTextOverlay:fps-display-text-overlay: text = rendered: 181, dropped: 0, current: 18.74, average: 18.92
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 181, dropped: 0, current: 18.74, average: 18.92

To use the hardware method:

gst-launch-1.0 -v filesrc location=<your-video-path> ! parsebin ! v4l2h264dec ! \
v4l2convert output-io-mode=5 ! video/x-raw,width=1920,height=1080 ! fpsdisplaysink video-sink=waylandsink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstTextOverlay:fps-display-text-overlay: text = rendered: 268, dropped: 1, current: 27.95, average: 27.80
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 268, dropped: 1, current: 27.95, average: 27.80

The average framerate of the software method is about 18.92 FPS, while the average framerate of the hardware method is 27.80 FPS. The heavier the load, the bigger the difference.

Note

When using fpsdisplaysink to check performance, please add text-overlay=false to prevent drawing FPS information on the display overlay, which can cost a lot of CPU computing power.

GStreamer Pipeline for Performance Test

Note

The following test results are based on Genio 1200-EVK.

For the decoder performance test, fpsdisplaysink is used to show FPS information, and waylandsink is assigned as the video-sink to check the quality of the screen.

An example of the performance test on a 4K60fps H264 video playback:

gst-launch-1.0 -v filesrc location=H264_3840x2160_60fps.mp4 ! parsebin ! queue ! v4l2h264dec ! queue ! \
v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! queue ! fpsdisplaysink video-sink=waylandsink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstWaylandSink:waylandsink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 32, dropped: 0, current: 63.65, average: 63.65
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 62, dropped: 0, current: 58.81, average: 61.21
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 93, dropped: 0, current: 61.20, average: 61.21

Note

The GStreamer element queue is added to the pipeline to remove the buffer dependency between elements.

An example of the performance test on a 4K60fps H265 video playback:

gst-launch-1.0 -v filesrc location=H265_3840x2160_60fps.mp4 ! parsebin ! queue ! v4l2h265dec ! queue ! \
v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! queue ! fpsdisplaysink video-sink=waylandsink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstWaylandSink:waylandsink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 32, dropped: 0, current: 63.70, average: 63.70
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 62, dropped: 0, current: 59.99, average: 61.85
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 93, dropped: 0, current: 60.00, average: 61.22

An example of the performance test on a FHD60fps MPEG4 video playback:

gst-launch-1.0 -v filesrc location=MPEG4_1920x1080_60fps.mp4 ! parsebin ! queue ! v4l2mpeg4dec ! queue ! \
v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! queue ! fpsdisplaysink video-sink=waylandsink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstWaylandSink:waylandsink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 32, dropped: 0, current: 63.75, average: 63.75
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 63, dropped: 0, current: 60.00, average: 61.85
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 94, dropped: 0, current: 59.99, average: 61.23

For the encoder performance test, fpsdisplaysink is used to show FPS information, and fakesink is assigned as the video-sink to remove the overhead of the file writer. In the following test, we simply use the decoded video frames as the encoder input sources.

Note

There is a hardware limitation that the input frame buffer MUST be aligned to 16x16 (the buffer width and height must both be multiples of 16).
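The aligned buffer size can be computed by rounding each dimension up to the next multiple of 16, as in this small sketch:

```python
def align16(x: int) -> int:
    """Round up to the next multiple of 16 (encoder frame buffer alignment)."""
    return (x + 15) & ~15

print(align16(1920), align16(1080))  # 1920 1088: FHD height must be padded to 1088
print(align16(3840), align16(2160))  # 3840 2160: 4K is already 16x16 aligned
```

This is why the FHD encoding example in this section converts the decoded frames to 1920x1088 before feeding them to v4l2h264enc.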

An example of the performance test on a 4K60fps H264 video encoding:

gst-launch-1.0 -v filesrc location=H264_3840x2160_60fps.mp4 ! parsebin ! queue ! v4l2h264dec ! queue ! \
v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! queue ! v4l2h264enc output-io-mode=dmabuf-import ! queue ! \
fpsdisplaysink video-sink=fakesink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 32, dropped: 0, current: 63.93, average: 63.93
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 63, dropped: 0, current: 60.00, average: 61.93
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 93, dropped: 0, current: 60.00, average: 61.30

Note

The buffer type output-io-mode=dmabuf-import was assigned to v4l2h264enc to prevent the buffer copy of the input source.

An example of the performance test on a 4K60fps H265 video encoding:

gst-launch-1.0 -v filesrc location=H264_3840x2160_60fps.mp4 ! parsebin ! queue ! v4l2h264dec ! queue ! \
v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! queue ! v4l2h265enc output-io-mode=dmabuf-import ! queue ! \
fpsdisplaysink video-sink=fakesink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 32, dropped: 0, current: 63.88, average: 63.88
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 63, dropped: 0, current: 60.04, average: 61.93
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 93, dropped: 0, current: 60.00, average: 61.29

An example of the performance test on a FHD120fps H264 video encoding:

gst-launch-1.0 -v filesrc location=H264_1920x1080_120fps.mp4 ! parsebin ! queue ! v4l2h264dec ! queue ! \
v4l2convert output-io-mode=dmabuf-import capture-io-mode=dmabuf ! video/x-raw,width=1920,height=1088 ! queue ! \
v4l2h264enc output-io-mode=dmabuf-import ! queue ! fpsdisplaysink video-sink=fakesink text-overlay=false
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 62, dropped: 0, current: 123.86, average: 123.86
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 123, dropped: 0, current: 120.06, average: 121.94
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 183, dropped: 0, current: 119.99, average: 121.30

Note

In the test, the FHD video frames output by v4l2h264dec were converted to 1920x1088 for v4l2h264enc due to the encoder hardware limitation (16x16 alignment). The conversion can be removed if the frame buffer size is already 16x16 aligned.

FAQ

Is there any low level library that can be used to control video HW encoder & decoder?

We only support video encoding/decoding via the V4L2 framework.

You can include the GStreamer library in your application to control the V4L2 framework. Please refer to the GStreamer hello-world example, or simply use gst_parse_launch to parse gst-launch commands.

Why don't the profile and level settings applied to the H264 encoder exactly match the output file?

For the profile, the H264 encoder decides how many features (such as CABAC and 4x4 transform) to use for the current input video. The profile setting is then adjusted according to the applied features.

For the level, the H264 encoder outputs a bitstream with the correct level according to the resolution and the levels defined in ISO/IEC 14496-10 (MPEG-4 Part 10, Advanced Video Coding), e.g. level 5.1 to support 4K video.

Please refer to Advanced Video Coding for the profile and level definition.

Why doesn't the bitrate setting applied to the encoder exactly match the output file?

Higher motion / more details content requires a higher bitrate to achieve the same perceived quality video stream. For example, a sporting event or concert with high motion and many moving cameras will typically require a significantly higher bitrate at the same resolution to have the same perceived quality.

Higher resolutions require a higher bitrate to achieve the same perceived quality video stream.

It is important to adjust your bitrate appropriately for the resolution you are using. Using too high or too low a bitrate can lead to poor image quality and might not meet the bitrate target.

For example, for a 4K video, the suggested bitrate setting on YouTube is 10~40 Mbps.