Genio 350-EVK

MT8365 System On Chip

Hardware

MT8365

CPU

4x CA53 2.0GHz

GPU

ARM Mali-G52

AI

APU (VPU)

Please refer to MT8365 (Genio 350) for detailed specifications.

APU

The APU includes a multi-core processor combined with intelligent control logic. It is 2x more power efficient than a GPU and delivers class-leading edge-AI processing performance of up to 0.3 TOPS.

Overview

On the Genio 350-EVK, we provide TensorFlow Lite with hardware acceleration to develop and deploy a wide range of machine learning applications. The following figure illustrates the machine learning software stack:

Figure: Machine learning software stack on Genio 350-EVK (sw_rity_ml-guide_i350_sw_stack.svg).

By using TensorFlow Lite Delegates, you can enable hardware acceleration of TensorFlow Lite models by leveraging on-device accelerators such as the GPU and Digital Signal Processor (DSP). AIoT Yocto already integrates the following three delegates:

  • GPU delegate: The GPU delegate uses OpenGL ES compute shaders on the device to run inference on TensorFlow Lite models.

  • Arm NN delegate: Arm NN is a set of open-source software that enables machine learning workloads on Arm hardware devices. It provides a bridge between existing neural network frameworks and Cortex-A CPUs and Arm Mali GPUs.

  • NNAPI delegate: The NNAPI delegate provides acceleration for TensorFlow Lite models on Android devices with supported hardware accelerators. Google has since ported NNAPI from Android to ChromeOS (NNAPI on ChromeOS), and AIoT Yocto adapted this port.

Note

  • Currently, NNAPI on Linux supports only one HAL, which must be chosen at build time.

  • The HAL is a dynamically shared library named libvendor-nn-hal.so.

  • By default, AIoT Yocto uses the XtensaANN HAL, which is the HAL that drives the Cadence VPU. You can find the setting in $BUILD_DIR/conf/local.conf:

    ...
    PREFERRED_PROVIDER_virtual/libvendor-nn-hal:i350 = "xtensa-ann-bin"
    

Note

Software information, cmd operations, and test results presented in this chapter are based on the latest version of AIoT Yocto (v22.1) on Genio 350-EVK. If you are using an older version of AIoT Yocto, please find the corresponding information in Release History.


TensorFlow Lite and Delegates

AIoT Yocto integrates TensorFlow Lite and the Arm NN delegate to provide neural network acceleration. The software versions are as follows:

Component   Version   Supported Operations
TFLite      2.9.0     TFLite Ops
Arm NN      22.02     Arm NN TFLite Delegate Supported Operators
NNAPI       1.3       Android Neural Networks Supported Operations

Supported Operations

TFLite 2.9.0

ARMNN 22.02

NNAPI 1.3

Xtensa-ANN 1.3.1

abs

ABS

ANEURALNETWORKS_ABS

add

ADD

ANEURALNETWORKS_ADD

ANEURALNETWORKS_ADD

add_n

arg_max

ARGMAX

ANEURALNETWORKS_ARGMAX

arg_min

ARGMIN

ANEURALNETWORKS_ARGMIN

assign_variable

average_pool_2d

AVERAGE_POOL_2D

ANEURALNETWORKS_AVERAGE_POOL_2D

ANEURALNETWORKS_AVERAGE_POOL_2D

basic_lstm

batch_matmul

batch_to_space_nd

BATCH_TO_SPACE_ND

ANEURALNETWORKS_BATCH_TO_SPACE_ND

bidirectional_sequence_lstm

broadcast_args

broadcast_to

bucketize

call_once

cast

CAST

ANEURALNETWORKS_CAST

ANEURALNETWORKS_CAST

ceil

complex_abs

concatenation

CONCATENATION

ANEURALNETWORKS_CONCATENATION

ANEURALNETWORKS_CONCATENATION

conv_2d

CONV_2D

ANEURALNETWORKS_CONV_2D

ANEURALNETWORKS_CONV_2D

conv_3d

CONV_3D

conv_3d_transpose

cos

cumsum

custom

custom_tf

densify

depth_to_space

DEPTH_TO_SPACE

ANEURALNETWORKS_DEPTH_TO_SPACE

ANEURALNETWORKS_DEPTH_TO_SPACE

depthwise_conv_2d

DEPTHWISE_CONV_2D

ANEURALNETWORKS_DEPTHWISE_CONV_2D

ANEURALNETWORKS_DEPTHWISE_CONV_2D

dequantize

DEQUANTIZE

ANEURALNETWORKS_DEQUANTIZE

div

DIV

ANEURALNETWORKS_DIV

ANEURALNETWORKS_DIV

dynamic_update_slice

elu

ELU

ANEURALNETWORKS_ELU

embedding_lookup

equal

EQUAL

ANEURALNETWORKS_EQUAL

exp

EXP

ANEURALNETWORKS_EXP

expand_dims

ANEURALNETWORKS_EXPAND_DIMS

external_const

fake_quant

fill

ANEURALNETWORKS_FILL

floor

FLOOR

ANEURALNETWORKS_FLOOR

floor_div

floor_mod

fully_connected

FULLY_CONNECTED

ANEURALNETWORKS_FULLY_CONNECTED

ANEURALNETWORKS_FULLY_CONNECTED

gather

GATHER

ANEURALNETWORKS_GATHER

gather_nd

gelu

greater

GREATER

ANEURALNETWORKS_GREATER

greater_equal

GREATER_OR_EQUAL

ANEURALNETWORKS_GREATER_EQUAL

hard_swish

HARD_SWISH

ANEURALNETWORKS_HARD_SWISH

hashtable

hashtable_find

hashtable_import

hashtable_size

if

imag

l2_normalization

L2_NORMALIZATION

ANEURALNETWORKS_L2_NORMALIZATION

ANEURALNETWORKS_L2_NORMALIZATION

leaky_relu

less

LESS

ANEURALNETWORKS_LESS

less_equal

LESS_OR_EQUAL

ANEURALNETWORKS_LESS_EQUAL

local_response_normalization

LOCAL_RESPONSE_NORMALIZATION

ANEURALNETWORKS_LOCAL_RESPONSE_NORMALIZATION

log

ANEURALNETWORKS_LOG

log_softmax

LOG_SOFTMAX

ANEURALNETWORKS_LOG_SOFTMAX

ANEURALNETWORKS_LOG_SOFTMAX

logical_and

LOGICAL_AND

ANEURALNETWORKS_LOGICAL_AND

logical_not

LOGICAL_NOT

ANEURALNETWORKS_LOGICAL_NOT

logical_or

LOGICAL_OR

ANEURALNETWORKS_LOGICAL_OR

logistic

LOGISTIC

ANEURALNETWORKS_LOGISTIC

ANEURALNETWORKS_LOGISTIC

lstm

LSTM

ANEURALNETWORKS_LSTM

matrix_diag

matrix_set_diag

max_pool_2d

MAX_POOL_2D

ANEURALNETWORKS_MAX_POOL_2D

ANEURALNETWORKS_MAX_POOL_2D

maximum

MAXIMUM

ANEURALNETWORKS_MAXIMUM

ANEURALNETWORKS_MAXIMUM

mean

MEAN

ANEURALNETWORKS_MEAN

minimum

MINIMUM

ANEURALNETWORKS_MINIMUM

ANEURALNETWORKS_MINIMUM

mirror_pad

MIRROR_PAD

mul

MUL

ANEURALNETWORKS_MUL

ANEURALNETWORKS_MUL

multinomial

neg

NEG

ANEURALNETWORKS_NEG

no_value

non_max_suppression_v4

non_max_suppression_v5

not_equal

NOT_EQUAL

ANEURALNETWORKS_NOT_EQUAL

NumericVerify

one_hot

pack

PACK

pad

PAD

ANEURALNETWORKS_PAD

padv2

ANEURALNETWORKS_PAD_V2

poly_call

pow

ANEURALNETWORKS_POW

prelu

PRELU

ANEURALNETWORKS_PRELU

ANEURALNETWORKS_PRELU

pseudo_const

pseudo_qconst

pseudo_sparse_const

pseudo_sparse_qconst

quantize

QUANTIZE

ANEURALNETWORKS_QUANTIZE

random_standard_normal

random_uniform

range

rank

RANK

ANEURALNETWORKS_RANK

read_variable

real

reduce_all

reduce_any

ANEURALNETWORKS_REDUCE_ANY

reduce_max

REDUCE_MAX

ANEURALNETWORKS_REDUCE_MAX

reduce_min

REDUCE_MIN

ANEURALNETWORKS_REDUCE_MIN

reduce_prod

ANEURALNETWORKS_REDUCE_PROD

relu

RELU

ANEURALNETWORKS_RELU

ANEURALNETWORKS_RELU

relu6

RELU6

ANEURALNETWORKS_RELU6

ANEURALNETWORKS_RELU6

relu_n1_to_1

ANEURALNETWORKS_RELU1

ANEURALNETWORKS_RELU1

reshape

RESHAPE

ANEURALNETWORKS_RESHAPE

ANEURALNETWORKS_RESHAPE

resize_bilinear

RESIZE_BILINEAR

ANEURALNETWORKS_RESIZE_BILINEAR

resize_nearest_neighbor

RESIZE_NEAREST_NEIGHBOR

ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR

ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR

reverse_sequence

reverse_v2

rfft2d

round

rsqrt

RSQRT

ANEURALNETWORKS_RSQRT

ANEURALNETWORKS_RSQRT

scatter_nd

segment_sum

select

ANEURALNETWORKS_SELECT

ANEURALNETWORKS_SELECT

select_v2

shape

SHAPE

sin

ANEURALNETWORKS_SIN

slice

ANEURALNETWORKS_SLICE

softmax

SOFTMAX

ANEURALNETWORKS_SOFTMAX

ANEURALNETWORKS_SOFTMAX

space_to_batch_nd

SPACE_TO_BATCH_ND

ANEURALNETWORKS_SPACE_TO_BATCH_ND

space_to_depth

SPACE_TO_DEPTH

ANEURALNETWORKS_SPACE_TO_DEPTH

ANEURALNETWORKS_SPACE_TO_DEPTH

sparse_to_dense

split

SPLIT

ANEURALNETWORKS_SPLIT

split_v

SPLIT_V

sqrt

SQRT

ANEURALNETWORKS_SQRT

ANEURALNETWORKS_SQRT

square

squared_difference

squeeze

ANEURALNETWORKS_SQUEEZE

strided_slice

STRIDED_SLICE

ANEURALNETWORKS_STRIDED_SLICE

sub

SUB

ANEURALNETWORKS_SUB

sum

SUM

svdf

ANEURALNETWORKS_SVDF

tanh

TANH

ANEURALNETWORKS_TANH

tile

ANEURALNETWORKS_TILE

topk_v2

ANEURALNETWORKS_TOPK_V2

ANEURALNETWORKS_TOPK_V2

transpose

TRANSPOSE

ANEURALNETWORKS_TRANSPOSE

ANEURALNETWORKS_TRANSPOSE

transpose_conv

TRANSPOSE_CONV

ANEURALNETWORKS_TRANSPOSE_CONV_2D

ANEURALNETWORKS_TRANSPOSE_CONV_2D

unidirectional_sequence_lstm

UNIDIRECTIONAL_SEQUENCE_LSTM

ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_LSTM

unidirectional_sequence_rnn

ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_RNN

unique

unpack

UNPACK

var_handle

where

while

ANEURALNETWORKS_WHILE

yield

zeros_like

L2_POOL_2D

ANEURALNETWORKS_L2_POOL_2D

ANEURALNETWORKS_L2_POOL_2D

ANEURALNETWORKS_HASHTABLE_LOOKUP

ANEURALNETWORKS_HASHTABLE_LOOKUP

ANEURALNETWORKS_EMBEDDING_LOOKUP

ANEURALNETWORKS_EMBEDDING_LOOKUP

ANEURALNETWORKS_LSH_PROJECTION

ANEURALNETWORKS_RNN

ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM

ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM

ANEURALNETWORKS_BOX_WITH_NMS_LIMIT

ANEURALNETWORKS_BOX_WITH_NMS_LIMIT

ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_LSTM

ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_RNN

ANEURALNETWORKS_CHANNEL_SHUFFLE

ANEURALNETWORKS_CHANNEL_SHUFFLE

ANEURALNETWORKS_DETECTION_POSTPROCESSING

ANEURALNETWORKS_DETECTION_POSTPROCESSING

ANEURALNETWORKS_GENERATE_PROPOSALS

ANEURALNETWORKS_GENERATE_PROPOSALS

ANEURALNETWORKS_GROUPED_CONV_2D

ANEURALNETWORKS_GROUPED_CONV_2D

ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT

ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT

ANEURALNETWORKS_INSTANCE_NORMALIZATION

ANEURALNETWORKS_QUANTIZED_16BIT_LSTM

ANEURALNETWORKS_QUANTIZED_16BIT_LSTM

ANEURALNETWORKS_QUANTIZED_LSTM

ANEURALNETWORKS_QUANTIZED_LSTM

ANEURALNETWORKS_RANDOM_MULTINOMIAL

ANEURALNETWORKS_REDUCE_ALL

ANEURALNETWORKS_REDUCE_SUM

ANEURALNETWORKS_ROI_ALIGN

ANEURALNETWORKS_ROI_POOLING

ANEURALNETWORKS_IF


Demo

A Python demo application for image recognition is built into the image and can be found in the /usr/share/label_image directory. It is adapted from the upstream label_image.py.

cd /usr/share/label_image
ls -l

-rw-r--r-- 1 root root   940650 Mar  9  2018 grace_hopper.bmp
-rw-r--r-- 1 root root    61306 Mar  9  2018 grace_hopper.jpg
-rw-r--r-- 1 root root    10479 Mar  9  2018 imagenet_slim_labels.txt
-rw-r--r-- 1 root root 95746802 Mar  9  2018 inception_v3_2016_08_28_frozen.pb
-rw-r--r-- 1 root root     4388 Mar  9  2018 label_image.py
-rw-r--r-- 1 root root    10484 Mar  9  2018 labels_mobilenet_quant_v1_224.txt
-rw-r--r-- 1 root root  4276352 Mar  9  2018 mobilenet_v1_1.0_224_quant.tflite

Basic commands for running the demo with different delegates are as follows.

  • Execute on CPU

cd /usr/share/label_image
python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite
  • Execute on GPU, with GPU delegate

cd /usr/share/label_image
python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite -e /usr/lib64/gpu_external_delegate.so
  • Execute on GPU, with Arm NN delegate

cd /usr/share/label_image
python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite -e /usr/lib64/libarmnnDelegate.so.25 -o "backends:GpuAcc,CpuAcc"
  • Execute on VPU, with NNAPI delegate

cd /usr/share/label_image
python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite -e /usr/lib64/nnapi_external_delegate.so
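The four invocations above differ only in the delegate arguments passed to label_image.py. A small helper like the following (hypothetical, not shipped with the image; the delegate paths are the ones used in the commands above) makes the mapping explicit:

```python
# Hypothetical helper: map a backend name to the extra label_image.py
# arguments used in the commands above (paths as shipped on AIoT Yocto v22.1).
DELEGATE_ARGS = {
    "cpu": [],
    "gpu": ["-e", "/usr/lib64/gpu_external_delegate.so"],
    "armnn": ["-e", "/usr/lib64/libarmnnDelegate.so.25",
              "-o", "backends:GpuAcc,CpuAcc"],
    "nnapi": ["-e", "/usr/lib64/nnapi_external_delegate.so"],
}

def label_image_cmd(backend, model="mobilenet_v1_1.0_224_quant.tflite",
                    labels="labels_mobilenet_quant_v1_224.txt",
                    image="grace_hopper.jpg"):
    """Return the argv list for one label_image.py run on the given backend."""
    return (["python3", "label_image.py",
             "--label_file", labels, "--image", image,
             "--model_file", model]
            + DELEGATE_ARGS[backend])
```

For example, `label_image_cmd("armnn")` reproduces the Arm NN invocation above and can be handed to `subprocess.run` on the board.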

Benchmark Tool

benchmark_model is provided in TensorFlow Performance Measurement for performance evaluation.

Basic commands for running the benchmark tool with CPU and different delegates are as follows.

  • Execute on CPU (4 threads)

benchmark_model --graph=/usr/share/label_image/mobilenet_v1_1.0_224_quant.tflite --num_threads=4 --use_xnnpack=0 --num_runs=10
  • Execute on GPU, with GPU delegate

benchmark_model --graph=/usr/share/label_image/mobilenet_v1_1.0_224_quant.tflite --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --use_xnnpack=0 --num_runs=10
  • Execute on GPU, with Arm NN delegate

benchmark_model --graph=/usr/share/label_image/mobilenet_v1_1.0_224_quant.tflite --external_delegate_path=/usr/lib64/libarmnnDelegate.so.25 --external_delegate_options="backends:GpuAcc,CpuAcc" --use_xnnpack=0 --num_runs=10
  • Execute on VPU, with NNAPI delegate

benchmark_model --graph=/usr/share/label_image/mobilenet_v1_1.0_224_quant.tflite --use_nnapi=1 --disable_nnapi_cpu=1 --use_xnnpack=0 --num_runs=10
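As with the demo, only the backend-specific flags differ between the benchmark invocations above. A hedged sketch that composes them (flag values copied from the commands above; the helper itself is hypothetical):

```python
# Hypothetical sketch: compose the benchmark_model flag sets shown above.
# Every run disables XNNPACK and uses 10 runs so results stay comparable.
COMMON = ["--use_xnnpack=0", "--num_runs=10"]

BACKEND_FLAGS = {
    "cpu": ["--num_threads=4"],
    "gpu": ["--use_gpu=1", "--allow_fp16=0", "--gpu_precision_loss_allowed=0"],
    "armnn": ["--external_delegate_path=/usr/lib64/libarmnnDelegate.so.25",
              "--external_delegate_options=backends:GpuAcc,CpuAcc"],
    "nnapi": ["--use_nnapi=1", "--disable_nnapi_cpu=1"],
}

def benchmark_cmd(backend,
                  graph="/usr/share/label_image/mobilenet_v1_1.0_224_quant.tflite"):
    """Return the argv list for one benchmark_model run on the given backend."""
    return ["benchmark_model", f"--graph={graph}"] + BACKEND_FLAGS[backend] + COMMON
```

Looping this over the models in the table below gives one consistent way to regenerate the benchmark numbers.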

Benchmark Result

The following table shows the benchmark results under performance mode.

Average inference time (ms), 10 runs per model:

Model (.tflite)                   CPU (4 threads)  GPU       ARMNN (GpuAcc)  ARMNN (CpuAcc)  NNAPI: VPU
inception_v3                      702.909          779.619   483.511         393.543         Not executed by VPU
inception_v3_quant                345.205          780.525   282.458         279.392         99.231
mobilenet_v2_1.0.224              54.486           59.661    53.64           51.157          Not executed by VPU
mobilenet_v2_1.0.224_quant        30.147           60.968    35.587          32.033          21.118
ResNet50V2_224_1.0                478.129          488.363   358.839         260.378         Not executed by VPU
ResNet50V2_224_1.0_quant          258.675          505.131   220.283         214.076         158.044
ssd_mobilenet_v1_coco             148.024          211.375   162.886         125.939         Not executed by VPU
ssd_mobilenet_v1_coco_quantized   74.07            213.846   85.086          73.742          31.515


Performance Mode

Force the CPU, GPU, and APU (VPU) to run at maximum frequency.

  • CPU at maximum frequency

    There are 4 CPU cores on Genio 350-EVK. Please run the following commands for each CPU core (/sys/devices/system/cpu/cpuX/).

    # change CPUFreq policy to userspace
    echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    
    # query available frequencies
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
    850000 918000 987000 1056000 1125000 1216000 1308000 1400000 1466000 1533000 1633000 1700000 1767000 1834000 1917000 2001000
    
    # set maximum frequency
    echo 2001000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
    
    # check current frequency
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
    2001000
    
  • GPU at maximum frequency

    Please refer to Adjust GPU Frequency to pin the GPU at its maximum frequency.

    Alternatively, set the GPU governor to performance to keep the GPU statically at its highest frequency.

    echo performance > /sys/devices/platform/soc/13040000.mali/devfreq/13040000.mali/governor
    
  • APU at maximum frequency

    Currently, VPU is always running at maximum frequency.
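The per-core CPU steps above can be scripted. A hedged sketch, assuming the standard cpufreq sysfs layout shown above (run as root on the board):

```python
# Hedged sketch: pin every CPU core to its highest available frequency via
# the cpufreq sysfs interface shown above. Requires root on the board.
from pathlib import Path

def max_frequency(available: str) -> int:
    """Pick the highest value from a scaling_available_frequencies line."""
    return max(int(f) for f in available.split())

def pin_cpus_to_max():
    for cpufreq in Path("/sys/devices/system/cpu").glob("cpu[0-9]*/cpufreq"):
        # change CPUFreq policy to userspace, then set the maximum frequency
        (cpufreq / "scaling_governor").write_text("userspace")
        freqs = (cpufreq / "scaling_available_frequencies").read_text()
        (cpufreq / "scaling_setspeed").write_text(str(max_frequency(freqs)))
```

On Genio 350-EVK the list above ends at 2001000, so each core ends up pinned at 2.0 GHz.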


Release History

Different versions of AIoT Yocto integrate different versions of software packages. The locations of the demo program and test models on the board also vary. If you are using an older version of AIoT Yocto, you can find software information, cmd operations, and test results for older versions in the following sections.

AIoT Yocto v22.0.1

AIoT Yocto integrates TensorFlow Lite and the Arm NN delegate to provide neural network acceleration. The software versions are as follows:

Component    Version
TFLite       2.6.1
Arm NN       21.11
NNAPI        1.3
Xtensa-ANN   1.3.1

Supported Operations

Supported Operations

TFLite 2.6.1

Armnn 21.11

NNAPI 1.3

Xtensa-ANN 1.3.1

abs

ABS

ANEURALNETWORKS_ABS

add

ADD

ANEURALNETWORKS_ADD

ANEURALNETWORKS_ADD

add_n

arg_max

ARGMAX

ANEURALNETWORKS_ARGMAX

arg_min

ARGMIN

ANEURALNETWORKS_ARGMIN

assign_variable

average_pool_2d

AVERAGE_POOL_2D

ANEURALNETWORKS_AVERAGE_POOL_2D

ANEURALNETWORKS_AVERAGE_POOL_2D

basic_lstm

batch_matmul

batch_to_space_nd

BATCH_TO_SPACE_ND

ANEURALNETWORKS_BATCH_TO_SPACE_ND

bidirectional_sequence_lstm

broadcast_to

call_once

cast

CAST

ANEURALNETWORKS_CAST

ANEURALNETWORKS_CAST

ceil

complex_abs

concatenation

CONCATENATION

ANEURALNETWORKS_CONCATENATION

ANEURALNETWORKS_CONCATENATION

conv_2d

CONV_2D

ANEURALNETWORKS_CONV_2D

ANEURALNETWORKS_CONV_2D

conv_3d

CONV_3D

conv_3d_transpose

cos

cumsum

custom

custom_tf

densify

depth_to_space

DEPTH_TO_SPACE

ANEURALNETWORKS_DEPTH_TO_SPACE

ANEURALNETWORKS_DEPTH_TO_SPACE

depthwise_conv_2d

DEPTHWISE_CONV_2D

ANEURALNETWORKS_DEPTHWISE_CONV_2D

ANEURALNETWORKS_DEPTHWISE_CONV_2D

dequantize

DEQUANTIZE

ANEURALNETWORKS_DEQUANTIZE

div

DIV

ANEURALNETWORKS_DIV

ANEURALNETWORKS_DIV

elu

ELU

ANEURALNETWORKS_ELU

embedding_lookup

equal

EQUAL

ANEURALNETWORKS_EQUAL

exp

EXP

ANEURALNETWORKS_EXP

expand_dims

ANEURALNETWORKS_EXPAND_DIMS

external_const

fake_quant

fill

ANEURALNETWORKS_FILL

floor

FLOOR

ANEURALNETWORKS_FLOOR

floor_div

floor_mod

fully_connected

FULLY_CONNECTED

ANEURALNETWORKS_FULLY_CONNECTED

ANEURALNETWORKS_FULLY_CONNECTED

gather

GATHER

ANEURALNETWORKS_GATHER

gather_nd

greater

GREATER

ANEURALNETWORKS_GREATER

greater_equal

GREATER_OR_EQUAL

ANEURALNETWORKS_GREATER_EQUAL

hard_swish

HARD_SWISH

ANEURALNETWORKS_HARD_SWISH

hashtable

hashtable_find

hashtable_import

hashtable_size

if

imag

l2_normalization

L2_NORMALIZATION

ANEURALNETWORKS_L2_NORMALIZATION

ANEURALNETWORKS_L2_NORMALIZATION

leaky_relu

less

LESS

ANEURALNETWORKS_LESS

less_equal

LESS_OR_EQUAL

ANEURALNETWORKS_LESS_EQUAL

local_response_normalization

LOCAL_RESPONSE_NORMALIZATION

ANEURALNETWORKS_LOCAL_RESPONSE_NORMALIZATION

log

ANEURALNETWORKS_LOG

log_softmax

LOG_SOFTMAX

ANEURALNETWORKS_LOG_SOFTMAX

ANEURALNETWORKS_LOG_SOFTMAX

logical_and

LOGICAL_AND

ANEURALNETWORKS_LOGICAL_AND

logical_not

LOGICAL_NOT

ANEURALNETWORKS_LOGICAL_NOT

logical_or

LOGICAL_OR

ANEURALNETWORKS_LOGICAL_OR

logistic

LOGISTIC

ANEURALNETWORKS_LOGISTIC

ANEURALNETWORKS_LOGISTIC

lstm

LSTM

ANEURALNETWORKS_LSTM

matrix_diag

matrix_set_diag

max_pool_2d

MAX_POOL_2D

ANEURALNETWORKS_MAX_POOL_2D

ANEURALNETWORKS_MAX_POOL_2D

maximum

MAXIMUM

ANEURALNETWORKS_MAXIMUM

ANEURALNETWORKS_MAXIMUM

mean

MEAN

ANEURALNETWORKS_MEAN

minimum

MINIMUM

ANEURALNETWORKS_MINIMUM

ANEURALNETWORKS_MINIMUM

mirror_pad

MIRROR_PAD

mul

MUL

ANEURALNETWORKS_MUL

ANEURALNETWORKS_MUL

neg

NEG

ANEURALNETWORKS_NEG

non_max_suppression_v4

non_max_suppression_v5

not_equal

NOT_EQUAL

ANEURALNETWORKS_NOT_EQUAL

NumericVerify

one_hot

pack

PACK

pad

PAD

ANEURALNETWORKS_PAD

padv2

ANEURALNETWORKS_PAD_V2

pow

ANEURALNETWORKS_POW

prelu

PRELU

ANEURALNETWORKS_PRELU

ANEURALNETWORKS_PRELU

pseudo_const

pseudo_qconst

pseudo_sparse_const

pseudo_sparse_qconst

quantize

QUANTIZE

ANEURALNETWORKS_QUANTIZE

range

rank

RANK

ANEURALNETWORKS_RANK

read_variable

real

reduce_all

reduce_any

ANEURALNETWORKS_REDUCE_ANY

reduce_max

REDUCE_MAX

ANEURALNETWORKS_REDUCE_MAX

reduce_min

REDUCE_MIN

ANEURALNETWORKS_REDUCE_MIN

reduce_prod

ANEURALNETWORKS_REDUCE_PROD

relu

RELU

ANEURALNETWORKS_RELU

ANEURALNETWORKS_RELU

relu6

RELU6

ANEURALNETWORKS_RELU6

ANEURALNETWORKS_RELU6

relu_n1_to_1

ANEURALNETWORKS_RELU1

ANEURALNETWORKS_RELU1

reshape

RESHAPE

ANEURALNETWORKS_RESHAPE

ANEURALNETWORKS_RESHAPE

resize_bilinear

RESIZE_BILINEAR

ANEURALNETWORKS_RESIZE_BILINEAR

resize_nearest_neighbor

RESIZE_NEAREST_NEIGHBOR

ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR

ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR

reverse_sequence

reverse_v2

rfft2d

round

rsqrt

RSQRT

ANEURALNETWORKS_RSQRT

ANEURALNETWORKS_RSQRT

scatter_nd

segment_sum

select

ANEURALNETWORKS_SELECT

ANEURALNETWORKS_SELECT

select_v2

shape

SHAPE

sin

ANEURALNETWORKS_SIN

slice

ANEURALNETWORKS_SLICE

softmax

SOFTMAX

ANEURALNETWORKS_SOFTMAX

ANEURALNETWORKS_SOFTMAX

space_to_batch_nd

SPACE_TO_BATCH_ND

ANEURALNETWORKS_SPACE_TO_BATCH_ND

space_to_depth

SPACE_TO_DEPTH

ANEURALNETWORKS_SPACE_TO_DEPTH

ANEURALNETWORKS_SPACE_TO_DEPTH

sparse_to_dense

split

SPLIT

ANEURALNETWORKS_SPLIT

split_v

SPLIT_V

sqrt

SQRT

ANEURALNETWORKS_SQRT

ANEURALNETWORKS_SQRT

square

squared_difference

squeeze

ANEURALNETWORKS_SQUEEZE

strided_slice

STRIDED_SLICE

ANEURALNETWORKS_STRIDED_SLICE

sub

SUB

ANEURALNETWORKS_SUB

sum

SUM

svdf

ANEURALNETWORKS_SVDF

tanh

TANH

ANEURALNETWORKS_TANH

tile

ANEURALNETWORKS_TILE

topk_v2

ANEURALNETWORKS_TOPK_V2

ANEURALNETWORKS_TOPK_V2

transpose

TRANSPOSE

ANEURALNETWORKS_TRANSPOSE

ANEURALNETWORKS_TRANSPOSE

transpose_conv

TRANSPOSE_CONV

ANEURALNETWORKS_TRANSPOSE_CONV_2D

ANEURALNETWORKS_TRANSPOSE_CONV_2D

unidirectional_sequence_lstm

UNIDIRECTIONAL_SEQUENCE_LSTM

ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_LSTM

unidirectional_sequence_rnn

ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_RNN

unique

unpack

UNPACK

var_handle

where

while

ANEURALNETWORKS_WHILE

yield

zeros_like

L2_POOL_2D

ANEURALNETWORKS_L2_POOL_2D

ANEURALNETWORKS_L2_POOL_2D

ANEURALNETWORKS_HASHTABLE_LOOKUP

ANEURALNETWORKS_HASHTABLE_LOOKUP

ANEURALNETWORKS_EMBEDDING_LOOKUP

ANEURALNETWORKS_EMBEDDING_LOOKUP

ANEURALNETWORKS_LSH_PROJECTION

ANEURALNETWORKS_RNN

ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM

ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM

ANEURALNETWORKS_BOX_WITH_NMS_LIMIT

ANEURALNETWORKS_BOX_WITH_NMS_LIMIT

ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_LSTM

ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_RNN

ANEURALNETWORKS_CHANNEL_SHUFFLE

ANEURALNETWORKS_CHANNEL_SHUFFLE

ANEURALNETWORKS_DETECTION_POSTPROCESSING

ANEURALNETWORKS_DETECTION_POSTPROCESSING

ANEURALNETWORKS_GENERATE_PROPOSALS

ANEURALNETWORKS_GENERATE_PROPOSALS

ANEURALNETWORKS_GROUPED_CONV_2D

ANEURALNETWORKS_GROUPED_CONV_2D

ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT

ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT

ANEURALNETWORKS_INSTANCE_NORMALIZATION

ANEURALNETWORKS_QUANTIZED_16BIT_LSTM

ANEURALNETWORKS_QUANTIZED_16BIT_LSTM

ANEURALNETWORKS_QUANTIZED_LSTM

ANEURALNETWORKS_QUANTIZED_LSTM

ANEURALNETWORKS_RANDOM_MULTINOMIAL

ANEURALNETWORKS_REDUCE_ALL

ANEURALNETWORKS_REDUCE_SUM

ANEURALNETWORKS_ROI_ALIGN

ANEURALNETWORKS_ROI_POOLING

ANEURALNETWORKS_IF


Demo

Please refer to the commands above to run the demo.

Benchmark Tool

Please refer to the commands above to run the benchmark tool.

Benchmark Result

The following table shows the benchmark results under performance mode.

Average inference time (ms), 10 runs per model:

Model (.tflite)                   CPU (4 threads)  GPU       ARMNN (GpuAcc)  ARMNN (CpuAcc)  NNAPI: VPU
inception_v3                      702.123          750.467   480.038         395.329         Not executed by VPU
inception_v3_quant                345.236          753.026   282.746         278.498         99.176
mobilenet_v2_1.0.224              54.47            59.631    51.321          51.869          Not executed by VPU
mobilenet_v2_1.0.224_quant        30.245           60.949    35.224          32.15           21.166
ResNet50V2_224_1.0                478.925          487.854   352.779         256.576         Not executed by VPU
ResNet50V2_224_1.0_quant          259.338          504.392   223.008         215.457         167.004
ssd_mobilenet_v1_coco             149.983          212.282   157.512         126.237         Not executed by VPU
ssd_mobilenet_v1_coco_quantized   73.992           214.196   84.716          74.474          31.549


AIoT Yocto v21.3

AIoT Yocto integrates TensorFlow Lite and the Arm NN delegate to provide neural network acceleration. The software versions are as follows:

Component    Version
TFLite       2.4.0
Arm NN       21.05
NNAPI        1.3
Xtensa-ANN   1.3.1

Supported Operations

Supported Operations

TFLite 2.4.0

Armnn 21.05

NNAPI 1.3

Xtensa-ANN 1.3.1

abs

ABS

ANEURALNETWORKS_ABS

add

ADD

ANEURALNETWORKS_ADD

ANEURALNETWORKS_ADD

add_n

arg_max

ARGMAX

ANEURALNETWORKS_ARGMAX

arg_min

ARGMIN

ANEURALNETWORKS_ARGMIN

average_pool_2d

AVERAGE_POOL_2D

ANEURALNETWORKS_AVERAGE_POOL_2D

ANEURALNETWORKS_AVERAGE_POOL_2D

basic_lstm

batch_to_space_nd

BATCH_TO_SPACE_ND

ANEURALNETWORKS_BATCH_TO_SPACE_ND

cast

ANEURALNETWORKS_CAST

ANEURALNETWORKS_CAST

ANEURALNETWORKS_CAST

ceil

concatenation

CONCATENATION

ANEURALNETWORKS_CONCATENATION

ANEURALNETWORKS_CONCATENATION

conv_2d

CONV_2D

ANEURALNETWORKS_CONV_2D

ANEURALNETWORKS_CONV_2D

convolution_2d_transpose_bias

cos

densify

depth_to_space

DEPTH_TO_SPACE

ANEURALNETWORKS_DEPTH_TO_SPACE

ANEURALNETWORKS_DEPTH_TO_SPACE

depthwise_conv_2d

DEPTHWISE_CONV_2D

ANEURALNETWORKS_DEPTHWISE_CONV_2D

ANEURALNETWORKS_DEPTHWISE_CONV_2D

dequantize

DEQUANTIZE

ANEURALNETWORKS_DEQUANTIZE

div

DIV

ANEURALNETWORKS_DIV

ANEURALNETWORKS_DIV

elu

ELU

ANEURALNETWORKS_ELU

embedding_lookup

equal

EQUAL

ANEURALNETWORKS_EQUAL

exp

EXP

ANEURALNETWORKS_EXP

expand_dims

ANEURALNETWORKS_EXPAND_DIMS

external_const

fake_quant

fill

ANEURALNETWORKS_FILL

floor

FLOOR

ANEURALNETWORKS_FLOOR

floor_div

floor_mod

fully_connected

FULLY_CONNECTED

ANEURALNETWORKS_FULLY_CONNECTED

ANEURALNETWORKS_FULLY_CONNECTED

gather

GATHER

ANEURALNETWORKS_GATHER

gather_nd

greater

GREATER

ANEURALNETWORKS_GREATER

greater_equal

GREATER_OR_EQUAL

ANEURALNETWORKS_GREATER_EQUAL

hard_swish

HARD_SWISH

ANEURALNETWORKS_HARD_SWISH

l2_normalization

L2_NORMALIZATION

ANEURALNETWORKS_L2_NORMALIZATION

ANEURALNETWORKS_L2_NORMALIZATION

leaky_relu

less

LESS

ANEURALNETWORKS_LESS

less_equal

LESS_OR_EQUAL

ANEURALNETWORKS_LESS_EQUAL

local_response_normalization

LOCAL_RESPONSE_NORMALIZATION

ANEURALNETWORKS_LOCAL_RESPONSE_NORMALIZATION

log

ANEURALNETWORKS_LOG

log_softmax

LOG_SOFTMAX

ANEURALNETWORKS_LOG_SOFTMAX

ANEURALNETWORKS_LOG_SOFTMAX

logical_and

LOGICAL_AND

ANEURALNETWORKS_LOGICAL_AND

logical_not

LOGICAL_NOT

ANEURALNETWORKS_LOGICAL_NOT

logical_or

LOGICAL_OR

ANEURALNETWORKS_LOGICAL_OR

logistic

LOGISTIC

ANEURALNETWORKS_LOGISTIC

ANEURALNETWORKS_LOGISTIC

lstm

LSTM

ANEURALNETWORKS_LSTM

matrix_diag

matrix_set_diag

max_pool_2d

MAX_POOL_2D

ANEURALNETWORKS_MAX_POOL_2D

ANEURALNETWORKS_MAX_POOL_2D

max_pooling_with_argmax_2d

max_unpooling_2d

maximum

MAXIMUM

ANEURALNETWORKS_MAXIMUM

ANEURALNETWORKS_MAXIMUM

mean

MEAN

ANEURALNETWORKS_MEAN

minimum

MINIMUM

ANEURALNETWORKS_MINIMUM

ANEURALNETWORKS_MINIMUM

mirror_pad

mul

MUL

ANEURALNETWORKS_MUL

ANEURALNETWORKS_MUL

neg

NEG

ANEURALNETWORKS_NEG

non_max_suppression_v4

non_max_suppression_v5

not_equal

NOT_EQUAL

ANEURALNETWORKS_NOT_EQUAL

NumericVerify

one_hot

pack

pad

PAD

ANEURALNETWORKS_PAD

padv2

ANEURALNETWORKS_PAD_V2

pow

ANEURALNETWORKS_POW

prelu

PRELU

ANEURALNETWORKS_PRELU

ANEURALNETWORKS_PRELU

pseudo_const

pseudo_qconst

pseudo_sparse_const

pseudo_sparse_qconst

quantize

QUANTIZE

ANEURALNETWORKS_QUANTIZE

range

rank

RANK

ANEURALNETWORKS_RANK

reduce_any

ANEURALNETWORKS_REDUCE_ANY

reduce_max

REDUCE_MAX

ANEURALNETWORKS_REDUCE_MAX

reduce_min

REDUCE_MIN

ANEURALNETWORKS_REDUCE_MIN

reduce_prod

ANEURALNETWORKS_REDUCE_PROD

relu

RELU

ANEURALNETWORKS_RELU

ANEURALNETWORKS_RELU

relu6

RELU6

ANEURALNETWORKS_RELU6

ANEURALNETWORKS_RELU6

relu_n1_to_1

ANEURALNETWORKS_RELU1

ANEURALNETWORKS_RELU1

reshape

RESHAPE

ANEURALNETWORKS_RESHAPE

ANEURALNETWORKS_RESHAPE

resize_bilinear

RESIZE_BILINEAR

ANEURALNETWORKS_RESIZE_BILINEAR

resize_nearest_neighbor

RESIZE_NEAREST_NEIGHBOR

ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR

ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR

reverse_sequence

reverse_v2

round

rsqrt

RSQRT

ANEURALNETWORKS_RSQRT

ANEURALNETWORKS_RSQRT

segment_sum

select

ANEURALNETWORKS_SELECT

ANEURALNETWORKS_SELECT

select_v2

shape

sin

ANEURALNETWORKS_SIN

slice

ANEURALNETWORKS_SLICE

softmax

SOFTMAX

ANEURALNETWORKS_SOFTMAX

ANEURALNETWORKS_SOFTMAX

space_to_batch_nd

SPACE_TO_BATCH_ND

ANEURALNETWORKS_SPACE_TO_BATCH_ND

space_to_depth

SPACE_TO_DEPTH

ANEURALNETWORKS_SPACE_TO_DEPTH

ANEURALNETWORKS_SPACE_TO_DEPTH

sparse_to_dense

split

SPLIT

ANEURALNETWORKS_SPLIT

split_v

SPLIT_V

sqrt

SQRT

ANEURALNETWORKS_SQRT

ANEURALNETWORKS_SQRT

square

squared_difference

squeeze

ANEURALNETWORKS_SQUEEZE

strided_slice

STRIDED_SLICE

ANEURALNETWORKS_STRIDED_SLICE

sub

SUB

ANEURALNETWORKS_SUB

sum

SUM

svdf

ANEURALNETWORKS_SVDF

tanh

TANH

ANEURALNETWORKS_TANH

tile

ANEURALNETWORKS_TILE

topk_v2

ANEURALNETWORKS_TOPK_V2

ANEURALNETWORKS_TOPK_V2

transpose

TRANSPOSE

ANEURALNETWORKS_TRANSPOSE

ANEURALNETWORKS_TRANSPOSE

transpose_conv

TRANSPOSE_CONV

ANEURALNETWORKS_TRANSPOSE_CONV_2D

ANEURALNETWORKS_TRANSPOSE_CONV_2D

unidirectional_sequence_lstm

ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_LSTM

unidirectional_sequence_rnn

ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_RNN

unique

unpack

where

while

ANEURALNETWORKS_WHILE

yield

zeros_like

L2_POOL_2D

ANEURALNETWORKS_L2_POOL_2D

ANEURALNETWORKS_L2_POOL_2D

ANEURALNETWORKS_HASHTABLE_LOOKUP

ANEURALNETWORKS_HASHTABLE_LOOKUP

ANEURALNETWORKS_EMBEDDING_LOOKUP

ANEURALNETWORKS_EMBEDDING_LOOKUP

ANEURALNETWORKS_LSH_PROJECTION

ANEURALNETWORKS_RNN

ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM

ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM

ANEURALNETWORKS_BOX_WITH_NMS_LIMIT

ANEURALNETWORKS_BOX_WITH_NMS_LIMIT

ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_LSTM

ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_RNN

ANEURALNETWORKS_CHANNEL_SHUFFLE

ANEURALNETWORKS_CHANNEL_SHUFFLE

ANEURALNETWORKS_DETECTION_POSTPROCESSING

ANEURALNETWORKS_DETECTION_POSTPROCESSING

ANEURALNETWORKS_GENERATE_PROPOSALS

ANEURALNETWORKS_GENERATE_PROPOSALS

ANEURALNETWORKS_GROUPED_CONV_2D

ANEURALNETWORKS_GROUPED_CONV_2D

ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT

ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT

ANEURALNETWORKS_INSTANCE_NORMALIZATION

ANEURALNETWORKS_QUANTIZED_16BIT_LSTM

ANEURALNETWORKS_QUANTIZED_16BIT_LSTM

ANEURALNETWORKS_QUANTIZED_LSTM

ANEURALNETWORKS_QUANTIZED_LSTM

ANEURALNETWORKS_RANDOM_MULTINOMIAL

ANEURALNETWORKS_REDUCE_ALL

ANEURALNETWORKS_REDUCE_SUM

ANEURALNETWORKS_ROI_ALIGN

ANEURALNETWORKS_ROI_POOLING

ANEURALNETWORKS_IF

Demo

A Python demo application for image recognition is built into the image and can be found in the /home/root/label_image directory.

cd /home/root/label_image
ls -l

-rw-r--r-- 1 59195 59195    61306 Mar  9  2018 grace_hopper.jpg                  # image for demo
-rw-r--r-- 1 59195 59195    25581 Mar  9  2018 labels.txt                        # labels for demo
-rw-r--r-- 1 59195 59195     3966 Mar  9  2018 label_image.py                    # python for demo
-rw-r--r-- 1 59195 59195 13978596 Mar  9  2018 mobilenet_v2_1.0_224.tflite       # float tflite model
-rw-r--r-- 1 59195 59195  3577760 Mar  9  2018 mobilenet_v2_1.0_224_quant.tflite # quantized tflite model
  • Execute on CPU

cd /home/root/label_image
python3 label_image.py --label_file labels.txt --image grace_hopper.jpg --model_file mobilenet_v2_1.0_224.tflite
  • Execute on GPU, with GPU delegate

cd /home/root/label_image
python3 label_image.py --label_file labels.txt --image grace_hopper.jpg --model_file mobilenet_v2_1.0_224.tflite --use_gpu
  • Execute on GPU, with Arm NN delegate

cd /home/root/label_image
python3 label_image.py --label_file labels.txt --image grace_hopper.jpg --model_file mobilenet_v2_1.0_224.tflite --use_armnn
  • Execute on VPU, with NNAPI delegate

cd /home/root/label_image
python3 label_image.py --label_file labels.txt --image grace_hopper.jpg --model_file mobilenet_v2_1.0_224_quant.tflite --use_nnapi

Benchmark Tool

Basic commands for running the benchmark tool with CPU and different delegates are as follows.

  • Execute on CPU (4 threads)

benchmark_model --graph=/home/root/label_image/mobilenet_v2_1.0_224.tflite --num_threads=4 --num_runs=10
  • Execute on GPU, with GPU delegate

benchmark_model --graph=/home/root/label_image/mobilenet_v2_1.0_224.tflite --use_gpu=1 --allow_fp16=0 --gpu_precision_loss_allowed=0 --num_runs=10
  • Execute on GPU, with Arm NN delegate

benchmark_model --graph=/home/root/label_image/mobilenet_v2_1.0_224.tflite --external_delegate_path=/usr/lib64/libarmnnDelegate.so.24 --external_delegate_options="backends:GpuAcc,CpuAcc;gpu-tuning-file:/usr/share/armnn/gpu-tuner-file.csv" --num_runs=10
  • Execute on VPU, with NNAPI delegate

benchmark_model --graph=/home/root/label_image/mobilenet_v2_1.0_224_quant.tflite --use_nnapi=1 --disable_nnapi_cpu=1 --num_runs=10

Benchmark Result

The following table shows the benchmark results under performance mode.

Average inference time (ms), 10 runs per model:

Model (.tflite)                   CPU (4 threads)  GPU       ARMNN (GpuAcc)  ARMNN (CpuAcc)  NNAPI: VPU
inception_v3                      701.318          828.372   481.025         422.173         Not executed by VPU
inception_v3_quant                346.663          831.208   281.988         278.647         99.082
mobilenet_v2_1.0.224              54.565           61.838    59.768          270.561         Not executed by VPU
mobilenet_v2_1.0.224_quant        30.213           63.364    32.302          33.942          21.115
ResNet50V2_224_1.0                478.492          515.05    469.452         417.316         Not executed by VPU
ResNet50V2_224_1.0_quant          259.21           531.242   220.072         210.188         166.74
ssd_mobilenet_v1_coco             151.304          222.407   208.136         112.915         Not executed by VPU
ssd_mobilenet_v1_coco_quantized   74.014           227.678   81.075          72.463          31.387


Troubleshooting

Adjust Logging Severity Level for ARMNN delegate

You can set the logging severity level for the Arm NN delegate via the option key logging-severity when the delegate loads. The possible values of logging-severity are trace, debug, info, warning, error, and fatal.

Take the demo as an example, add the option logging-severity:debug to enable debug log.

cd /usr/share/label_image
python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite -e /usr/lib64/libarmnnDelegate.so.25 -o "backends:GpuAcc,CpuAcc;logging-severity:debug"
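The delegate options are a single string of key:value pairs separated by semicolons. A small helper (our own, not part of the SDK) can build that string from keyword arguments, which avoids quoting mistakes on the command line:

```python
def armnn_delegate_options(**options):
    """Join ARMNN external-delegate options into the 'key:value;key:value' form."""
    return ";".join(f"{key}:{value}" for key, value in options.items())

# Reproduces the option string used in the example above.
opts = armnn_delegate_options(**{
    "backends": "GpuAcc,CpuAcc",
    "logging-severity": "debug",
})
print(opts)  # backends:GpuAcc,CpuAcc;logging-severity:debug
```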

Adjust Logging Severity Level for NNAPI delegate

You can set the logging severity level for NNAPI delegate through the environment variable DEBUG_NN_VLOG. It must be set before NNAPI loads, as it is only read on startup. DEBUG_NN_VLOG is a list of tags, delimited by spaces, commas, or colons, indicating which logging is to be done. The tags are compilation, cpuexe, driver, execution, manager, and model.

Take the demo as an example: set the environment variable DEBUG_NN_VLOG=compilation to enable the compilation log.

export DEBUG_NN_VLOG=compilation
python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite -e /usr/lib64/nnapi_external_delegate.so
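Because DEBUG_NN_VLOG is only read when NNAPI starts, it is safest to set it in the environment passed to the child process rather than after loading. A minimal sketch (the helper name is ours):

```python
import os

def nnapi_debug_env(*tags):
    """Return a copy of the environment with DEBUG_NN_VLOG set to the given tags.

    Valid tags are compilation, cpuexe, driver, execution, manager, and model;
    DEBUG_NN_VLOG accepts a space-, comma-, or colon-delimited list.
    """
    env = dict(os.environ)
    env["DEBUG_NN_VLOG"] = ",".join(tags)
    return env

env = nnapi_debug_env("compilation", "execution")
print(env["DEBUG_NN_VLOG"])  # compilation,execution
```

Pass the result to subprocess.run([...], env=env) when launching label_image.py, so the variable is in place before NNAPI loads.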

Determine Which Operations Are Executed by VP6

By enabling the compilation log, you can determine which operations are executed by VP6.

The default name of the NNAPI HAL is cros-nnapi-default. If you find a log line similar to ModelBuilder::findBestDeviceForEachOperation(CONV_2D) = 0 (cros-nnapi-default), it means that this operation (CONV_2D) is supported by the NNAPI HAL and can be executed by VP6. Otherwise, the operation falls back to CPU execution.

Note

Set the environment variable DEBUG_NN_VLOG to compilation before running the NN model.

  • OP is executed by VPU

    export DEBUG_NN_VLOG=compilation
    python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite -e /usr/lib64/nnapi_external_delegate.so
    ...
    ExecutionPlan.cpp:2057] ModelBuilder::findBestDeviceForEachOperation(CONV_2D) = 0 (cros-nnapi-default)
    ...
    
  • OP falls back to CPU execution

    export DEBUG_NN_VLOG=compilation
    python3 label_image.py --label_file labels_mobilenet_quant_v1_224.txt --image grace_hopper.jpg --model_file mobilenet_v1_1.0_224_quant.tflite -e /usr/lib64/nnapi_external_delegate.so
    ...
    ExecutionPlan.cpp:2037] Device cros-nnapi-default can not do operation CONV_2D
    ExecutionPlan.cpp:2057] ModelBuilder::findBestDeviceForEachOperation(CONV_2D) = 1 (nnapi-reference)
    ...
    

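The two log shapes above can be scanned mechanically. The sketch below is our own helper, keyed to the ExecutionPlan.cpp lines shown; it classifies each operation as VPU-executed or CPU fallback according to which device findBestDeviceForEachOperation picked (the sample log is adapted from the examples above):

```python
import re

# Matches e.g. "findBestDeviceForEachOperation(CONV_2D) = 0 (cros-nnapi-default)"
PATTERN = re.compile(
    r"findBestDeviceForEachOperation\((\w+)\)\s*=\s*\d+\s*\(([\w-]+)\)"
)

def classify_operations(log_text, hal_name="cros-nnapi-default"):
    """Map each operation in a compilation log to the device that will run it."""
    placement = {}
    for op, device in PATTERN.findall(log_text):
        placement[op] = "VPU" if device == hal_name else "CPU fallback"
    return placement

# Hypothetical sample log: SOFTMAX stands in for an unsupported operation.
sample = """
ExecutionPlan.cpp:2057] ModelBuilder::findBestDeviceForEachOperation(CONV_2D) = 0 (cros-nnapi-default)
ExecutionPlan.cpp:2037] Device cros-nnapi-default can not do operation SOFTMAX
ExecutionPlan.cpp:2057] ModelBuilder::findBestDeviceForEachOperation(SOFTMAX) = 1 (nnapi-reference)
"""
print(classify_operations(sample))  # {'CONV_2D': 'VPU', 'SOFTMAX': 'CPU fallback'}
```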
Is It Possible to Run a Floating-Point Model?

Yes. A floating-point model can run on the CPU and GPU if all of its operations are supported.

Is FP16 (Half Precision Floating Point) Supported on APU?

The APU's default implementation supports QUANT8 operations. Some operations may also support an FP16 variant, such as:

- ANEURALNETWORKS_FULLY_CONNECTED
- ANEURALNETWORKS_CAST
- ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM
- ANEURALNETWORKS_DETECTION_POSTPROCESSING
- ANEURALNETWORKS_GENERATE_PROPOSALS
- ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT
- ANEURALNETWORKS_BOX_WITH_NMS_LIMIT
- ANEURALNETWORKS_LOG_SOFTMAX
- ANEURALNETWORKS_TRANSPOSE
- ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR
- ANEURALNETWORKS_RSQRT
- ANEURALNETWORKS_SQRT
- ANEURALNETWORKS_DIV

Does AIoT Yocto Provide OpenCV Support? If So, What Version of OpenCV Is Provided?

AIoT Yocto provides OpenCV as-is, because the OpenCV Yocto integration comes directly from openembedded. You can find the recipe in src/meta-openembedded/meta-oe/recipes-support/opencv/opencv_${version}.bb. If necessary, you can integrate another version of OpenCV yourself.

Regarding the Benchmark Results, Why Is CPU Inference Faster Than GPU for Some Models?

Many factors can affect GPU efficiency. GPU operations are asynchronous, so the CPU might not be able to keep the GPU cores busy. If some operations in the model are not supported by the GPU, they fall back to the CPU. In such cases, it might be more efficient to execute all operations on the CPU with multiple threads than to split the model into many subgraphs and execute them on different backends.

Do You Have Information About Accuracy with ARM NN?

ARM NN is provided as-is; you can find the recipe in src/meta-nn/recipes-armnn/armnn/armnn_${version}.bb. We did not evaluate the accuracy of ARM NN, but ARM NN provides a tool, ModelAccuracyTool-Armnn, for measuring the Top-5 accuracy of a model against an image dataset.

Which TFLite Quantization Methods Are Supported on the APU?

  • About APU

    Post-training quantization, quantization-aware training, and post-training dynamic range quantization: the APU's default implementation supports QUANT8. If all operations in the model are supported by the APU, the model can run on the APU.

  • About CPU

    The CPU path is implemented by TFLite, so all of these quantization methods are supported on the CPU.

  • About GPU

    Please refer to the ARM NN documentation for the restrictions on supported operations.

Is It Possible to Run Multiple Models Simultaneously on VP6?

Currently, VP6 can only process one operation at a time; it cannot handle multiple operations simultaneously, so you cannot run multiple models simultaneously on VP6.
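If an application still needs several models, one workaround (our suggestion, not an SDK feature) is to serialize all VP6 invocations behind a single application-level lock, so only one inference runs at a time. The demo below uses a stand-in function instead of a real interpreter invoke call:

```python
import threading

# VP6 processes one operation at a time, so guard it with a single lock.
_vp6_lock = threading.Lock()

def run_on_vp6(invoke, *args):
    """Run one inference call at a time; concurrent callers wait their turn."""
    with _vp6_lock:
        return invoke(*args)

# Demo with a stand-in for a real inference call: track peak concurrency.
active = 0
peak = 0

def fake_invoke(x):
    global active, peak
    active += 1
    peak = max(peak, active)
    result = x * 2  # placeholder for real inference work
    active -= 1
    return result

threads = [threading.Thread(target=run_on_vp6, args=(fake_invoke, i)) for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print("max concurrent invocations:", peak)  # max concurrent invocations: 1
```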