VPU Guidelines

General Restrictions

Category: Per-Channel Quantization
Restrictions:
  Operations supported with symmetric signed 8-bit weights:
  1. Conv2D
  2. DepthwiseConv2D

Category: Data Format
Restrictions:
  Only the NHWC format is supported.

Category: I/O Tensors
Restrictions:
  1. Dynamic shapes are not supported.
  2. Each dimension size must be in the range [1, 65535].
  3. All input tensors being constant is not supported.
  (A checking sketch for the I/O tensor rules follows this table.)
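
The following is a minimal sketch of how the I/O tensor rules above could be checked on a TFLite model with the TensorFlow Lite Python Interpreter. The model path and the reporting format are illustrative assumptions, not part of the VPU specification.

  import tensorflow as tf

  DIM_MIN, DIM_MAX = 1, 65535

  def check_io_tensors(model_path):
      """Report I/O tensors that violate the VPU restrictions listed above."""
      interpreter = tf.lite.Interpreter(model_path=model_path)
      for detail in interpreter.get_input_details() + interpreter.get_output_details():
          name = detail["name"]
          # shape_signature marks dynamic dimensions with -1; dynamic shapes are rejected.
          if any(dim == -1 for dim in detail["shape_signature"]):
              print(f"{name}: dynamic shape is not supported")
          # Every dimension size must fall inside [1, 65535].
          for dim in detail["shape"]:
              if not (DIM_MIN <= dim <= DIM_MAX):
                  print(f"{name}: dimension size {dim} is outside [{DIM_MIN}, {DIM_MAX}]")

  check_io_tensors("model.tflite")  # "model.tflite" is a placeholder path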

Supported OPs Specification

The following list describes the ANN operations (API versions 1.0 to 1.3) supported by the Neuron VPU backend, together with their restrictions.

Each entry gives the OP name, the corresponding TFLite OP and NNAPI operation, the backend restrictions, and the supported quantization and floating-point data types.

ArgMax / ArgMin
  TFLite OP: ARG_MAX, ARG_MIN
  NNAPI: ARGMAX, ARGMIN
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support the batch axis
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Int32
  Floating data type: Not supported

AvgPooling
  TFLite OP: AVERAGE_POOL_2D
  NNAPI: AVERAGE_POOL_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight W, H = [1:128]
    3. Stride W = H
    4. Stride W, H = [1:8] if it is NOT global pooling
    5. Supports requantization
    6. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

BboxTransform
  TFLite OP: None
  NNAPI: AXIS_ALIGNED_BBOX_TRANSFORM
  Restrictions:
    1. Supports NNAPI v1.2 behavior
    2. The number of ROIs must be in [1:128]
    3. The number of classes must be in [1:100]
  Quantization data type:
    1. Input ROI: Asym U16 with scale = 0.125, zero point = 0 (see the note after this entry)
    2. Input Bounding Box: Asym U8 / Asym I8
    3. Input Batch: Int32
    4. Input Image Info: Asym U16 with scale = 0.125, zero point = 0
    5. Output: Asym U16 with scale = 0.125, zero point = 0
  Floating data type:
    1. Input ROI: FP16
    2. Input Bounding Box: FP16
    3. Input Batch: Int32
    4. Input Image Info: FP16
    5. Output: FP16
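
As a worked illustration of the fixed Asym U16 quantization above (scale = 0.125, zero point = 0): a quantized coordinate q maps to the real value 0.125 * q, so coordinates are encoded with 1/8-pixel resolution over [0, 8191.875]. A small sketch:

  # Affine mapping for the Asym U16 ROI/output coordinates used by this op.
  SCALE, ZERO_POINT = 0.125, 0

  def dequantize_coord(q):
      # real = scale * (q - zero_point)
      return SCALE * (q - ZERO_POINT)

  def quantize_coord(x):
      # q = round(x / scale) + zero_point, clamped to the U16 range
      return max(0, min(65535, round(x / SCALE) + ZERO_POINT))

  print(dequantize_coord(65535))  # 8191.875, the largest representable coordinate
  print(quantize_coord(123.4))    # 987, which decodes back to 123.375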

BatchToSpace
  TFLite OP: BATCH_TO_SPACE_ND
  NNAPI: BATCH_TO_SPACE_ND
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support crop
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

BoxNmsLimit
  TFLite OP: None
  NNAPI: BOX_WITH_NMS_LIMIT
  Restrictions:
    1. Supports NNAPI v1.2 behavior
  Quantization data type:
    1. Input Score: Asym U8 / Asym I8
    2. Input Bounding Box: Asym U16 with scale = 0.125, zero point = 0
    3. Input Batch: Int32
    4. Output Score: Asym U8 / Asym I8
    5. Output Bounding Box: Asym U16 with scale = 0.125, zero point = 0
    6. Output Class ID: Int32
    7. Output Batch Index: Int32
  Floating data type:
    1. Input Score: FP16
    2. Input Bounding Box: FP16
    3. Input Batch: Int32
    4. Output Score: FP16
    5. Output Bounding Box: FP16
    6. Output Class ID: Int32
    7. Output Batch Index: Int32

Cast
  TFLite OP: CAST
  NNAPI: CAST
  Restrictions:
    1. Supports at most 4D I/O
    2. Input must not be a constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8 / Int32
    2. Output: Asym U8 / Asym I8 / Int32
  Floating data type: Not supported

ChannelShuffle
  TFLite OP: None
  NNAPI: CHANNEL_SHUFFLE
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Concat
  TFLite OP: CONCATENATION
  NNAPI: CONCATENATION
  Restrictions:
    1. Supports at most 4D I/O
    2. The maximum number of inputs is six
    3. Supports inputs with different scales and zero points
    4. Does not support all inputs being constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Conv2D
  TFLite OP: CONV_2D
  NNAPI: CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Dilation W = H
    4. Dilation rate = [1:36]
      1. Weight H, W = [1:16] when dilation rate = 1
      2. Weight H, W = [1:8] when dilation rate > 1
    5. Stride W = H
      1. Stride W, H = 1, 2, 4 when dilation rate = 1
      2. Stride W, H = [1:4] when dilation rate > 1
    6. Supports per-channel quantization (a checking sketch follows this entry)
      1. Weight must be Sym I8, or Asym I8 with zero point = 0
    7. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported
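
The per-channel weight rule above can be checked from a TFLite model's tensor metadata. A minimal sketch, assuming the TensorFlow Lite Python Interpreter and that the caller already knows which tensor holds the convolution weights (lookup by tensor name is an illustrative assumption):

  import numpy as np
  import tensorflow as tf

  def check_per_channel_weight(model_path, weight_tensor_name):
      """Check that a Conv2D/DepthwiseConv2D weight tensor is signed 8-bit
      with all per-channel zero points equal to 0, as required above."""
      interpreter = tf.lite.Interpreter(model_path=model_path)
      for detail in interpreter.get_tensor_details():
          if detail["name"] != weight_tensor_name:
              continue
          if detail["dtype"] != np.int8:
              return False  # weight must be signed 8-bit
          qparams = detail["quantization_parameters"]
          # Symmetric I8, or asymmetric I8 with every zero point equal to 0.
          return all(zp == 0 for zp in qparams["zero_points"])
      raise ValueError(f"tensor {weight_tensor_name!r} not found")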

DepthwiseConv2D
  TFLite OP: DEPTHWISE_CONV_2D
  NNAPI: DEPTHWISE_CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Dilation W = H
    4. Dilation rate = [1:36]
      1. Weight H, W = [1:16] when dilation rate = 1
      2. Weight H, W = [1:8] when dilation rate > 1
    5. Stride W = H
      1. Stride W, H = 1, 2, 4 when dilation rate = 1
      2. Stride W, H = [1:4] when dilation rate > 1
    6. Supports per-channel quantization
      1. Weight must be Sym I8, or Asym I8 with zero point = 0
    7. Depth multiplier = 1
    8. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

DepthToSpace
  TFLite OP: DEPTH_TO_SPACE
  NNAPI: DEPTH_TO_SPACE
  Restrictions:
    1. I/O must be 4D
    2. Block size >= 1
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Dequantize
  TFLite OP: DEQUANTIZE
  NNAPI: DEQUANTIZE
  Restrictions:
    1. Supports at most 4D I/O
    2. Input and output must have the same shape
    3. Input scale must be greater than 0
    4. Per-channel quantization is not supported
  Quantization data type: Not supported
  Floating data type:
    1. Input: Asym U8 / Asym I8
    2. Output: FP16
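
For reference, DEQUANTIZE applies the standard affine mapping real = scale * (q - zero_point); QUANTIZE (listed later) is its inverse. A minimal numeric sketch, with the scale and zero point chosen purely for illustration:

  # Illustrative affine (de)quantization; the scale/zero_point values are made up.
  scale, zero_point = 0.125, -128  # e.g. an Asym I8 tensor

  def dequantize(q):
      return scale * (q - zero_point)

  def quantize(x):
      return round(x / scale) + zero_point

  print(dequantize(-128))  # 0.0    (the zero point maps to real 0)
  print(dequantize(127))   # 31.875 (top of the representable range)
  print(quantize(6.5))     # -76    (6.5 / 0.125 = 52; 52 - 128 = -76)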

ElementWiseAdd
  TFLite OP: ADD
  NNAPI: ADD
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1 (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16
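
A minimal sketch of the scale constraint quoted above (it also appears for SUB, MAXIMUM, and MINIMUM), under the assumption that inputScale ranges over the scales of the two inputs and maxInputScale is the larger of them; the names mirror the restriction, not any public API:

  def add_requant_scales_ok(input_scales):
      """True if inputScale / (2 * maxInputScale) < 1 for every input scale."""
      max_scale = max(input_scales)
      return all(s / (2.0 * max_scale) < 1.0 for s in input_scales)

  print(add_requant_scales_ok([0.02, 0.05]))  # True: 0.05 / 0.10 = 0.5 < 1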

ElementWiseDiv
  TFLite OP: DIV
  NNAPI: DIV
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: FP16

ElementWiseMul
  TFLite OP: MUL
  NNAPI: MUL
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputProdScale / outputScale < 1 (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16
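
A sketch of the MUL constraint above, assuming inputProdScale means the product of the two input scales (the conventional real multiplier for quantized multiplication); the names mirror the restriction, not a library API:

  def mul_requant_scales_ok(input1_scale, input2_scale, output_scale):
      """True if (input1_scale * input2_scale) / output_scale < 1."""
      return (input1_scale * input2_scale) / output_scale < 1.0

  print(mul_requant_scales_ok(0.02, 0.05, 0.004))  # True:  0.001 / 0.004 = 0.25
  print(mul_requant_scales_ok(0.5, 0.5, 0.1))      # False: 0.25 / 0.1 = 2.5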

ElementWiseSub
  TFLite OP: SUB
  NNAPI: SUB
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Equal / NotEqual / Greater / GreaterEqual / Less / LessEqual
  TFLite OP: EQUAL, NOT_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL
  NNAPI: EQUAL, NOT_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
  Quantization data type:
    1. Input: Asym U8 / Asym I8 / Bool 8 / Int32
    2. Output: Bool 8
  Floating data type: Not supported

Fill
  TFLite OP: FILL
  NNAPI: FILL
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. The output data type must equal the value data type
    3. The input shape tensor must be a constant
  Quantization data type:
    1. Input Shape: Int32
    2. Input Value: Asym U8 / Asym I8 / Asym U16 / Sym I8 / Sym I16 / Int32
    3. Output: Asym U8 / Asym I8 / Asym U16 / Sym I8 / Sym I16 / Int32
  Floating data type:
    1. Input Shape: Int32
    2. Input Value: FP16 / FP32
    3. Output: FP16 / FP32

FullyConnected
  TFLite OP: FULLY_CONNECTED
  NNAPI: FULLY_CONNECTED
  Restrictions:
    1. Supports at most 4D I/O
    2. Bias must be constant when the weight is constant
    3. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Gather
  TFLite OP: GATHER
  NNAPI: GATHER
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
    3. Only supports a single batch
    4. Does not support gathering along the batch axis
    5. Axis must be smaller than the input rank
    6. Indices must be a constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

GroupConv2D
  TFLite OP: composite pattern of CONV_2D
  NNAPI: GROUPED_CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Dilation rate = 1
    4. Per-channel quantization is not supported
    5. Weight H, W = [1:16]
    6. Stride W = H
    7. Stride W, H = 1, 2, 4
    8. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

HardSwish
  TFLite OP: HARD_SWISH
  NNAPI: HARD_SWISH
  Restrictions:
    1. Supports at most 4D I/O
    2. Input and output must have the same dimensions
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

HeatmapMaxKey
  TFLite OP: None
  NNAPI: HEATMAP_MAX_KEYPOINT
  Restrictions:
    1. Supports NNAPI v1.2 behavior
    2. inputScale / ((1 << 20) * outputScale) must be smaller than 1 (see the sketch after this entry)
    3. Dynamic heatmap sizes and a dynamic number of keypoints are not supported
  Quantization data type:
    1. Input Heatmap: Asym U8 / Asym I8
    2. Input Bounding Box: Asym U16
    3. Output Score: Asym U8 / Asym I8
    4. Output Keypoint Location: Asym U16
  Floating data type:
    1. Input Heatmap: FP16
    2. Input Bounding Box: FP16
    3. Output Score: FP16
    4. Output Keypoint Location: FP16
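
A minimal sketch of the scale constraint above; the parameter names simply mirror the restriction and are not a library API:

  def heatmap_max_keypoint_scales_ok(input_scale, output_scale):
      """True if inputScale / ((1 << 20) * outputScale) < 1."""
      return input_scale / ((1 << 20) * output_scale) < 1.0

  # 0.25 / (1048576 * (1/256)) is roughly 6.1e-05, well below 1
  print(heatmap_max_keypoint_scales_ok(0.25, 1.0 / 256))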

InstanceNorm
  TFLite OP: None
  NNAPI: INSTANCE_NORMALIZATION
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Input Gamma: FP16
    3. Input Beta: FP16
    4. Output: FP16

L2Norm
  TFLite OP: L2_NORMALIZATION
  NNAPI: L2_NORMALIZATION
  Restrictions:
    1. Supports at most 4D I/O
    2. Axis must be constant
    3. Only supports a single axis, and the axis cannot be the batch dimension
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

MaxPooling
  TFLite OP: MAX_POOL_2D
  NNAPI: MAX_POOL_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight W, H = [1:16]
    3. Stride W = H
    4. Supports requantization
    5. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Maximum
  TFLite OP: MAXIMUM
  NNAPI: MAXIMUM
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Mean
  TFLite OP: MEAN
  NNAPI: MEAN
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Minimum
  TFLite OP: MINIMUM
  NNAPI: MINIMUM
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Pack
  TFLite OP: PACK
  NNAPI: None
  Restrictions:
    1. Reuses CONCATENATION
    2. Input rank must be smaller than 3
    3. Output rank must be smaller than 4
    4. The number of inputs must be in the range [1:6]
    5. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Pad
  TFLite OP: PAD, PADV2
  NNAPI: PAD, PAD_V2
  Restrictions:
    1. Supports at most 4D I/O
    2. The pad value defaults to zero and must be a constant
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Pow
  TFLite OP: POW
  NNAPI: POW
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Only supports a constant exponent
    3. Only supports an exponent of size 1 with value 0.5
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Input Exponent: FP16
    3. Output: FP16

PRelu
  TFLite OP: PRELU
  NNAPI: PRELU
  Restrictions:
    1. Supports at most 4D I/O
    2. Alpha must be a constant tensor
    3. The size of alpha must equal the channel size
    4. Supports a common slope or a per-channel slope in depth
    5. InputScale * AlphaScale < OutputScale (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported
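
A minimal sketch of the PRELU scale constraint above. Applying the check to every alpha scale in the per-channel case is an assumption, since the restriction does not spell this out:

  def prelu_scales_ok(input_scale, alpha_scales, output_scale):
      """True if InputScale * AlphaScale < OutputScale for every alpha scale."""
      return all(input_scale * a < output_scale for a in alpha_scales)

  print(prelu_scales_ok(0.02, [0.1], 0.05))       # True:  0.002 < 0.05
  print(prelu_scales_ok(0.02, [0.1, 4.0], 0.05))  # False: 0.08 >= 0.05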

QLSTM
  TFLite OP: LSTM
  NNAPI: QUANTIZED_16BIT_LSTM
  Restrictions:
    1. Supports NNAPI v1.2 behavior
  Quantization data type:
    1. Input: Asym U8
    2. Output Cell State: Sym I16
    3. Output Value: Asym U8
  Floating data type: Not supported

QLSTMV2
  TFLite OP: LSTM
  NNAPI: QUANTIZED_LSTM
  Restrictions:
    1. Supports NNAPI v1.3 behavior
    2. The optional tensor Input2InputWeight should not be an input tensor
    3. The optional tensor Recurrent2InputWeight should not be an input tensor
    4. The optional tensor InputGateBias should not be an input tensor
    5. The optional tensor ProjectionWeight should not be an input tensor
    6. The optional tensor ProjectionBias should not be an input tensor
    7. The optional tensor InNormWeight should not be an input tensor
    8. The optional tensor ForgetNormWeight should not be an input tensor
    9. The optional tensor CellNormWeight should not be an input tensor
    10. The optional tensor OutNormWeight should not be an input tensor
  Quantization data type:
    Data types follow the NNAPI v1.3 spec, with I/O allowed to be Asym U8 or Asym I8
  Floating data type: Not supported

Quantize
  TFLite OP: QUANTIZE
  NNAPI: QUANTIZE
  Restrictions:
    1. Supports at most 4D I/O
    2. Input and output must have the same shape
    3. Output scale must be greater than 0
    4. Per-channel quantization is not supported
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: Asym U8 / Asym I8

ReduceAny
  TFLite OP: REDUCE_ANY
  NNAPI: REDUCE_ANY
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support reducing over batch, width, and height at the same time
  Quantization data type:
    1. Input: Bool 8
    2. Output: Bool 8
  Floating data type: Not supported

ReduceMax / ReduceMin
  TFLite OP: REDUCE_MAX, REDUCE_MIN
  NNAPI: REDUCE_MAX, REDUCE_MIN
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support reducing over batch, width, and height at the same time
    3. Input and output must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

ReLU / ReLU1 / ReLU6
  TFLite OP: RELU, RELU_N1_TO_1, RELU6
  NNAPI: RELU, RELU1, RELU6
  Restrictions:
    1. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Reshape
  TFLite OP: RESHAPE
  NNAPI: RESHAPE
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Resize::BILINEAR
  TFLite OP: RESIZE_BILINEAR
  NNAPI: RESIZE_BILINEAR
  Restrictions:
    1. I/O must be 4D
    2. HalfPixelCenters == true is not supported
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Resize::NEAREST
  TFLite OP: RESIZE_NEAREST_NEIGHBOR
  NNAPI: RESIZE_NEAREST_NEIGHBOR
  Restrictions:
    1. I/O must be 4D
    2. I/O must have the same scale and zero point
    3. HalfPixelCenters == true is not supported
    4. AlignCorners == true is not supported
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

RoiAlign
  TFLite OP: None
  NNAPI: ROI_ALIGN
  Restrictions:
    1. Input must be 4D and non-constant
    2. Sampling W, H must be specified in [1:16]
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Input Location: Asym U16
    3. Input Batch Index: Int32
    4. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Input Location: FP16
    3. Input Batch Index: Int32
    4. Output: FP16

RSqrt
  TFLite OP: RSQRT
  NNAPI: RSQRT
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Input and output must have the same dimensions
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: FP16

Select
  TFLite OP: SELECT
  NNAPI: SELECT
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same shape
    3. One of the input tensors can be constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Input Condition: Bool 8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Sigmoid
  TFLite OP: LOGISTIC
  NNAPI: LOGISTIC
  Restrictions:
    1. Supports at most 4D I/O
    2. Output scale = 1/256, output zero point = 0 for Asym U8
    3. Output scale = 1/256, output zero point = -128 for Asym I8 (a worked example follows this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported
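
The fixed output quantization above follows from the sigmoid's [0, 1] output range: with scale 1/256, the 256 quantization steps exactly cover that interval. A small illustrative sketch (TANH later in this list uses the analogous scale 1/128 over [-1, 1]):

  # Quantize a real sigmoid output y in [0, 1] with the parameters required above.
  SCALE = 1.0 / 256

  def quantize_sigmoid_output(y, signed):
      zero_point = -128 if signed else 0          # Asym I8 vs. Asym U8
      q = round(y / SCALE) + zero_point
      lo, hi = (-128, 127) if signed else (0, 255)
      return max(lo, min(hi, q))

  print(quantize_sigmoid_output(0.0, signed=False))  # 0
  print(quantize_sigmoid_output(0.5, signed=False))  # 128
  print(quantize_sigmoid_output(1.0, signed=True))   # 127 (256 - 128 saturates to 127)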

Slice
  TFLite OP: SLICE
  NNAPI: SLICE
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

SoftMax
  TFLite OP: SOFTMAX
  NNAPI: SOFTMAX
  Restrictions:
    1. Supports 2D/4D output
    2. The axis cannot be the batch dimension
    3. Axis must be smaller than the output rank
    4. Beta > 0
    5. inputBetaMultiplier > 1
    6. Supports RESHAPE fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16

SpaceToBatch
  TFLite OP: SPACE_TO_BATCH_ND
  NNAPI: SPACE_TO_BATCH_ND
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support crop
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

SpaceToDepth
  TFLite OP: SPACE_TO_DEPTH
  NNAPI: SPACE_TO_DEPTH
  Restrictions:
    1. I/O must be 4D
    2. Block size >= 1
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Split
  TFLite OP: SPLIT
  NNAPI: SPLIT
  Restrictions:
    1. Supports at most 4D I/O
    2. The maximum number of outputs is six
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Sqrt
  TFLite OP: SQRT
  NNAPI: SQRT
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Input and output must have the same dimensions
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: FP16

Square
  TFLite OP: SQUARE
  NNAPI: None
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Input and output must have the same dimensions
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16

StridedSlice
  TFLite OP: STRIDED_SLICE
  NNAPI: STRIDED_SLICE
  Restrictions:
    1. Circular slicing is not supported
    2. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Tanh
  TFLite OP: TANH
  NNAPI: TANH
  Restrictions:
    1. Supports at most 4D I/O
    2. Output scale = 1/128, output zero point = 128 for Asym U8
    3. Output scale = 1/128, output zero point = 0 for Asym I8
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Tile
  TFLite OP: TILE
  NNAPI: TILE
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
    3. Input and output must have the same rank
    4. Multiples must be valid: each output dimension must be divisible by the corresponding input dimension (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Input Multiples: Int32
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported
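
A minimal sketch of the TILE shape rule above: the multiples are valid only when every output dimension is an exact integer multiple of the matching input dimension. The helper name is illustrative:

  def tile_multiples_valid(input_shape, output_shape):
      """True if output_shape[i] is divisible by input_shape[i] for every axis."""
      if len(input_shape) != len(output_shape):
          return False  # input and output must have the same rank
      return all(o % i == 0 for i, o in zip(input_shape, output_shape))

  print(tile_multiples_valid([1, 4, 4, 3], [1, 8, 12, 3]))  # True: multiples 1, 2, 3, 1
  print(tile_multiples_valid([1, 4, 4, 3], [1, 8, 10, 3]))  # False: 10 % 4 != 0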

TopK
  TFLite OP: TOPK_V2
  NNAPI: TOPK_V2
  Restrictions:
    1. Supports at most 4D I/O
    2. Output values and indices must have the same dimensions
    3. Batch size must be the same for input and output
    4. The K value must be in (0, size of the last input dimension]
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output Value: Asym U8 / Asym I8
    3. Output Indices: Int32
  Floating data type: Not supported

Transpose
  TFLite OP: TRANSPOSE
  NNAPI: TRANSPOSE
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16

TransposeConv2D
  TFLite OP: TRANSPOSE_CONV
  NNAPI: TRANSPOSE_CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Stride W, H > 1
    4. Per-channel quantization is not supported
    5. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported