VPU Guidelines

General Restrictions

Category: Per-Channel Quantization
Restrictions:
  Operations supported with symmetric signed 8-bit weights:
  1. Conv2D
  2. DepthwiseConv2D

Category: Data Format
Restrictions:
  Only the NHWC format is supported.

Category: I/O Tensors
Restrictions:
  1. Dynamic shapes are not supported.
  2. Each dimension size must be in the range [1, 65535].
  3. All input tensors being constant is not supported.
  (A checking sketch for the I/O tensor rules follows this table.)
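
The following is a minimal sketch of how the I/O tensor rules above could be checked on a TFLite model with the TensorFlow Lite Python Interpreter. The model path and the reporting format are illustrative assumptions, not part of the VPU specification.

  import tensorflow as tf

  DIM_MIN, DIM_MAX = 1, 65535

  def check_io_tensors(model_path):
      """Report I/O tensors that violate the VPU restrictions listed above."""
      interpreter = tf.lite.Interpreter(model_path=model_path)
      for detail in interpreter.get_input_details() + interpreter.get_output_details():
          name = detail["name"]
          # shape_signature marks dynamic dimensions with -1; dynamic shapes are rejected.
          if any(dim == -1 for dim in detail["shape_signature"]):
              print(f"{name}: dynamic shape is not supported")
          # Every dimension size must fall inside [1, 65535].
          for dim in detail["shape"]:
              if not (DIM_MIN <= dim <= DIM_MAX):
                  print(f"{name}: dimension size {dim} is outside [{DIM_MIN}, {DIM_MAX}]")

  check_io_tensors("model.tflite")  # "model.tflite" is a placeholder path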

Supported OPs Specification

The following list describes the ANN operations (API versions 1.0 to 1.3) supported by the Neuron VPU backend, together with their restrictions.

Each entry gives the OP name, the corresponding TFLite OP and NNAPI operation, the backend restrictions, and the supported quantization and floating-point data types.

ArgMax / ArgMin
  TFLite OP: ARG_MAX, ARG_MIN
  NNAPI: ARGMAX, ARGMIN
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support the batch axis
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Int32
  Floating data type: Not supported

AvgPooling
  TFLite OP: AVERAGE_POOL_2D
  NNAPI: AVERAGE_POOL_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight W, H = [1:128]
    3. Stride W = H
    4. Stride W, H = [1:8] if it is NOT global pooling
    5. Supports requantization
    6. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

BboxTransform
  TFLite OP: None
  NNAPI: AXIS_ALIGNED_BBOX_TRANSFORM
  Restrictions:
    1. Supports NNAPI v1.2 behavior
    2. The number of ROIs must be in [1:128]
    3. The number of classes must be in [1:100]
  Quantization data type:
    1. Input ROI: Asym U16 with scale = 0.125, zero point = 0 (see the note after this entry)
    2. Input Bounding Box: Asym U8 / Asym I8
    3. Input Batch: Int32
    4. Input Image Info: Asym U16 with scale = 0.125, zero point = 0
    5. Output: Asym U16 with scale = 0.125, zero point = 0
  Floating data type:
    1. Input ROI: FP16
    2. Input Bounding Box: FP16
    3. Input Batch: Int32
    4. Input Image Info: FP16
    5. Output: FP16
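
As a worked illustration of the fixed Asym U16 quantization above (scale = 0.125, zero point = 0): a quantized coordinate q maps to the real value 0.125 * q, so coordinates are encoded with 1/8-pixel resolution over [0, 8191.875]. A small sketch:

  # Affine mapping for the Asym U16 ROI/output coordinates used by this op.
  SCALE, ZERO_POINT = 0.125, 0

  def dequantize_coord(q):
      # real = scale * (q - zero_point)
      return SCALE * (q - ZERO_POINT)

  def quantize_coord(x):
      # q = round(x / scale) + zero_point, clamped to the U16 range
      return max(0, min(65535, round(x / SCALE) + ZERO_POINT))

  print(dequantize_coord(65535))  # 8191.875, the largest representable coordinate
  print(quantize_coord(123.4))    # 987, which decodes back to 123.375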

BatchToSpace
  TFLite OP: BATCH_TO_SPACE_ND
  NNAPI: BATCH_TO_SPACE_ND
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support crop
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

BoxNmsLimit
  TFLite OP: None
  NNAPI: BOX_WITH_NMS_LIMIT
  Restrictions:
    1. Supports NNAPI v1.2 behavior
  Quantization data type:
    1. Input Score: Asym U8 / Asym I8
    2. Input Bounding Box: Asym U16 with scale = 0.125, zero point = 0
    3. Input Batch: Int32
    4. Output Score: Asym U8 / Asym I8
    5. Output Bounding Box: Asym U16 with scale = 0.125, zero point = 0
    6. Output Class ID: Int32
    7. Output Batch Index: Int32
  Floating data type:
    1. Input Score: FP16
    2. Input Bounding Box: FP16
    3. Input Batch: Int32
    4. Output Score: FP16
    5. Output Bounding Box: FP16
    6. Output Class ID: Int32
    7. Output Batch Index: Int32

Cast
  TFLite OP: CAST
  NNAPI: CAST
  Restrictions:
    1. Supports at most 4D I/O
    2. Input must not be a constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8 / Int32
    2. Output: Asym U8 / Asym I8 / Int32
  Floating data type: Not supported

ChannelShuffle
  TFLite OP: None
  NNAPI: CHANNEL_SHUFFLE
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Concat
  TFLite OP: CONCATENATION
  NNAPI: CONCATENATION
  Restrictions:
    1. Supports at most 4D I/O
    2. The maximum number of inputs is six
    3. Supports inputs with different scales and zero points
    4. Does not support all inputs being constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Conv2D
  TFLite OP: CONV_2D
  NNAPI: CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Dilation W = H
    4. Dilation rate = [1:36]
      1. Weight H, W = [1:16] when dilation rate = 1
      2. Weight H, W = [1:8] when dilation rate > 1
    5. Stride W = H
      1. Stride W, H = 1, 2, 4 when dilation rate = 1
      2. Stride W, H = [1:4] when dilation rate > 1
    6. Supports per-channel quantization (a checking sketch follows this entry)
      1. Weight must be Sym I8, or Asym I8 with zero point = 0
    7. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported
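
The per-channel weight rule above can be checked from a TFLite model's tensor metadata. A minimal sketch, assuming the TensorFlow Lite Python Interpreter and that the caller already knows which tensor holds the convolution weights (lookup by tensor name is an illustrative assumption):

  import numpy as np
  import tensorflow as tf

  def check_per_channel_weight(model_path, weight_tensor_name):
      """Check that a Conv2D/DepthwiseConv2D weight tensor is signed 8-bit
      with all per-channel zero points equal to 0, as required above."""
      interpreter = tf.lite.Interpreter(model_path=model_path)
      for detail in interpreter.get_tensor_details():
          if detail["name"] != weight_tensor_name:
              continue
          if detail["dtype"] != np.int8:
              return False  # weight must be signed 8-bit
          qparams = detail["quantization_parameters"]
          # Symmetric I8, or asymmetric I8 with every zero point equal to 0.
          return all(zp == 0 for zp in qparams["zero_points"])
      raise ValueError(f"tensor {weight_tensor_name!r} not found")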

DepthwiseConv2D
  TFLite OP: DEPTHWISE_CONV_2D
  NNAPI: DEPTHWISE_CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Dilation W = H
    4. Dilation rate = [1:36]
      1. Weight H, W = [1:16] when dilation rate = 1
      2. Weight H, W = [1:8] when dilation rate > 1
    5. Stride W = H
      1. Stride W, H = 1, 2, 4 when dilation rate = 1
      2. Stride W, H = [1:4] when dilation rate > 1
    6. Supports per-channel quantization
      1. Weight must be Sym I8, or Asym I8 with zero point = 0
    7. Depth multiplier = 1
    8. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

DepthToSpace
  TFLite OP: DEPTH_TO_SPACE
  NNAPI: DEPTH_TO_SPACE
  Restrictions:
    1. I/O must be 4D
    2. Block size >= 1
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Dequantize
  TFLite OP: DEQUANTIZE
  NNAPI: DEQUANTIZE
  Restrictions:
    1. Supports at most 4D I/O
    2. Input and output must have the same shape
    3. Input scale must be greater than 0
    4. Per-channel quantization is not supported
  Quantization data type: Not supported
  Floating data type:
    1. Input: Asym U8 / Asym I8
    2. Output: FP16
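
For reference, DEQUANTIZE applies the standard affine mapping real = scale * (q - zero_point); QUANTIZE (listed later) is its inverse. A minimal numeric sketch, with the scale and zero point chosen purely for illustration:

  # Illustrative affine (de)quantization; the scale/zero_point values are made up.
  scale, zero_point = 0.125, -128  # e.g. an Asym I8 tensor

  def dequantize(q):
      return scale * (q - zero_point)

  def quantize(x):
      return round(x / scale) + zero_point

  print(dequantize(-128))  # 0.0    (the zero point maps to real 0)
  print(dequantize(127))   # 31.875 (top of the representable range)
  print(quantize(6.5))     # -76    (6.5 / 0.125 = 52; 52 - 128 = -76)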

ElementWiseAdd
  TFLite OP: ADD
  NNAPI: ADD
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1 (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16
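
A minimal sketch of the scale constraint quoted above (it also appears for SUB, MAXIMUM, and MINIMUM), under the assumption that inputScale ranges over the scales of the two inputs and maxInputScale is the larger of them; the names mirror the restriction, not any public API:

  def add_requant_scales_ok(input_scales):
      """True if inputScale / (2 * maxInputScale) < 1 for every input scale."""
      max_scale = max(input_scales)
      return all(s / (2.0 * max_scale) < 1.0 for s in input_scales)

  print(add_requant_scales_ok([0.02, 0.05]))  # True: 0.05 / 0.10 = 0.5 < 1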

ElementWiseDiv
  TFLite OP: DIV
  NNAPI: DIV
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: FP16

ElementWiseMul
  TFLite OP: MUL
  NNAPI: MUL
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputProdScale / outputScale < 1 (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16
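
A sketch of the MUL constraint above, assuming inputProdScale means the product of the two input scales (the conventional real multiplier for quantized multiplication); the names mirror the restriction, not a library API:

  def mul_requant_scales_ok(input1_scale, input2_scale, output_scale):
      """True if (input1_scale * input2_scale) / output_scale < 1."""
      return (input1_scale * input2_scale) / output_scale < 1.0

  print(mul_requant_scales_ok(0.02, 0.05, 0.004))  # True:  0.001 / 0.004 = 0.25
  print(mul_requant_scales_ok(0.5, 0.5, 0.1))      # False: 0.25 / 0.1 = 2.5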

ElementWiseSub
  TFLite OP: SUB
  NNAPI: SUB
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Equal / NotEqual / Greater / GreaterEqual / Less / LessEqual
  TFLite OP: EQUAL, NOT_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL
  NNAPI: EQUAL, NOT_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
  Quantization data type:
    1. Input: Asym U8 / Asym I8 / Bool 8 / Int32
    2. Output: Bool 8
  Floating data type: Not supported

Fill
  TFLite OP: FILL
  NNAPI: FILL
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. The output data type must equal the value data type
    3. The input shape tensor must be a constant
  Quantization data type:
    1. Input Shape: Int32
    2. Input Value: Asym U8 / Asym I8 / Asym U16 / Sym I8 / Sym I16 / Int32
    3. Output: Asym U8 / Asym I8 / Asym U16 / Sym I8 / Sym I16 / Int32
  Floating data type:
    1. Input Shape: Int32
    2. Input Value: FP16 / FP32
    3. Output: FP16 / FP32

FullyConnected
  TFLite OP: FULLY_CONNECTED
  NNAPI: FULLY_CONNECTED
  Restrictions:
    1. Supports at most 4D I/O
    2. Bias must be constant when the weight is constant
    3. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Gather
  TFLite OP: GATHER
  NNAPI: GATHER
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
    3. Only supports a single batch
    4. Does not support gathering along the batch axis
    5. Axis must be smaller than the input rank
    6. Indices must be a constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

GroupConv2D
  TFLite OP: composite pattern of CONV_2D
  NNAPI: GROUPED_CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Dilation rate = 1
    4. Per-channel quantization is not supported
    5. Weight H, W = [1:16]
    6. Stride W = H
    7. Stride W, H = 1, 2, 4
    8. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

HardSwish
  TFLite OP: HARD_SWISH
  NNAPI: HARD_SWISH
  Restrictions:
    1. Supports at most 4D I/O
    2. Input and output must have the same dimensions
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

HeatmapMaxKey
  TFLite OP: None
  NNAPI: HEATMAP_MAX_KEYPOINT
  Restrictions:
    1. Supports NNAPI v1.2 behavior
    2. inputScale / ((1 << 20) * outputScale) must be smaller than 1 (see the sketch after this entry)
    3. Dynamic heatmap sizes and a dynamic number of keypoints are not supported
  Quantization data type:
    1. Input Heatmap: Asym U8 / Asym I8
    2. Input Bounding Box: Asym U16
    3. Output Score: Asym U8 / Asym I8
    4. Output Keypoint Location: Asym U16
  Floating data type:
    1. Input Heatmap: FP16
    2. Input Bounding Box: FP16
    3. Output Score: FP16
    4. Output Keypoint Location: FP16
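
A minimal sketch of the scale constraint above; the parameter names simply mirror the restriction and are not a library API:

  def heatmap_max_keypoint_scales_ok(input_scale, output_scale):
      """True if inputScale / ((1 << 20) * outputScale) < 1."""
      return input_scale / ((1 << 20) * output_scale) < 1.0

  # 0.25 / (1048576 * (1/256)) is roughly 6.1e-05, well below 1
  print(heatmap_max_keypoint_scales_ok(0.25, 1.0 / 256))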

InstanceNorm
  TFLite OP: None
  NNAPI: INSTANCE_NORMALIZATION
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Input Gamma: FP16
    3. Input Beta: FP16
    4. Output: FP16

L2Norm
  TFLite OP: L2_NORMALIZATION
  NNAPI: L2_NORMALIZATION
  Restrictions:
    1. Supports at most 4D I/O
    2. Axis must be constant
    3. Only supports a single axis, and the axis cannot be the batch dimension
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

MaxPooling
  TFLite OP: MAX_POOL_2D
  NNAPI: MAX_POOL_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight W, H = [1:16]
    3. Stride W = H
    4. Supports requantization
    5. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Maximum
  TFLite OP: MAXIMUM
  NNAPI: MAXIMUM
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Mean
  TFLite OP: MEAN
  NNAPI: MEAN
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Minimum
  TFLite OP: MINIMUM
  NNAPI: MINIMUM
  Restrictions:
    1. Supports at most 4D I/O
    2. Supports broadcasting
    3. Supports requantization
    4. One of the input tensors can be constant
    5. Only supports inputScale / (2 * maxInputScale) < 1
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Pack
  TFLite OP: PACK
  NNAPI: None
  Restrictions:
    1. Reuses CONCATENATION
    2. Input rank must be smaller than 3
    3. Output rank must be smaller than 4
    4. The number of inputs must be in the range [1:6]
    5. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Pad
  TFLite OP: PAD, PADV2
  NNAPI: PAD, PAD_V2
  Restrictions:
    1. Supports at most 4D I/O
    2. The pad value defaults to zero and must be a constant
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Pow
  TFLite OP: POW
  NNAPI: POW
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Only supports a constant exponent
    3. Only supports an exponent of size 1 with value 0.5
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Input Exponent: FP16
    3. Output: FP16

PRelu
  TFLite OP: PRELU
  NNAPI: PRELU
  Restrictions:
    1. Supports at most 4D I/O
    2. Alpha must be a constant tensor
    3. The size of alpha must equal the channel size
    4. Supports a common slope or a per-channel slope in depth
    5. InputScale * AlphaScale < OutputScale (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported
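
A minimal sketch of the PRELU scale constraint above. Applying the check to every alpha scale in the per-channel case is an assumption, since the restriction does not spell this out:

  def prelu_scales_ok(input_scale, alpha_scales, output_scale):
      """True if InputScale * AlphaScale < OutputScale for every alpha scale."""
      return all(input_scale * a < output_scale for a in alpha_scales)

  print(prelu_scales_ok(0.02, [0.1], 0.05))       # True:  0.002 < 0.05
  print(prelu_scales_ok(0.02, [0.1, 4.0], 0.05))  # False: 0.08 >= 0.05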

QLSTM
  TFLite OP: LSTM
  NNAPI: QUANTIZED_16BIT_LSTM
  Restrictions:
    1. Supports NNAPI v1.2 behavior
  Quantization data type:
    1. Input: Asym U8
    2. Output Cell State: Sym I16
    3. Output Value: Asym U8
  Floating data type: Not supported

QLSTMV2
  TFLite OP: LSTM
  NNAPI: QUANTIZED_LSTM
  Restrictions:
    1. Supports NNAPI v1.3 behavior
    2. The optional tensor Input2InputWeight should not be an input tensor
    3. The optional tensor Recurrent2InputWeight should not be an input tensor
    4. The optional tensor InputGateBias should not be an input tensor
    5. The optional tensor ProjectionWeight should not be an input tensor
    6. The optional tensor ProjectionBias should not be an input tensor
    7. The optional tensor InNormWeight should not be an input tensor
    8. The optional tensor ForgetNormWeight should not be an input tensor
    9. The optional tensor CellNormWeight should not be an input tensor
    10. The optional tensor OutNormWeight should not be an input tensor
  Quantization data type:
    Data types follow the NNAPI v1.3 spec, with I/O allowed to be Asym U8 or Asym I8
  Floating data type: Not supported

Quantize
  TFLite OP: QUANTIZE
  NNAPI: QUANTIZE
  Restrictions:
    1. Supports at most 4D I/O
    2. Input and output must have the same shape
    3. Output scale must be greater than 0
    4. Per-channel quantization is not supported
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: Asym U8 / Asym I8

ReduceAny
  TFLite OP: REDUCE_ANY
  NNAPI: REDUCE_ANY
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support reducing over batch, width, and height at the same time
  Quantization data type:
    1. Input: Bool 8
    2. Output: Bool 8
  Floating data type: Not supported

ReduceMax / ReduceMin
  TFLite OP: REDUCE_MAX, REDUCE_MIN
  NNAPI: REDUCE_MAX, REDUCE_MIN
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support reducing over batch, width, and height at the same time
    3. Input and output must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

ReLU / ReLU1 / ReLU6
  TFLite OP: RELU, RELU_N1_TO_1, RELU6
  NNAPI: RELU, RELU1, RELU6
  Restrictions:
    1. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Reshape
  TFLite OP: RESHAPE
  NNAPI: RESHAPE
  Restrictions:
    1. Supports at most 4D I/O
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Resize::BILINEAR
  TFLite OP: RESIZE_BILINEAR
  NNAPI: RESIZE_BILINEAR
  Restrictions:
    1. I/O must be 4D
    2. HalfPixelCenters == true is not supported
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Resize::NEAREST
  TFLite OP: RESIZE_NEAREST_NEIGHBOR
  NNAPI: RESIZE_NEAREST_NEIGHBOR
  Restrictions:
    1. I/O must be 4D
    2. I/O must have the same scale and zero point
    3. HalfPixelCenters == true is not supported
    4. AlignCorners == true is not supported
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

RoiAlign
  TFLite OP: None
  NNAPI: ROI_ALIGN
  Restrictions:
    1. Input must be 4D and non-constant
    2. Sampling W, H must be specified in [1:16]
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Input Location: Asym U16
    3. Input Batch Index: Int32
    4. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Input Location: FP16
    3. Input Batch Index: Int32
    4. Output: FP16

RSqrt
  TFLite OP: RSQRT
  NNAPI: RSQRT
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Input and output must have the same dimensions
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: FP16

Select
  TFLite OP: SELECT
  NNAPI: SELECT
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same shape
    3. One of the input tensors can be constant
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Input Condition: Bool 8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Sigmoid
  TFLite OP: LOGISTIC
  NNAPI: LOGISTIC
  Restrictions:
    1. Supports at most 4D I/O
    2. Output scale = 1/256, output zero point = 0 for Asym U8
    3. Output scale = 1/256, output zero point = -128 for Asym I8 (a worked example follows this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported
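
The fixed output quantization above follows from the sigmoid's [0, 1] output range: with scale 1/256, the 256 quantization steps exactly cover that interval. A small illustrative sketch (TANH later in this list uses the analogous scale 1/128 over [-1, 1]):

  # Quantize a real sigmoid output y in [0, 1] with the parameters required above.
  SCALE = 1.0 / 256

  def quantize_sigmoid_output(y, signed):
      zero_point = -128 if signed else 0          # Asym I8 vs. Asym U8
      q = round(y / SCALE) + zero_point
      lo, hi = (-128, 127) if signed else (0, 255)
      return max(lo, min(hi, q))

  print(quantize_sigmoid_output(0.0, signed=False))  # 0
  print(quantize_sigmoid_output(0.5, signed=False))  # 128
  print(quantize_sigmoid_output(1.0, signed=True))   # 127 (256 - 128 saturates to 127)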

Slice
  TFLite OP: SLICE
  NNAPI: SLICE
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

SoftMax
  TFLite OP: SOFTMAX
  NNAPI: SOFTMAX
  Restrictions:
    1. Supports 2D/4D output
    2. The axis cannot be the batch dimension
    3. Axis must be smaller than the output rank
    4. Beta > 0
    5. inputBetaMultiplier > 1
    6. Supports RESHAPE fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16

SpaceToBatch
  TFLite OP: SPACE_TO_BATCH_ND
  NNAPI: SPACE_TO_BATCH_ND
  Restrictions:
    1. Supports at most 4D I/O
    2. Does not support crop
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

SpaceToDepth
  TFLite OP: SPACE_TO_DEPTH
  NNAPI: SPACE_TO_DEPTH
  Restrictions:
    1. I/O must be 4D
    2. Block size >= 1
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Split
  TFLite OP: SPLIT
  NNAPI: SPLIT
  Restrictions:
    1. Supports at most 4D I/O
    2. The maximum number of outputs is six
    3. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Sqrt
  TFLite OP: SQRT
  NNAPI: SQRT
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Input and output must have the same dimensions
  Quantization data type: Not supported
  Floating data type:
    1. Input: FP16
    2. Output: FP16

Square
  TFLite OP: SQUARE
  NNAPI: None
  Restrictions:
    1. Supports I/O with more than 4 dimensions
    2. Input and output must have the same dimensions
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16

StridedSlice
  TFLite OP: STRIDED_SLICE
  NNAPI: STRIDED_SLICE
  Restrictions:
    1. Circular slicing is not supported
    2. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Tanh
  TFLite OP: TANH
  NNAPI: TANH
  Restrictions:
    1. Supports at most 4D I/O
    2. Output scale = 1/128, output zero point = 128 for Asym U8
    3. Output scale = 1/128, output zero point = 0 for Asym I8
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type: Not supported

Tile
  TFLite OP: TILE
  NNAPI: TILE
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
    3. Input and output must have the same rank
    4. Multiples must be valid: each output dimension must be divisible by the corresponding input dimension (see the sketch after this entry)
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Input Multiples: Int32
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported
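
A minimal sketch of the TILE shape rule above: the multiples are valid only when every output dimension is an exact integer multiple of the matching input dimension. The helper name is illustrative:

  def tile_multiples_valid(input_shape, output_shape):
      """True if output_shape[i] is divisible by input_shape[i] for every axis."""
      if len(input_shape) != len(output_shape):
          return False  # input and output must have the same rank
      return all(o % i == 0 for i, o in zip(input_shape, output_shape))

  print(tile_multiples_valid([1, 4, 4, 3], [1, 8, 12, 3]))  # True: multiples 1, 2, 3, 1
  print(tile_multiples_valid([1, 4, 4, 3], [1, 8, 10, 3]))  # False: 10 % 4 != 0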

TopK
  TFLite OP: TOPK_V2
  NNAPI: TOPK_V2
  Restrictions:
    1. Supports at most 4D I/O
    2. Output values and indices must have the same dimensions
    3. Batch size must be the same for input and output
    4. The K value must be in (0, size of the last input dimension]
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output Value: Asym U8 / Asym I8
    3. Output Indices: Int32
  Floating data type: Not supported

Transpose
  TFLite OP: TRANSPOSE
  NNAPI: TRANSPOSE
  Restrictions:
    1. Supports at most 4D I/O
    2. I/O must have the same scale and zero point
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Output: Asym U8 / Asym I8
  Floating data type:
    1. Input: FP16
    2. Output: FP16

TransposeConv2D
  TFLite OP: TRANSPOSE_CONV
  NNAPI: TRANSPOSE_CONV_2D
  Restrictions:
    1. I/O must be 4D
    2. Weight and bias must be constant
    3. Stride W, H > 1
    4. Per-channel quantization is not supported
    5. Supports PAD/RELU/RELU1/RELU6 fusion
  Quantization data type:
    1. Input: Asym U8 / Asym I8
    2. Weight: Asym U8 / Asym I8
    3. Output: Asym U8 / Asym I8
  Floating data type: Not supported