Abs 
ABS 
None 
AvgPooling 
AVERAGE_POOL_2D 
If filter shape is 1x1
There should be no padding.
Stride should not be 0.
If this is a global pooling
The filter_height x filter_width should be in range [1, 2^18].
Otherwise
The height and width of filter shape should be in range [1, 8].
The stride height should be in range [1, filter_height].
The stride width should be in range [1, filter_width].
The padding should be in range [0, 15].
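The pooling limits above can be sketched as a small validator. This is a minimal illustration, not part of any vendor tool: the function name, the argument layout (padding as a 4-tuple) and the reading of the three cases as mutually exclusive branches are all assumptions.

```python
def check_avg_max_pool(filter_hw, stride_hw, padding, is_global=False):
    """Return a list of violated constraints (an empty list means OK).

    Hypothetical helper: `padding` is assumed to be (top, bottom, left, right).
    """
    fh, fw = filter_hw
    sh, sw = stride_hw
    errors = []
    if (fh, fw) == (1, 1):
        # 1x1 filter: no padding allowed, stride must be non-zero.
        if any(p != 0 for p in padding):
            errors.append("1x1 filter requires no padding")
        if sh == 0 or sw == 0:
            errors.append("stride must not be 0")
    elif is_global:
        # Global pooling: only the filter area is limited.
        if not 1 <= fh * fw <= 2 ** 18:
            errors.append("global filter area must be in [1, 2^18]")
    else:
        if not (1 <= fh <= 8 and 1 <= fw <= 8):
            errors.append("filter height/width must be in [1, 8]")
        if not 1 <= sh <= fh:
            errors.append("stride height must be in [1, filter_height]")
        if not 1 <= sw <= fw:
            errors.append("stride width must be in [1, filter_width]")
        if not all(0 <= p <= 15 for p in padding):
            errors.append("padding must be in [0, 15]")
    return errors
```

The same sketch applies to MaxPooling, which lists identical limits.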

BatchToSpace 
BATCH_TO_SPACE_ND 
None 
Concat 
CONCATENATION 
None 
Conv2D 
CONV_2D 
Input channel size
For 8-bit data types, the input channel size should be in range [1, 8194].
For 16-bit data types, the input channel size should be in range [1, 4095].
Input channel should be equal to filter channel.
Group conv2d is not supported (i.e., groups > 1)
Filter size
Filter height should be in range [1, 16].
Filter width should be in range [1, 16].
Stride
If the height and width of the dilation rate are not equal to 1: stride height and width should be 1.
Otherwise: stride height and width should be in {1, 2, 3, 4, 8}.
Padding
For 1x1 filter, there should be no padding.
Otherwise, padding should be in range [0, 15].
Dilation rate
The height of dilation rate should be in {1, 2, 4, 8}.
The width of dilation rate should be in {1, 2, 4, 8}.
Dynamic Weight
The output channel of the filter should be 16-aligned.
The input channel of the filter should be 32-byte aligned.
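As a rough illustration, the static Conv2D limits listed above (everything except the dynamic-weight alignment) could be checked before compilation. The helper name and argument layout are hypothetical, the channel limits are copied from this table, and the dilation rule is read here as "any dilation other than 1 forces stride 1", which is an assumption.

```python
ALLOWED_STRIDES = {1, 2, 3, 4, 8}
ALLOWED_DILATIONS = {1, 2, 4, 8}
MAX_IN_CHANNELS = {8: 8194, 16: 4095}  # per data-type width, as listed above

def check_conv2d(in_ch, filter_in_ch, filter_hw, stride_hw, dilation_hw,
                 padding, bits=8, groups=1):
    """Return a list of violated Conv2D constraints (empty means OK)."""
    errors = []
    if not 1 <= in_ch <= MAX_IN_CHANNELS[bits]:
        errors.append("input channel size out of range")
    if in_ch != filter_in_ch:
        errors.append("input channel must equal filter channel")
    if groups > 1:
        errors.append("group conv2d is not supported")
    if not all(1 <= k <= 16 for k in filter_hw):
        errors.append("filter height/width must be in [1, 16]")
    if not all(d in ALLOWED_DILATIONS for d in dilation_hw):
        errors.append("dilation must be in {1, 2, 4, 8}")
    if any(d != 1 for d in dilation_hw):
        # Assumed reading: dilated convolutions only run with stride 1.
        if any(s != 1 for s in stride_hw):
            errors.append("stride must be 1 when dilation is used")
    elif not all(s in ALLOWED_STRIDES for s in stride_hw):
        errors.append("stride must be in {1, 2, 3, 4, 8}")
    if tuple(filter_hw) == (1, 1):
        if any(p != 0 for p in padding):
            errors.append("1x1 filter requires no padding")
    elif not all(0 <= p <= 15 for p in padding):
        errors.append("padding must be in [0, 15]")
    return errors
```

For example, `check_conv2d(64, 64, (3, 3), (2, 2), (1, 1), (1, 1, 1, 1))` returns an empty list, while the same layer with a dilation of `(2, 2)` reports the stride violation.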

DepthwiseConv2D 
DEPTHWISE_CONV_2D 
Input channel size
For 8-bit data types, the input channel size should be in range [1, 8194].
For 16-bit data types, the input channel size should be in range [1, 4095].
Filter size
Filter height should be in range [1, 8].
Filter width should be in range [1, 8].
Stride
Stride height should be less than or equal to filter height and should be in {1, 2, 3, 4}.
Stride width should be less than or equal to filter width and should be in {1, 2, 3, 4}.
Padding should be in range [0, 15].
Dilation rate
The height of dilation rate should be in {1, 2, 4, 8}.
The width of dilation rate should be in {1, 2, 4, 8}.
Channel multiplier
Should be in range [1, 255]
If channel multiplier > 1
Channel multiplier should be 16-aligned (i.e., 16, 32, 48, 64, …)
Dynamic weight
The channel of the filter should be 32-byte aligned.
nummdla=2 or more is not supported when dynamic weight is enabled (bit-true issues).
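The stride and channel-multiplier rules for DepthwiseConv2D interact with the filter size, so a short sketch may help. The function below is a hypothetical helper covering only the filter, stride and multiplier rows; it does not model padding, dilation or dynamic weights.

```python
def check_depthwise_conv2d(filter_hw, stride_hw, channel_multiplier):
    """Return a list of violated DepthwiseConv2D constraints (empty means OK)."""
    errors = []
    for name, k, s in zip(("height", "width"), filter_hw, stride_hw):
        if not 1 <= k <= 8:
            errors.append("filter " + name + " must be in [1, 8]")
        # Stride must be one of 1..4 and must not exceed the filter size.
        if s not in (1, 2, 3, 4) or s > k:
            errors.append("stride " + name + " is not allowed")
    if not 1 <= channel_multiplier <= 255:
        errors.append("channel multiplier must be in [1, 255]")
    elif channel_multiplier > 1 and channel_multiplier % 16 != 0:
        errors.append("channel multiplier > 1 must be 16-aligned")
    return errors
```

A 3x3 filter with stride 2 and multiplier 16 passes, whereas multiplier 2 fails the alignment rule.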

DepthToSpace 
DEPTH_TO_SPACE 
Input and output batch must be 1. 
Dequantize 
DEQUANTIZE 
Input cannot use per-channel quantization. 
ElementWiseAdd 
ADD 
Hardware does not support broadcasting, except when input1 or input2 is a 0-D or 1-D constant.
For other constant broadcasting cases, broadcasting is handled in software by enlarging the constant at compile time.
For all other cases, broadcasting is handled in software using multiple concat operations.
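The three broadcasting cases above amount to a simple decision procedure. The sketch below is illustrative only: the function name and the returned labels are made up for this example, and it assumes that "constant broadcasting" means at least one operand is a compile-time constant.

```python
def broadcast_strategy(shape1, shape2, const1=False, const2=False):
    """Classify how an element-wise op with these operands would be handled."""
    if tuple(shape1) == tuple(shape2):
        return "none"  # identical shapes, nothing to broadcast
    for shape, is_const in ((shape1, const1), (shape2, const2)):
        # A 0-D or 1-D constant operand is the only case handled natively.
        if is_const and len(shape) <= 1:
            return "hardware"
    if const1 or const2:
        return "compile-time constant enlargement"
    return "concat"
```

For instance, adding a per-channel bias of shape `(16,)` stored as a constant falls in the hardware case, while two activation tensors of different shapes fall back to concat operations.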

ElementWiseDiv 
DIV 
Broadcasting is not yet supported. 
ElementWiseMul 
MUL 
Hardware does not support broadcasting, except when input1 or input2 is a 0-D or 1-D constant.
For other constant broadcasting cases, broadcasting is handled in software by enlarging the constant at compile time.
For all other cases, broadcasting is handled in software using multiple concat operations.

ElementWiseSub 
SUB 
The scale of input1 (minuend) should be greater than or equal to the scale of input2 (subtrahend).
Broadcasting is supported by software using multiple concat operations.

Elu 
ELU 
None 
FullyConnected 
FULLY_CONNECTED 
Input channel (or the last dimension of input)
Should be 16-aligned or equal to the filter input channel.
Filter input channel (i.e., the 2nd dimension of filter)
Should be 16-aligned or equal to the input channel size.
Should be in range [1, 1048576).
Dynamic Weight
The output channel of the filter should be 16-aligned.
The input channel of the filter should be 32-byte aligned.
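The channel rules for FullyConnected can be condensed into a single predicate. This is a sketch with a hypothetical function name; it covers the input-channel and filter-input-channel rows only and ignores the dynamic-weight alignment.

```python
def check_fully_connected(input_channels, filter_in_channels):
    """Return True if the FullyConnected channel sizes satisfy the listed rules."""
    def aligned_or_equal(value, other):
        # Each channel size must be a multiple of 16 or match its counterpart.
        return value % 16 == 0 or value == other

    return (
        aligned_or_equal(input_channels, filter_in_channels)
        and aligned_or_equal(filter_in_channels, input_channels)
        and 1 <= filter_in_channels < 1048576  # upper bound is exclusive
    )
```

Under this reading, an unaligned size such as 30 is accepted only when both the input and the filter use it.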

HardSwish 
HARD_SWISH 
For quantized models, all of the following conditions must hold to preserve precision.
TYPE is uint8, MIN(uint8)=0 MAX(uint8)=255
TYPE is int8, MIN(int8)=-128 MAX(int8)=127
(input_offset - ROUND(3.0 / input_scale)) >= MIN(TYPE)
ABS(6.0 - ROUND(6.0 / input_scale) * input_scale) <= 2 * (6.0 / (MAX(TYPE) - MIN(TYPE)))
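The two HardSwish precision conditions can be evaluated directly from the quantization parameters. The sketch below assumes that the operators missing from the inequalities are subtractions, that int8 spans -128 to 127, and that ROUND means round-half-up for positive values; the function name is hypothetical.

```python
import math

TYPE_RANGE = {"uint8": (0, 255), "int8": (-128, 127)}

def hard_swish_precision_ok(input_scale, input_offset, dtype="uint8"):
    """Return True if both HardSwish precision conditions hold."""
    lo, hi = TYPE_RANGE[dtype]

    def round_half_up(x):
        # Assumed meaning of ROUND for the positive values used here.
        return int(math.floor(x + 0.5))

    # Condition 1: the quantized value of -3.0 must be representable.
    cond1 = input_offset - round_half_up(3.0 / input_scale) >= lo
    # Condition 2: 6.0 must be representable with a small enough error.
    error = abs(6.0 - round_half_up(6.0 / input_scale) * input_scale)
    cond2 = error <= 2 * (6.0 / (hi - lo))
    return cond1 and cond2
```

With `input_scale=0.0625` and `input_offset=128` on uint8, both conditions hold; lowering the offset to 10 breaks the first condition because -3.0 would quantize below 0.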

L2Pooling 
L2_POOL_2D 
Filter shape 1x1 is unsupported.
If this is a global pooling
The filter_height x filter_width should be in range [1, 2^10].
Otherwise
The height and width of filter shape should be in range [1, 8].
The stride height should be in range [1, filter_height].
The stride width should be in range [1, filter_width].
The padding should be in range [0, 15].
Data type
Floating point is unsupported.

MaxPooling 
MAX_POOL_2D 
If filter shape is 1x1
There should be no padding.
Stride should not be 0.
If this is a global pooling
The filter_height x filter_width should be in range [1, 2^18].
Otherwise
The height and width of filter shape should be in range [1, 8].
The stride height should be in range [1, filter_height].
The stride width should be in range [1, filter_width].
The padding should be in range [0, 15].

Maximum 
MAXIMUM 
Broadcasting is supported by software using multiple concat operations. 
Mean 
MEAN 
Axis should be height (H) and width (W) dimensions.
The height and width of output shape should be 1.
The input_height x input_width should be in range [1, 2^18].
For floating point types, input_height and input_width must each satisfy one of the following constraints to avoid accuracy issues:
input_height(input_width) must be less than or equal to S, where S = 64,
input_height(input_width) must be factorable in the form of “2^a * 3^b * 5^c * 7^d * N”, where N is 1 or a prime number less than or equal to S.
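The floating-point factorability rule for Mean is easier to follow as code. The following is a minimal sketch with hypothetical function names; it assumes the rule applies to the height and the width independently and that the area limit applies to their product.

```python
S = 64  # threshold named in the constraint above

def _is_prime(n):
    if n < 2:
        return False
    return all(n % p for p in range(2, int(n ** 0.5) + 1))

def mean_dim_ok(dim):
    """Check one spatial dimension against the floating-point rule."""
    if dim <= S:
        return True
    # Strip all factors of 2, 3, 5 and 7; what remains is the candidate N.
    for p in (2, 3, 5, 7):
        while dim % p == 0:
            dim //= p
    return dim == 1 or (dim <= S and _is_prime(dim))

def mean_float_shape_ok(height, width):
    """Check the area limit and the per-dimension rule together."""
    return (1 <= height * width <= 2 ** 18
            and mean_dim_ok(height) and mean_dim_ok(width))
```

For example, 224 = 2^5 * 7 passes, 122 = 2 * 61 passes because 61 is a prime no larger than S, and 134 = 2 * 67 fails because 67 exceeds S.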

Minimum 
MINIMUM 
Broadcasting is supported by software using multiple concat operations. 
Neg 
NEG 
None 
Pack 
PACK 
Cannot pack along the last dimension. 
Pad 
PAD PADV2 
For quantized types, input and output activations should have the same zero point and scale.

Pow 
POW 
Exponent should be constant
Exponent should be equal to 2.f.

PRelu 
PRELU 
Alpha should be a scalar or 1D constant tensor.
The data types of input and output should be the same.
LeakyRelu case is included.

QLSTM (5 inputs) 
LSTM 
Bias scale should be smaller than 2^10.
The last dimension of input + the last dimension of output scratch should be:
16-aligned
in range [1, 1048576)

Quantize 
QUANTIZE 
None 
ReLU ReLU1 ReLU6 
RELU RELU_N1_TO_1 RELU6 
None 
Reshape 
RESHAPE 
None 
Resize::BILINEAR 
RESIZE_BILINEAR 
Input Height should be less than or equal to 8192.
Input Width should be less than or equal to 8192.
half_pixel_centers must be false.

Resize::NEAREST 
RESIZE_NEAREST_NEIGHBOR 
Input Height should be less than or equal to 8192.
Input Width should be less than or equal to 8192.
half_pixel_centers must be false.

RSqrt 
RSQRT 
None 
Sigmoid 
LOGISTIC 
None 
Slice 
SLICE 
None 
SoftMax 
SOFTMAX 
Axis should be 1. 
SpaceToBatch 
SPACE_TO_BATCH_ND 
Input batch must be 1. 
SpaceToDepth 
SPACE_TO_DEPTH 
Input batch must be 1. 
Split 
SPLIT 
None 
Sqrt 
SQRT 
None 
Square 
SQUARE 
None 
SquaredDifference 
SQUARED_DIFFERENCE 
None 
StridedSlice 
STRIDED_SLICE 
Stride should be greater than or equal to 1.
Stride on the last dimension is unsupported.

Tanh 
TANH 
For quantized types, InputScale/OutputScale should be less than 842. 
Transpose 
TRANSPOSE 
Supports transpose among H, W, C dimensions. 
TransposeConv2D 
TRANSPOSE_CONV 
Input channel size
For 8-bit data types, the input channel size should be in range [1, 8194].
For 16-bit data types, the input channel size should be in range [1, 4095].
Filter size
Filter height should be in range [1, 16].
Filter width should be in range [1, 16].
Stride
Stride height should be less than or equal to filter height and should be in {1, 2, 3, 4, 8}.
Stride width should be less than or equal to filter width and should be in {1, 2, 3, 4, 8}.
Padding should be in range [0, 15].

Unpack 
UNPACK 
Cannot unpack along the last dimension. 