MDLA 2.0 Guidelines

Note

The limitations listed below may differ from the MDLA hardware constraints, because Neuron might apply software workarounds for the MDLA hardware or impose limitations of its own due to the current software implementation.

General Restrictions

Category	Restrictions
Tensor Rank	Only 0-D, 1-D, 2-D, 3-D, and 4-D tensors are supported.
Batch Size (N)	Batch size should be in range [1, 255], except Conv2D, DepthwiseConv2D and FullyConnected.
Height Size (H)	Should be in range [1, 65536) for both input and output activations.
Width Size (W)	Should be in range [1, 65536) for both input and output activations.
Channel Size (C)	Should be in range [1, 65536) for both input and output activations.
Data Type	Only the following data types are supported: asymmetric unsigned 8-bit; asymmetric signed 8-bit; symmetric signed 8-bit; symmetric signed 16-bit; 16-bit floating point (FP16); 32-bit floating point (FP32), which is converted to FP16 if `relax-FP32` is enabled.
Per Channel Quantization	The following operations support per-channel quantization: Conv2D, DepthwiseConv2D, TransposeConv2D, FullyConnected, PRelu.
Data Format	Only NHWC format is supported.
MDLA Hardware Buffer	MDLA has different internal buffers for different usages. If the buffer is insufficient for a given operation, MDLA cannot support that operation. Guidelines to avoid the internal buffer constraint: (1) keep the input channel size small; (2) keep stride values (in both width and height) small for operations that have a stride (e.g., convolution and pooling); (3) keep the filter size (in both width and height) small, especially for convolution-like operations.
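
The general restrictions above can be checked before compilation. The following is a minimal sketch, not part of the Neuron toolchain: the function name `check_general_restrictions` and the set `BATCH_EXEMPT_OPS` are hypothetical, and the sketch assumes a 4-D activation shape given in NHWC order. How lower-rank tensors map onto N, H, W, and C is not stated in the table, so only their rank is checked.

```python
from typing import List, Sequence

# Ops that the table exempts from the batch-size limit.
BATCH_EXEMPT_OPS = {"Conv2D", "DepthwiseConv2D", "FullyConnected"}


def check_general_restrictions(shape: Sequence[int], op_name: str = "") -> List[str]:
    """Return the general restrictions violated by `shape` (empty list if none)."""
    violations: List[str] = []

    # Tensor rank: only 0-D to 4-D tensors are supported.
    if len(shape) > 4:
        violations.append(f"rank {len(shape)} exceeds 4")
        return violations

    # Assumption: dimension checks are applied to 4-D NHWC tensors only.
    if len(shape) != 4:
        return violations

    n, h, w, c = shape

    # Batch size: [1, 255], except for the exempt ops.
    if op_name not in BATCH_EXEMPT_OPS and not 1 <= n <= 255:
        violations.append(f"batch size {n} not in [1, 255]")

    # Height, width, channel: [1, 65536), upper bound exclusive.
    for name, size in (("height", h), ("width", w), ("channel", c)):
        if not 1 <= size < 65536:
            violations.append(f"{name} size {size} not in [1, 65536)")

    return violations
```

For example, `check_general_restrictions((256, 8, 8, 8))` reports the batch size, while the same shape passed with `op_name="Conv2D"` reports nothing, because Conv2D is exempt from the batch limit.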

Supported OPs Specification

OP Name	TFLite OP	Restrictions
Abs	ABS	None
AvgPooling	AVERAGE_POOL_2D	If filter shape is 1x1: There should be no padding. Stride should not be 0. If this is a global pooling: The filter_height x filter_width should be in range [1, 2^18]. Otherwise: The height and width of filter shape should be in range [1, 8]. The stride height should be in range [1, filter_height]. The stride width should be in range [1, filter_width]. The padding should be in range [0, 15].
BatchToSpace	BATCH_TO_SPACE_ND	None
Concat	CONCATENATION	None
Conv2D	CONV_2D	Input channel size: For 8-bit data types, the input channel size should be in range [1, 8194]. For 16-bit data types, the input channel size should be in range [1, 4095]. Input channel should be equal to filter channel. Group Conv2D is not supported (i.e., groups > 1). Filter size: Filter height should be in range [1, 16]. Filter width should be in range [1, 16]. Stride: If the height and width of the dilation rate are not equal to 1, stride height and width should be 1. Otherwise, stride height and width should be in {1, 2, 3, 4, 8}. Padding: For a 1x1 filter, there should be no padding. Otherwise, padding should be in range [0, 15]. Dilation rate: The height of the dilation rate should be in {1, 2, 4, 8}. The width of the dilation rate should be in {1, 2, 4, 8}. Dynamic weight: The output channel of the filter should be 16-aligned. The input channel of the filter should be 32-byte aligned.
DepthwiseConv2D	DEPTHWISE_CONV_2D	Input channel size: For 8-bit data types, the input channel size should be in range [1, 8194]. For 16-bit data types, the input channel size should be in range [1, 4095]. Filter size: Filter height should be in range [1, 8]. Filter width should be in range [1, 8]. Stride: Stride height should be less than or equal to filter height and should be in {1, 2, 3, 4}. Stride width should be less than or equal to filter width and should be in {1, 2, 3, 4}. Padding: Should be in range [0, 15]. Dilation rate: The height of the dilation rate should be in {1, 2, 4, 8}. The width of the dilation rate should be in {1, 2, 4, 8}. Channel multiplier: Should be in range [1, 255]. If channel multiplier > 1, it should be 16-aligned (i.e., 16, 32, 48, 64, …). Dynamic weight: The channel of the filter should be 32-byte aligned. `-num-mdla=2` or more is not supported when dynamic weight is enabled (bit-true issues).
DepthToSpace	DEPTH_TO_SPACE	Input and output batch must be 1.
Dequantize	DEQUANTIZE	Input cannot be per channel quantization.
ElementWiseAdd	ADD	Hardware doesn’t support broadcasting, except when input-1 or input-2 is a 0-D or 1-D constant. For other constant broadcasting cases, broadcasting is supported by software, which enlarges the constant at compile time. For other cases, broadcasting is supported by software using multiple concat operations.
ElementWiseDiv	DIV	Broadcasting is not yet supported.
ElementWiseMul	MUL	Hardware doesn’t support broadcasting, except when input-1 or input-2 is a 0-D or 1-D constant. For other constant broadcasting cases, broadcasting is supported by software, which enlarges the constant at compile time. For other cases, broadcasting is supported by software using multiple concat operations.
ElementWiseSub	SUB	The scale of input1 (minuend) should be greater than or equal to the scale of input2 (subtrahend). Broadcasting is supported by software using multiple concat operations.
Elu	ELU	None
FullyConnected	FULLY_CONNECTED	Input channel (or the last dimension of input): Should be 16-aligned or equal to the filter input channel. Filter input channel (i.e., the second dimension of filter): Should be 16-aligned or equal to the input channel size. Should be in range [1, 1048576). Dynamic weight: The output channel of the filter should be 16-aligned. The input channel of the filter should be 32-byte aligned.
HardSwish	HARD_SWISH	For quantized models, all of the following conditions must hold to preserve precision, where MIN(uint8) = 0 and MAX(uint8) = 255 if TYPE is uint8, and MIN(int8) = -128 and MAX(int8) = 127 if TYPE is int8: (input_offset - ROUND(3.0 / input_scale)) >= MIN(TYPE); ABS(6.0 - ROUND(6.0 / input_scale) * input_scale) <= 2 * (6.0 / (MAX(TYPE) - MIN(TYPE))).
L2Pooling	L2_POOL_2D	Filter shape 1x1 is unsupported. If this is a global pooling: The filter_height x filter_width should be in range [1, 2^10]. Otherwise: The height and width of filter shape should be in range [1, 8]. The stride height should be in range [1, filter_height]. The stride width should be in range [1, filter_width]. The padding should be in range [0, 15]. Data type: Floating point is unsupported.
MaxPooling	MAX_POOL_2D	If filter shape is 1x1: There should be no padding. Stride should not be 0. If this is a global pooling: The filter_height x filter_width should be in range [1, 2^18]. Otherwise: The height and width of filter shape should be in range [1, 8]. The stride height should be in range [1, filter_height]. The stride width should be in range [1, filter_width]. The padding should be in range [0, 15].
Maximum	MAXIMUM	Broadcasting is supported by software using multiple concat operations.
Mean	MEAN	Axis should be height (H) and width (W) dimensions. The height and width of output shape should be 1. The input_height x input_width should be in range [1, 2^18]. For floating point types, the input_height and input_width must satisfy one of the following constraints to avoid accuracy issues: input_height (input_width) must be less than or equal to S, where S = 64; or input_height (input_width) must be factorable in the form “2^a * 3^b * 5^c * 7^d * N”, where N is 1 or a prime number less than or equal to S.
Minimum	MINIMUM	Broadcasting is supported by software using multiple concat operations.
Neg	NEG	None
Pack	PACK	Cannot pack along the last dimension.
Pad	PAD PADV2	For quantized types, input and output activations should have the same zero-point and scale.
Pow	POW	Exponent should be a constant. Exponent should be equal to 2.0.
PRelu	PRELU	Alpha should be a scalar or 1-D constant tensor. The data types of input and output should be the same. LeakyRelu case is included.
QLSTM (5 inputs)	LSTM	Bias scale should be smaller than 2^-10. The last dimension of input + the last dimension of output scratch should be 16-aligned and in range [1, 1048576).
Quantize	QUANTIZE	None
ReLU ReLU1 ReLU6	RELU RELU_N1_TO_1 RELU6	None
Reshape	RESHAPE	None
Resize::BILINEAR	RESIZE_BILINEAR	Input Height should be less than or equal to 8192. Input Width should be less than or equal to 8192. half_pixel_centers must be false.
Resize::NEAREST	RESIZE_NEAREST_NEIGHBOR	Input Height should be less than or equal to 8192. Input Width should be less than or equal to 8192. half_pixel_centers must be false.
RSqrt	RSQRT	None
Sigmoid	LOGISTIC	None
Slice	SLICE	None
SoftMax	SOFTMAX	Axis should be -1.
SpaceToBatch	SPACE_TO_BATCH_ND	Input batch must be 1.
SpaceToDepth	SPACE_TO_DEPTH	Input batch must be 1.
Split	SPLIT	None
Sqrt	SQRT	None
Square	SQUARE	None
SquaredDifference	SQUARED_DIFFERENCE	None
StridedSlice	STRIDED_SLICE	Stride should be greater than or equal to 1. Stride on the last dimension is unsupported.
Tanh	TANH	For quantized types, InputScale/OutputScale should be less than 842.
Transpose	TRANSPOSE	Supports transpose among H, W, C dimensions.
TransposeConv2D	TRANSPOSE_CONV	Input channel size: For 8-bit data types, the input channel size should be in range [1, 8194]. For 16-bit data types, the input channel size should be in range [1, 4095]. Filter size: Filter height should be in range [1, 16]. Filter width should be in range [1, 16]. Stride: Stride height should be less than or equal to filter height and should be in {1, 2, 3, 4, 8}. Stride width should be less than or equal to filter width and should be in {1, 2, 3, 4, 8}. Padding: Should be in range [0, 15].
Unpack	UNPACK	Cannot unpack along the last dimension.
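
Two of the restrictions in the table are arithmetic conditions that are easy to misread: the HardSwish quantization conditions and the floating-point Mean constraint on input height and width. The following is a minimal sketch of both, not an implementation taken from Neuron. The function names are hypothetical. It assumes that ROUND rounds to the nearest integer with halves rounded up, that the right-hand side of the second HardSwish condition reads 2 * (6.0 / (MAX(TYPE) - MIN(TYPE))), and that the Mean constraint applies to each of input_height and input_width separately.

```python
import math

# MIN(TYPE) and MAX(TYPE) for the two quantized types named in the table.
QUANT_RANGES = {"uint8": (0, 255), "int8": (-128, 127)}


def round_half_up(x: float) -> int:
    # Assumption: ROUND in the table means round-to-nearest, halves upward.
    return math.floor(x + 0.5)


def hardswish_quant_ok(dtype: str, input_scale: float, input_offset: int) -> bool:
    """Check both HardSwish precision conditions for a quantized input."""
    type_min, type_max = QUANT_RANGES[dtype]
    # Condition 1: the quantized value of -3.0 must be representable.
    offset_ok = input_offset - round_half_up(3.0 / input_scale) >= type_min
    # Condition 2: 6.0 must be representable within two quantization steps
    # of a 6.0-wide range.
    error = abs(6.0 - round_half_up(6.0 / input_scale) * input_scale)
    six_ok = error <= 2 * (6.0 / (type_max - type_min))
    return offset_ok and six_ok


def _is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def mean_fp_dim_ok(size: int, s: int = 64) -> bool:
    """Check one dimension (height or width) against the FP Mean constraint."""
    if size < 1:
        return False
    if size <= s:
        return True
    # Strip all factors of 2, 3, 5, and 7; the remainder is N.
    n = size
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    return n == 1 or (n <= s and _is_prime(n))
```

As a usage example, `mean_fp_dim_ok(122)` holds because 122 = 2 * 61 and 61 is a prime no greater than 64, whereas `mean_fp_dim_ok(121)` does not because 121 = 11 * 11 leaves a composite remainder.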