MDLA 3.0 Guidelines

Note

The following limitations may not be equal to MDLA hardware constraints. This is because Neuron might have software workarounds for MDLA hardware, or limitations due to the current software implementation.

General Restrictions

Category	Limitations
Tensor Rank	Supported tensor ranks: For operation (OP) type Conv3D, AvgPool3D, L2Pool3D, MinPool3D, and MaxPool3D: 5-D For all other OP types: 0-D, 1-D, 2-D, 3-D, 4-D
Batch Size (N)	Valid batch sizes: FULLY_CONNECTED: {1, 2, 4, 8}. FULLY_CONNECTED with any other batch size is converted to OP CONV_2D. CONV_2D, DEPTHWISE_CONV_2D, TRANSPOSE_CONV: No batch size limit. If batch size is 65535 or less, the OP is split into multiple OPs. All other OPs: [1, 65535]
Height Size (H)	Valid range for input and output activations: [1, 65535]
Width Size (W)	Valid range for input and output activations: [1, 65535]
Channel Size (C)	Valid range for input and output activations: [1, 65535]
Data Type	Supported data types: Asymmetric unsigned 8-bit Asymmetric signed 8-bit Symmetric signed 8-bit Symmetric signed 16-bit Symmetric signed 16-bit activation + Symmetric signed 8-bit weight 16-bit floating point (FP16) 32-bit floating point Converted to FP16 if relax-FP32 is enabled
Per Channel Quantization	Only the following OPs support per channel quantization: CONV_2D DEPTHWISE_CONV_2D TRANSPOSE_CONV FULLY_CONNECTED MTK_TRANSPOSE_CONV PRELU
MDLA Hardware Buffer	MDLA has different internal buffers for different uses. If there is not a buffer of sufficient size for an operation, then MDLA cannot run the operation and reports “Unsupported”. To avoid internal buffer constraints: Keep the input channel size small. For operations that have stride, such as convolution and pooling, keep the stride values small in both width and height. Keep filter size small in both width and height, especially for convolution-like operations.

Supported OPs Specification

OP Name	TFLite OP	NNAPI	Restrictions
Abs	ABS	ABS	None
AvgPooling	AVERAGE_POOL_2D	AVERAGE_POOL_2D	Only NHWC format is supported. Filter shape, stride, and padding attributes must meet the following conditions: If filter size is equal to input size (both H and W dimensions in output are equal to 1): For quantized types: The input_height * input_width must be in the range [1, 2^20]. For floating-point types: The input_height and input_width must satisfy one of the following constraints to avoid accuracy issues: input_height(input_width) must be less than or equal to S, where S = 64. input_height(input_width) must be factorable in the form of “2^a * 3^b * 5^c * 7^d * N”, where N is 1 or a prime number less than or equal to S. If filter size is not equal to input size: Filter shape height and width must be in the range [1, 8]. Stride height must be in the range [1, filter_height]. Stride width must be in the range [1, filter_width]. Top and bottom padding must be in the range [0, filter_height-1]. Left and right padding must be in the range [0, filter_width-1].
BatchToSpace	BATCH_TO_SPACE_ND	BATCH_TO_SPACE_ND	Only NHWC format is supported.
Concat	CONCATENATION	CONCATENATION	None
Conv2D	CONV_2D	CONV_2D	NHWC and NCHW formats are supported. Filter size If stride is not 1x1, filter height and width must be in the range [1, 25]. Otherwise, filter width must be in the range [1, 31]. Stride If dilation rate is equal to 1, stride height and width must be in {1, 2, 3, 4, 8}. Padding For 1x1 filter, there must be no padding. Otherwise, padding must be in the range [0, 15]. Dilation rate Dilation rate height must be {1, 2, 4, 8}. Dilation rate width must be in {1, 2, 4, 8}. There are no limitations if ncc-tflite option `--use-sw-dilated-conv` is enabled. This option applies a software solution for dilated convolution.
DepthwiseConv2D	DEPTHWISE_CONV_2D	DEPTHWISE_CONV_2D	Filter size Filter height and width must be in the range [1, 25]. Channel multiplier Channel multiplier must be in {1, 2, 4, 8, 16}. Otherwise, channel multiplier must be equal to output channel (i.e., input channel is 1). Other constraints are the same as CONV_2D
DepthToSpace	DEPTH_TO_SPACE	DEPTH_TO_SPACE	Only NHWC format is supported. Input batch must be 1. Output batch must be 1.
Dequantize	DEQUANTIZE	DEQUANTIZE	Input cannot be per channel quantization.
ElementWiseAdd	ADD	ADD	See Limitations of Broadcasting.
ElementWiseDiv	DIV	DIV	We recommend not applying this operation for quantized types because of accuracy issues. See Limitations of Broadcasting.
ElementWiseMul	MUL	MUL	See Limitations of Broadcasting.
ElementWiseSub	SUB	SUB	See Limitations of Broadcasting. The scale of input1 (minuend) must be greater than or equal to the scale of input2 (subtrahend).
Elu	ELU	ELU	None
FullyConnected	FULLY_CONNECTED	FULLY_CONNECTED	Filter input channel (i.e. the second dimension of filter) must be in the range [1, 1048575]. FULLY_CONNECTED with dynamic weight is converted to CONV_2D. Bias must be a constant tensor.
HardSwish	HARD_SWISH	HARD_SWISH	None
L2Pooling	L2_POOL_2D	L2_POOL_2D	Same as AVERAGE_POOL. Except if filter size is equal to input size (both H and W dimensions in output are equal to 1), then filter_height * filter_width must be in the range [1, 2^10]. Input activation with floating point data type is unsupported.
MaxPooling	MAX_POOL_2D	MAX_POOL_2D	Same as AVERAGE_POOL_2D. Additional supported: input dimension is equal to output dimension, padding SAME, stride 1
Maximum	MAXIMUM	MAXIMUM	See Limitations of Broadcasting.
Mean	MEAN	MEAN	None
Minimum	MINIMUM	MINIMUM	See Limitations of Broadcasting.
MirrorPad	MIRRORPAD	MIRRORPAD	Supported tensors: 4-D with padding on height or width direction.
Neg	NEG	NEG	None
Pack	PACK		Cannot pack at last dimension.
Pad	PAD PADV2	PAD PAD_V2	None
Pow	POW	POW	Exponent must be a constant integer.
PRelu	PRELU	PRELU	Alpha must be a constant. Alpha must be a scalar (0-D) or 1-D tensor.
QLSTM (5 inputs)	LSTM	QUANTIZED_16BIT_LSTM	The last dimension of input + the last dimension of output scratch must be: 16-aligned In the range [1, 1048575]
Quantize	QUANTIZE	QUANTIZE	None
ReduceMax	REDUCE_MAX	REDUCE_MAX	The size before reduced axis must be less than 65536.
ReduceMin	REDUCE_MIN	REDUCE_MIN	The size before reduced axis must be less than 65536.
ReLU ReLU1 ReLU6	RELU RELU_N1_TO_1 RELU6	RELU RELU1 RELU6	None
Reshape	RESHAPE	RESHAPE	None
Resize::BILINEAR	RESIZE_BILINEAR	RESIZE_BILINEAR	Only NHWC format is supported. Input Height must be in the range [1, 8192]. Input Width must be in the range [1, 8192].
Resize::NEAREST	RESIZE_NEAREST_NEIGHBOR	RESIZE_NEAREST_NEIGHBOR	Only NHWC format is supported. Input Height must be in the range [1, 8192]. Input Width must be in the range [1, 8192].
RSqrt	RSQRT	RSQRT	None
Sigmoid	LOGISTIC	LOGISTIC	None
Slice	SLICE	SLICE	None
SoftMax	SOFTMAX	SOFTMAX	Axis must be -1; this means only the input channel is normalized. Quantized types are dequantized to FP16 due to an accuracy issue.
SpaceToBatch	SPACE_TO_BATCH_ND	SPACE_TO_BATCH_ND	Only NHWC format is supported. Input batch must be 1.
SpaceToDepth	SPACE_TO_DEPTH	SPACE_TO_DEPTH	Only NHWC format is supported. Input batch must be 1.
Split	SPLIT	SPLIT	None
Sqrt	SQRT	SQRT	None
Square	SQUARE		None
SquaredDifference	SQUARED_DIFFERENCE		None
StridedSlice	STRIDED_SLICE	STRIDED_SLICE	Stride on the last dimension is unsupported.
Sum	SUM	SUM	None
Tanh	TANH	TANH	For quantized types, InputScale/OutputScale must be less than 842.
Transpose	TRANSPOSE	TRANSPOSE	None
TransposeConv2D	TRANSPOSE_CONV	TRANSPOSE_CONV_2D	Weight must be a constant tensor Filter size Filter height and width must be in the range [1, 25]. Stride Stride height must be less than or equal to filter height. Stride width must be less than or equal to filter width. Other constraints are the same as CONV_2D
Unpack	UNPACK		Cannot unpack at last dimension.

Limitations of Broadcasting

Only broadcasting from a small tensor to a large tensor with compatible dimensions is supported.
- Example 1: Input1 broadcasting to Input2 is supported.
- Example 2: Input2 broadcasting to Input1 is supported.
- Example 3: Input1 and Input2 broadcasting to each other is unsupported.

Hardware broadcasting is supported if either of the following conditions are met:
1. The small tensor has one of the following shapes:
  - []
  - [1]
  - [C]
  - [1, C]
  - [1, 1, C]
  - [1, 1, 1, C]
2. The small tensor is broadcast on the batch or channel dimension.
  - Example 1: The shape of the small tensor is [1,H,W,C], where H,W,C are not equal to 1.
  - Example 2: The shape of the small tensor is [N,H,W,1], where N,H,W are not equal to 1.
  - Example 3: The shape of the small tensor is [1,H,W,1], where H,W are not equal to 1.

If the conditions for hardware broadcasting are not met, broadcasting is processed by software using multiple SPLIT and CONCAT.
- If the small tensor is constant, the broadcasting is done at compile time. Bandwidth requirements might be larger at runtime.
- If the small tensor is not constant, there are extra runtime DMA overheads.