MDLA 3.0 Guidelines
Note
The following limitations may not match the MDLA hardware constraints exactly. This is because Neuron might work around MDLA hardware limits in software, or impose additional limits due to the current software implementation.
General Restrictions

| Category | Limitations |
|---|---|
| Tensor Rank | Supported tensor ranks: |
| Batch Size (N) | Valid batch sizes: |
| Height Size (H) | Valid range for input and output activations: [1, 65535] |
| Width Size (W) | Valid range for input and output activations: [1, 65535] |
| Channel Size (C) | Valid range for input and output activations: [1, 65535] |
| Data Type | Supported data types: |
| Per Channel Quantization | Only the following OPs support per-channel quantization: |
| MDLA Hardware Buffer | MDLA has different internal buffers for different uses. If no buffer of sufficient size is available for an operation, the MDLA cannot run the operation and reports "Unsupported". To avoid internal buffer constraints: |
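Where the ranges above are concrete, they can be screened for before compilation. The following is a minimal sketch, assuming NHWC activations; the function name `check_activation_shape` is hypothetical, and because the rank, batch-size, and data-type lists are not reproduced above, only the [1, 65535] ranges are validated.

```python
def check_activation_shape(shape):
    """Return a list of range violations for an NHWC activation shape."""
    problems = []
    if len(shape) != 4:
        # Rank and batch-size rules live in the table above and are not
        # encoded in this sketch.
        return [f"rank {len(shape)}: check against the supported tensor ranks"]
    _, h, w, c = shape
    for name, value in (("H", h), ("W", w), ("C", c)):
        if not 1 <= value <= 65535:
            problems.append(f"{name}={value} is outside [1, 65535]")
    return problems

print(check_activation_shape((1, 224, 224, 3)))  # [] -> within range
print(check_activation_shape((1, 70000, 1, 3)))  # H out of range
```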
Supported OPs Specification

| OP Name | TFLite OP | NNAPI | Restrictions |
|---|---|---|---|
| Abs | ABS | ABS | None |
| AvgPooling | AVERAGE_POOL_2D | AVERAGE_POOL_2D | |
| BatchToSpace | BATCH_TO_SPACE_ND | BATCH_TO_SPACE_ND | Only NHWC format is supported. |
| Concat | CONCATENATION | CONCATENATION | None |
| Conv2D | CONV_2D | CONV_2D | |
| DepthwiseConv2D | DEPTHWISE_CONV_2D | DEPTHWISE_CONV_2D | |
| DepthToSpace | DEPTH_TO_SPACE | DEPTH_TO_SPACE | |
| Dequantize | DEQUANTIZE | DEQUANTIZE | The input cannot be per-channel quantized. |
| ElementWiseAdd | ADD | ADD | |
| ElementWiseDiv | DIV | DIV | |
| ElementWiseMul | MUL | MUL | |
| ElementWiseSub | SUB | SUB | |
| Elu | ELU | ELU | None |
| FullyConnected | FULLY_CONNECTED | FULLY_CONNECTED | |
| HardSwish | HARD_SWISH | HARD_SWISH | None |
| L2Pooling | L2_POOL_2D | L2_POOL_2D | |
| MaxPooling | MAX_POOL_2D | MAX_POOL_2D | |
| Maximum | MAXIMUM | MAXIMUM | |
| Mean | MEAN | MEAN | None |
| Minimum | MINIMUM | MINIMUM | |
| MirrorPad | MIRROR_PAD | MIRROR_PAD | Only 4-D tensors with padding along the height or width dimension are supported. |
| Neg | NEG | NEG | None |
| Pack | PACK | | Packing along the last dimension is unsupported. |
| Pad | PAD | PAD | None |
| Pow | POW | POW | The exponent must be a constant integer. |
| PRelu | PRELU | PRELU | |
| QLSTM (5 inputs) | LSTM | QUANTIZED_16BIT_LSTM | The last dimension of the input plus the last dimension of the output scratch must be: |
| Quantize | QUANTIZE | QUANTIZE | None |
| ReduceMax | REDUCE_MAX | REDUCE_MAX | The size before the reduced axis must be less than 65536. |
| ReduceMin | REDUCE_MIN | REDUCE_MIN | The size before the reduced axis must be less than 65536. |
| ReLU | RELU | RELU | None |
| Reshape | RESHAPE | RESHAPE | None |
| Resize::BILINEAR | RESIZE_BILINEAR | RESIZE_BILINEAR | |
| Resize::NEAREST | RESIZE_NEAREST_NEIGHBOR | RESIZE_NEAREST_NEIGHBOR | |
| RSqrt | RSQRT | RSQRT | None |
| Sigmoid | LOGISTIC | LOGISTIC | None |
| Slice | SLICE | SLICE | None |
| SoftMax | SOFTMAX | SOFTMAX | |
| SpaceToBatch | SPACE_TO_BATCH_ND | SPACE_TO_BATCH_ND | |
| SpaceToDepth | SPACE_TO_DEPTH | SPACE_TO_DEPTH | |
| Split | SPLIT | SPLIT | None |
| Sqrt | SQRT | SQRT | None |
| Square | SQUARE | | None |
| SquaredDifference | SQUARED_DIFFERENCE | | None |
| StridedSlice | STRIDED_SLICE | STRIDED_SLICE | Striding on the last dimension is unsupported. |
| Sum | SUM | SUM | None |
| Tanh | TANH | TANH | For quantized types, InputScale/OutputScale must be less than 842. |
| Transpose | TRANSPOSE | TRANSPOSE | None |
| TransposeConv2D | TRANSPOSE_CONV | TRANSPOSE_CONV_2D | |
| Unpack | UNPACK | | Unpacking along the last dimension is unsupported. |
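A few of the restrictions above are simple enough to screen for in a model-preparation script. The following is a hedged sketch, not a Neuron SDK API; the helper names are hypothetical and encode only the STRIDED_SLICE and quantized TANH rules from the table.

```python
def strided_slice_supported(strides):
    """MDLA 3.0 rejects STRIDED_SLICE with a stride on the last dimension."""
    return strides[-1] == 1

def quantized_tanh_supported(input_scale, output_scale):
    """For quantized TANH, InputScale/OutputScale must be less than 842."""
    return (input_scale / output_scale) < 842

print(strided_slice_supported([1, 2, 2, 1]))  # True: last-dimension stride is 1
print(strided_slice_supported([1, 1, 1, 2]))  # False: strides the last dimension
print(quantized_tanh_supported(0.5, 0.001))   # True: ratio 500 < 842
```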
Limitations of Broadcasting
Only broadcasting from a small tensor to a large tensor with compatible dimensions is supported; broadcasting both inputs toward each other is not, as the sketch after these examples illustrates.
Example 1: Input1 broadcasting to Input2 is supported.
Example 2: Input2 broadcasting to Input1 is supported.
Example 3: Input1 and Input2 broadcasting to each other is unsupported.
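The concrete shapes behind these three examples are not given above, so the shapes in the following NumPy sketch are assumptions chosen only to illustrate the rule: one input expanding to match the other is fine, while two inputs expanding toward each other is rejected even though NumPy itself allows it.

```python
import numpy as np

small = np.ones((1, 1, 8))    # Input1 broadcasts up to Input2: supported
large = np.ones((2, 4, 8))    # Input2
print((small + large).shape)  # (2, 4, 8)

a = np.ones((2, 1, 8))        # Both inputs must expand toward each other:
b = np.ones((1, 4, 8))        # valid NumPy broadcasting, but unsupported
print((a + b).shape)          # (2, 4, 8) on the MDLA
```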
Hardware broadcasting is supported if either of the following conditions is met (a shape check is sketched after the examples below):
The small tensor has one of the following shapes:
[]
[1]
[C]
[1, C]
[1, 1, C]
[1, 1, 1, C]
The small tensor is broadcast on the batch or channel dimension.
Example 1: The shape of the small tensor is [1,H,W,C], where H,W,C are not equal to 1.
Example 2: The shape of the small tensor is [N,H,W,1], where N,H,W are not equal to 1.
Example 3: The shape of the small tensor is [1,H,W,1], where H,W are not equal to 1.
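As a shape check, the two conditions can be sketched as follows. `hw_broadcast_eligible` is a hypothetical helper, not part of the Neuron SDK; it assumes 4-D NHWC shapes for the second condition and reads the examples above as requiring H and W to match the large tensor exactly.

```python
def hw_broadcast_eligible(small, large):
    """True if the small tensor's shape meets either hardware condition."""
    # Condition 1: [], [1], [C], [1, C], [1, 1, C], or [1, 1, 1, C]:
    # every dimension except (at most) the last one is 1.
    if len(small) <= 4 and all(d == 1 for d in small[:-1]):
        return True
    # Condition 2: a 4-D tensor broadcast only on the batch and/or channel
    # dimension; H and W must match the large tensor exactly.
    if len(small) == 4 and len(large) == 4:
        _, h, w, _ = small
        return (h, w) == (large[1], large[2]) and (small[0] == 1 or small[3] == 1)
    return False

print(hw_broadcast_eligible([1, 1, 64], [4, 32, 32, 64]))       # True (condition 1)
print(hw_broadcast_eligible([1, 32, 32, 64], [4, 32, 32, 64]))  # True (condition 2)
print(hw_broadcast_eligible([1, 32, 1, 64], [4, 32, 32, 64]))   # False: software path
```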
If the conditions for hardware broadcasting are not met, broadcasting is handled in software using multiple SPLIT and CONCAT operations.
If the small tensor is constant, the broadcast is performed at compile time; bandwidth requirements might be larger at runtime because the replicated tensor must be read in full.
If the small tensor is not constant, there are extra DMA overheads at runtime.
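The exact SPLIT/CONCAT decomposition that Neuron emits is not specified here, so the following NumPy sketch is only a rough model of the software fallback: repeated concatenation along the broadcast axis reproduces the broadcast result, which is why the fallback costs extra bandwidth or DMA traffic.

```python
import numpy as np

small = np.ones((1, 4, 1, 8))                       # needs broadcasting on W
target_w = 32
tiled = np.concatenate([small] * target_w, axis=2)  # CONCAT x32 along W
print(tiled.shape)                                  # (1, 4, 32, 8)

# Same values as letting NumPy broadcast the small tensor directly:
large = np.ones((1, 4, target_w, 8))
print(np.array_equal(tiled * large, small * large))  # True
```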