MDLA 3.0 Guidelines

Note

The following limitations may not match the MDLA hardware constraints exactly. Neuron might apply software workarounds for MDLA hardware, or impose additional limitations due to the current software implementation.

General Restrictions

Category

Limitations

Tensor Rank

Supported tensor ranks:

  • For operation (OP) types Conv3D, AvgPool3D, L2Pool3D, MinPool3D, and MaxPool3D: 5-D

  • For all other OP types: 0-D, 1-D, 2-D, 3-D, 4-D

Batch Size (N)

Valid batch sizes:

  • FULLY_CONNECTED: {1, 2, 4, 8}. FULLY_CONNECTED with any other batch size is converted to OP CONV_2D.

  • CONV_2D, DEPTHWISE_CONV_2D, TRANSPOSE_CONV: No batch size limit. If the batch size is greater than 65535, the OP is split into multiple OPs.

  • All other OPs: [1, 65535]
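The batch-size rules above can be sketched as a small validity check. This is an illustrative helper only; the op-name strings follow TFLite conventions, and `is_valid_batch` is not part of any Neuron or MDLA API:

```python
# Sketch of the MDLA batch-size rules. `is_valid_batch` is illustrative,
# not a real Neuron/MDLA API.

# Ops with no batch limit (oversized batches are handled by splitting
# the OP into multiple OPs in software).
UNLIMITED_BATCH_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "TRANSPOSE_CONV"}

def is_valid_batch(op_type: str, n: int) -> bool:
    """Return True if batch size `n` is accepted as-is for `op_type`."""
    if op_type == "FULLY_CONNECTED":
        # Other batch sizes are converted to CONV_2D rather than rejected.
        return n in {1, 2, 4, 8}
    if op_type in UNLIMITED_BATCH_OPS:
        return n >= 1  # no limit; large batches are split
    return 1 <= n <= 65535  # all other OPs
```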

Height Size (H)

Valid range for input and output activations: [1, 65535]

Width Size (W)

Valid range for input and output activations: [1, 65535]

Channel Size (C)

Valid range for input and output activations: [1, 65535]

Data Type

Supported data types:

  • Asymmetric unsigned 8-bit

  • Asymmetric signed 8-bit

  • Symmetric signed 8-bit

  • Symmetric signed 16-bit

  • Symmetric signed 16-bit activation + Symmetric signed 8-bit weight

  • 16-bit floating point (FP16)

  • 32-bit floating point

    • Converted to FP16 if relax-FP32 is enabled

Per Channel Quantization

Only the following OPs support per channel quantization:

  • CONV_2D

  • DEPTHWISE_CONV_2D

  • TRANSPOSE_CONV

  • FULLY_CONNECTED

  • MTK_TRANSPOSE_CONV

  • PRELU

MDLA Hardware Buffer

MDLA uses different internal buffers for different purposes. If no buffer of sufficient size is available for an operation, MDLA cannot run the operation and reports “Unsupported”. To avoid internal buffer constraints:

  • Keep the input channel size small.

  • For operations that have stride, such as convolution and pooling, keep the stride values small in both width and height.

  • Keep filter size small in both width and height, especially for convolution-like operations.

Supported OPs Specification

OP Name

TFLite OP

NNAPI

Restrictions

Abs

ABS

ABS

None

AvgPooling

AVERAGE_POOL_2D

AVERAGE_POOL_2D

  • Only NHWC format is supported.

  • Filter shape, stride, and paddings attributes must meet the following conditions:

    • If filter size is equal to input size (both H and W dimensions in output are equal to 1):

      • For quantized types: The input_height * input_width must be in the range [1, 2^20].

      • For floating-point types: To avoid accuracy issues, input_height and input_width must each satisfy one of the following constraints:

        • The dimension must be less than or equal to S, where S = 64.

        • The dimension must be factorable in the form “2^a * 3^b * 5^c * 7^d * N”, where N is 1 or a prime number less than or equal to S.

    • If filter size is not equal to input size:

      • Filter shape height and width must be in the range [1, 8].

      • Stride height must be in the range [1, filter_height].

      • Stride width must be in the range [1, filter_width].

      • Top and bottom paddings must be in the range [0, filter_height-1].

      • Left and right paddings must be in the range [0, filter_width-1].
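The floating-point factorability constraint above is easy to misread, so here is a minimal sketch of the check: a dimension passes if it is at most S = 64, or if stripping all factors of 2, 3, 5, and 7 leaves 1 or a prime no larger than S. The helper names are illustrative, not part of any Neuron API:

```python
# Sketch of the floating-point global-average-pool dimension constraint:
# size is valid if size <= S, or size == 2^a * 3^b * 5^c * 7^d * N with
# N == 1 or a prime <= S (S = 64). Helper names are illustrative.

S = 64

def _is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def dim_ok_for_float_global_avgpool(size: int) -> bool:
    if size <= S:
        return True
    # Strip the allowed small-prime factors 2, 3, 5, 7.
    for p in (2, 3, 5, 7):
        while size % p == 0:
            size //= p
    # The remainder N must be 1 or a prime <= S.
    return size == 1 or (size <= S and _is_prime(size))
```

For example, 100 = 2^2 * 5^2 passes (remainder 1), while 97 fails because it is a prime larger than 64.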

BatchToSpace

BATCH_TO_SPACE_ND

BATCH_TO_SPACE_ND

Only NHWC format is supported.

Concat

CONCATENATION

CONCATENATION

None

Conv2D

CONV_2D

CONV_2D

  • NHWC and NCHW formats are supported.

  • Filter size

    • If stride is not 1x1, filter height and width must be in the range [1, 25].

    • Otherwise, filter width must be in the range [1, 31].

  • Stride

    • If dilation rate is equal to 1, stride height and width must be in {1, 2, 3, 4, 8}.

  • Padding

    • For 1x1 filter, there must be no padding.

    • Otherwise, padding must be in the range [0, 15].

  • Dilation rate

    • Dilation rate height must be in {1, 2, 4, 8}.

    • Dilation rate width must be in {1, 2, 4, 8}.

    • There are no limitations if the ncc-tflite option “--use-sw-dilated-conv” is enabled. This option applies a software solution for dilated convolution.
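The CONV_2D attribute constraints above can be collected into one predicate. This is a sketch under stated assumptions: it checks a single symmetric padding value and a square dilation rate, assumes filter height stays within [1, 25] even when stride is 1x1 (the text only relaxes the width bound), and the function name and parameters are illustrative, not a Neuron API:

```python
# Sketch of the CONV_2D shape constraints. Illustrative only; the real
# compiler reports "Unsupported" at compile time for violations.

def conv2d_attrs_supported(filter_h, filter_w, stride_h, stride_w,
                           pad, dilation, sw_dilated_conv=False):
    """Check one symmetric padding value and a square dilation rate."""
    # Filter size: [1, 25] generally; the width bound relaxes to [1, 31]
    # when stride is 1x1 (height assumed to stay within [1, 25]).
    if (stride_h, stride_w) == (1, 1):
        if not (1 <= filter_h <= 25 and 1 <= filter_w <= 31):
            return False
    elif not (1 <= filter_h <= 25 and 1 <= filter_w <= 25):
        return False
    # Stride: must be in {1, 2, 3, 4, 8} when the dilation rate is 1.
    if dilation == 1 and (stride_h not in {1, 2, 3, 4, 8}
                          or stride_w not in {1, 2, 3, 4, 8}):
        return False
    # Padding: none for a 1x1 filter; otherwise [0, 15].
    if (filter_h, filter_w) == (1, 1):
        if pad != 0:
            return False
    elif not (0 <= pad <= 15):
        return False
    # Dilation: {1, 2, 4, 8}, unless --use-sw-dilated-conv handles it.
    if not sw_dilated_conv and dilation not in {1, 2, 4, 8}:
        return False
    return True
```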

DepthwiseConv2D

DEPTHWISE_CONV_2D

DEPTHWISE_CONV_2D

  • Filter size

    • Filter height and width must be in the range [1, 25].

  • Channel multiplier

    • Channel multiplier must be in {1, 2, 4, 8, 16}, or equal to the output channel (i.e., the input channel is 1).

  • Other constraints are the same as CONV_2D.

DepthToSpace

DEPTH_TO_SPACE

DEPTH_TO_SPACE

  • Only NHWC format is supported.

  • Input batch must be 1.

  • Output batch must be 1.

Dequantize

DEQUANTIZE

DEQUANTIZE

Input cannot use per-channel quantization.

ElementWiseAdd

ADD

ADD

See Limitations of Broadcasting.

ElementWiseDiv

DIV

DIV

ElementWiseMul

MUL

MUL

See Limitations of Broadcasting.

ElementWiseSub

SUB

SUB

Elu

ELU

ELU

None

FullyConnected

FULLY_CONNECTED

FULLY_CONNECTED

  • Filter input channel (i.e., the 2nd dimension of the filter) must be in the range [1, 1048575].

  • FULLY_CONNECTED with dynamic weight is converted to CONV_2D.

  • Bias must be a constant tensor.

HardSwish

HARD_SWISH

HARD_SWISH

None

L2Pooling

L2_POOL_2D

L2_POOL_2D

  • Same as AVERAGE_POOL_2D, except that if the filter size is equal to the input size (both H and W dimensions in the output are equal to 1), then filter_height * filter_width must be in the range [1, 2^10].

  • Input activation with floating point data type is unsupported.

MaxPooling

MAX_POOL_2D

MAX_POOL_2D

  • Same as AVERAGE_POOL_2D.

  • Additionally supported: input dimensions equal to output dimensions, with padding SAME and stride 1.

Maximum

MAXIMUM

MAXIMUM

See Limitations of Broadcasting.

Mean

MEAN

MEAN

None

Minimum

MINIMUM

MINIMUM

See Limitations of Broadcasting.

MirrorPad

MIRROR_PAD

MIRROR_PAD

Supported tensors: 4-D with padding on height or width direction.

Neg

NEG

NEG

None

Pack

PACK

Cannot pack at last dimension.

Pad

PAD
PADV2

PAD
PAD_V2

None

Pow

POW

POW

Exponent must be a constant integer.

PRelu

PRELU

PRELU

  • Alpha must be a constant.

  • Alpha must be a scalar (0-D) or 1-D tensor.

QLSTM (5 inputs)

LSTM

QUANTIZED_16BIT_LSTM

The sum of the last dimension of the input and the last dimension of the output scratch must be:

  • 16-aligned

  • In the range [1, 1048575]
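The QLSTM buffer rule above amounts to a simple arithmetic check. A minimal sketch, assuming the two sizes are summed before testing; the helper name is illustrative, not a Neuron API:

```python
# Sketch of the QLSTM constraint: the sum of the input's last dimension
# and the output scratch's last dimension must be 16-aligned and lie in
# [1, 1048575]. Helper name is illustrative.

def qlstm_dims_supported(input_last_dim: int, scratch_last_dim: int) -> bool:
    total = input_last_dim + scratch_last_dim
    return total % 16 == 0 and 1 <= total <= 1048575
```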

Quantize

QUANTIZE

QUANTIZE

None

ReduceMax

REDUCE_MAX

REDUCE_MAX

The size before reduced axis must be less than 65536.

ReduceMin

REDUCE_MIN

REDUCE_MIN

The size before reduced axis must be less than 65536.

ReLU
ReLU1
ReLU6

RELU
RELU_N1_TO_1
RELU6

RELU
RELU1
RELU6

None

Reshape

RESHAPE

RESHAPE

None

Resize::BILINEAR

RESIZE_BILINEAR

RESIZE_BILINEAR

  • Only NHWC format is supported.

  • Input Height must be in the range [1, 8192].

  • Input Width must be in the range [1, 8192].

Resize::NEAREST

RESIZE_NEAREST_NEIGHBOR

RESIZE_NEAREST_NEIGHBOR

  • Only NHWC format is supported.

  • Input Height must be in the range [1, 8192].

  • Input Width must be in the range [1, 8192].

RSqrt

RSQRT

RSQRT

None

Sigmoid

LOGISTIC

LOGISTIC

None

Slice

SLICE

SLICE

None

SoftMax

SOFTMAX

SOFTMAX

  • Axis must be -1; this means only the input channel is normalized.

  • Quantized types are dequantized to FP16 due to an accuracy issue.

SpaceToBatch

SPACE_TO_BATCH_ND

SPACE_TO_BATCH_ND

  • Only NHWC format is supported.

  • Input batch must be 1.

SpaceToDepth

SPACE_TO_DEPTH

SPACE_TO_DEPTH

  • Only NHWC format is supported.

  • Input batch must be 1.

Split

SPLIT

SPLIT

None

Sqrt

SQRT

SQRT

None

Square

SQUARE

None

SquaredDifference

SQUARED_DIFFERENCE

None

StridedSlice

STRIDED_SLICE

STRIDED_SLICE

Stride on the last dimension is unsupported.

Sum

SUM

SUM

None

Tanh

TANH

TANH

For quantized types, InputScale/OutputScale must be less than 842.

Transpose

TRANSPOSE

TRANSPOSE

None

TransposeConv2D

TRANSPOSE_CONV

TRANSPOSE_CONV_2D

  • Weight must be a constant tensor.

  • Filter size

    • Filter height and width must be in the range [1, 25].

  • Stride

    • Stride height must be less than or equal to filter height.

    • Stride width must be less than or equal to filter width.

  • Other constraints are the same as CONV_2D.

Unpack

UNPACK

Cannot unpack at last dimension.

Limitations of Broadcasting

  • Only broadcasting from a small tensor to a large tensor with compatible dimensions is supported.

    • Example 1: Input1 broadcasting to Input2 is supported.

    • Example 2: Input2 broadcasting to Input1 is supported.

    • Example 3: Input1 and Input2 broadcasting to each other is unsupported.

  • Hardware broadcasting is supported if either of the following conditions is met:

    1. The small tensor has one of the following shapes:

      • []

      • [1]

      • [C]

      • [1, C]

      • [1, 1, C]

      • [1, 1, 1, C]

    2. The small tensor is broadcast on the batch or channel dimension.

      • Example 1: The shape of the small tensor is [1,H,W,C], where H,W,C are not equal to 1.

      • Example 2: The shape of the small tensor is [N,H,W,1], where N,H,W are not equal to 1.

      • Example 3: The shape of the small tensor is [1,H,W,1], where H,W are not equal to 1.

  • If the conditions for hardware broadcasting are not met, broadcasting is handled in software using multiple SPLIT and CONCAT operations.

    • If the small tensor is constant, the broadcasting is done at compile time. Bandwidth requirements might be larger at runtime.

    • If the small tensor is not constant, there are extra runtime DMA overheads.
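The hardware-broadcasting conditions above can be sketched as an eligibility check on the small tensor's shape. This is illustrative only (the function is not a Neuron API), and it assumes condition 2 applies to 4-D NHWC shapes where batch and/or channel is 1 while H and W are greater than 1, matching the three examples given:

```python
# Sketch of the hardware-broadcasting eligibility rules for the small
# tensor. Illustrative; when this fails, the compiler falls back to the
# software SPLIT/CONCAT path described above.

def hw_broadcast_supported(small_shape: tuple) -> bool:
    # Condition 1: [], [1], [C], [1, C], [1, 1, C], or [1, 1, 1, C],
    # i.e. every dimension except (possibly) the last is 1.
    if len(small_shape) == 0:
        return True
    if all(d == 1 for d in small_shape[:-1]):
        return True
    # Condition 2 (4-D): broadcast on the batch and/or channel dimension,
    # e.g. [1, H, W, C], [N, H, W, 1], or [1, H, W, 1] with H, W > 1.
    if len(small_shape) == 4:
        n, h, w, c = small_shape
        return (n == 1 or c == 1) and h > 1 and w > 1
    return False
```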