MDLA 2.0 Guidelines

Note

The following limitations may differ from the actual MDLA hardware constraints, because Neuron may provide software workarounds for some hardware limitations, and some limitations come from the current software implementation.

General Restrictions

Category

Restrictions

Tensor Rank

Only 0-D, 1-D, 2-D, 3-D, and 4-D tensors are supported.

Batch Size (N)

Batch size should be in range [1, 255], except for Conv2D, DepthwiseConv2D, and FullyConnected.

Height Size (H)

Should be in range [1, 65536) for both input and output activations.

Width Size (W)

Should be in range [1, 65536) for both input and output activations.

Channel Size (C)

Should be in range [1, 65536) for both input and output activations.
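As a rough illustration, the shape limits above can be turned into a pre-check run before compilation. This is a minimal sketch, not part of any Neuron tool: the function name `check_general_shape` is hypothetical, it assumes a 4-D NHWC shape tuple, it checks one activation at a time (so it would be called for both input and output), and it simply skips the batch check for the three exempted operations, since their own batch limit is not stated here.

```python
# Hypothetical pre-check for the general MDLA 2.0 shape limits (not a Neuron API).
# Assumes shapes are tuples in NHWC order.
BATCH_EXEMPT_OPS = {"Conv2D", "DepthwiseConv2D", "FullyConnected"}


def check_general_shape(shape, op_name):
    """Return a list of violated rules; an empty list means the shape passes."""
    if len(shape) > 4:
        return [f"rank {len(shape)} exceeds 4"]
    if len(shape) != 4:
        raise ValueError("this sketch only checks 4-D NHWC shapes")
    n, h, w, c = shape
    errors = []
    # The batch limit applies to every op except the exempted ones.
    if op_name not in BATCH_EXEMPT_OPS and not 1 <= n <= 255:
        errors.append(f"batch {n} not in [1, 255]")
    # H, W and C share the same half-open range [1, 65536).
    for name, value in (("H", h), ("W", w), ("C", c)):
        if not 1 <= value < 65536:
            errors.append(f"{name} {value} not in [1, 65536)")
    return errors


print(check_general_shape((1, 224, 224, 3), "AvgPooling"))    # []
print(check_general_shape((300, 224, 224, 3), "AvgPooling"))  # ['batch 300 not in [1, 255]']
print(check_general_shape((300, 224, 224, 3), "Conv2D"))      # []
```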

Data Type

Only the following data types are supported:
  1. Asymmetric unsigned 8-bit.

  2. Asymmetric signed 8-bit.

  3. Symmetric signed 8-bit.

  4. Symmetric signed 16-bit.

  5. 16-bit floating point (FP16).

  6. 32-bit floating point (FP32).

    • Converted to FP16 if relax-FP32 is enabled.

Per Channel Quantization

The following operations support per channel quantization:
  1. Conv2D

  2. DepthwiseConv2D

  3. TransposeConv2D

  4. FullyConnected

  5. PRelu

Data Format

Only NHWC format is supported.

MDLA Hardware Buffer

MDLA has different internal buffers for different purposes. If there is insufficient buffer space for a given operation, that operation cannot be supported by MDLA. To avoid hitting these internal buffer constraints:
  1. Keep input channel size small.

  2. Keep stride values (in both width and height) small for operations that have stride (e.g., convolution and pooling).

  3. Keep filter size (in both width and height) small, especially for the convolution-like operations.

Supported OPs Specification

OP Name

TFLite OP

Restrictions

Abs

ABS

None

AvgPooling

AVERAGE_POOL_2D

  1. If filter shape is 1x1

    1. There should be no padding.

    2. Stride should not be 0.

  2. If this is a global pooling

    1. The filter_height x filter_width should be in range [1, 2^18].

  3. Otherwise

    1. The height and width of filter shape should be in range [1, 8].

    2. The stride height should be in range [1, filter_height].

    3. The stride width should be in range [1, filter_width].

    4. The padding should be in range [0, 15].
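The three cases above can be read as an if/elif chain: 1x1 filter first, then global pooling, then everything else. The sketch below is illustrative only: `check_pool2d` is a hypothetical name, padding is given per side, and treating "global pooling" as a filter that spans the whole input is an assumption. MaxPooling lists the same rules, so the same function would apply to MAX_POOL_2D.

```python
# Hypothetical checker for the AVERAGE_POOL_2D rules above (not a Neuron API).
# pads = (top, bottom, left, right); is_global means the filter covers the
# whole input height and width (an assumed definition of "global pooling").
MAX_GLOBAL_AREA = 2 ** 18


def check_pool2d(filter_hw, stride_hw, pads, is_global):
    """Return a list of violated rules; an empty list means the config passes."""
    fh, fw = filter_hw
    sh, sw = stride_hw
    errors = []
    if (fh, fw) == (1, 1):
        # 1x1 filter: no padding and a non-zero stride.
        if any(pads):
            errors.append("1x1 filter must not be padded")
        if sh == 0 or sw == 0:
            errors.append("stride must not be 0")
    elif is_global:
        # Global pooling: only the filter area is limited.
        if not 1 <= fh * fw <= MAX_GLOBAL_AREA:
            errors.append(f"filter area {fh * fw} not in [1, 2^18]")
    else:
        if not (1 <= fh <= 8 and 1 <= fw <= 8):
            errors.append("filter height/width not in [1, 8]")
        if not 1 <= sh <= fh:
            errors.append("stride height not in [1, filter_height]")
        if not 1 <= sw <= fw:
            errors.append("stride width not in [1, filter_width]")
        if not all(0 <= p <= 15 for p in pads):
            errors.append("padding not in [0, 15]")
    return errors


print(check_pool2d((3, 3), (2, 2), (1, 1, 1, 1), is_global=False))  # []
print(check_pool2d((3, 3), (4, 4), (0, 0, 0, 0), is_global=False))
```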

BatchToSpace

BATCH_TO_SPACE_ND

None

Concat

CONCATENATION

None

Conv2D

CONV_2D

  1. Input channel size

    1. For 8-bit data types, the input channel size should be in range [1, 8194].

    2. For 16-bit data types, the input channel size should be in range [1, 4095].

    3. Input channel should be equal to filter channel.

      1. Grouped Conv2D (i.e., groups > 1) is not supported.

  2. Filter size

    1. Filter height should be in range [1, 16].

    2. Filter width should be in range [1, 16].

  3. Stride

    1. If the dilation rate is not 1 in height or width, the stride height and width should both be 1.

    2. Otherwise: stride height and width should be in {1, 2, 3, 4, 8}.

  4. Padding

    1. For 1x1 filter, there should be no padding.

    2. Otherwise, padding should be in range [0, 15].

  5. Dilation rate

    1. The height of dilation rate should be in {1, 2, 4, 8}.

    2. The width of dilation rate should be in {1, 2, 4, 8}.

  6. Dynamic Weight

    1. The output channel of filter should be 16-aligned.

    2. The input channel of filter should be 32-byte aligned.
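To make the interaction between dilation and stride concrete, the Conv2D rules can be collected into a single check. This is a sketch under stated assumptions: `check_conv2d` and its parameters are hypothetical, padding is given per side, and "32-byte aligned" is interpreted as the filter input channel count times the element size in bytes being a multiple of 32.

```python
# Hypothetical checker for the CONV_2D rules above (not a Neuron API).
# pads = (top, bottom, left, right); bits is the data width (8 or 16).
VALID_STRIDES = {1, 2, 3, 4, 8}
VALID_DILATIONS = {1, 2, 4, 8}


def check_conv2d(in_ch, filter_in_ch, filter_hw, stride_hw, dilation_hw, pads,
                 bits, dynamic_weight=False, out_ch=None):
    """Return a list of violated rules; an empty list means the config passes."""
    errors = []
    max_in = 8194 if bits == 8 else 4095
    if not 1 <= in_ch <= max_in:
        errors.append(f"input channel {in_ch} not in [1, {max_in}]")
    if in_ch != filter_in_ch:
        errors.append("input channel != filter channel (group conv unsupported)")
    fh, fw = filter_hw
    if not (1 <= fh <= 16 and 1 <= fw <= 16):
        errors.append("filter height/width not in [1, 16]")
    if any(d not in VALID_DILATIONS for d in dilation_hw):
        errors.append("dilation not in {1, 2, 4, 8}")
    # Any dilation other than 1 forces stride 1 in both directions.
    if tuple(dilation_hw) != (1, 1):
        if tuple(stride_hw) != (1, 1):
            errors.append("dilated conv requires stride 1")
    elif any(s not in VALID_STRIDES for s in stride_hw):
        errors.append("stride not in {1, 2, 3, 4, 8}")
    if (fh, fw) == (1, 1):
        if any(pads):
            errors.append("1x1 filter must not be padded")
    elif not all(0 <= p <= 15 for p in pads):
        errors.append("padding not in [0, 15]")
    if dynamic_weight:
        if out_ch is None or out_ch % 16:
            errors.append("dynamic weight: output channel not 16-aligned")
        # Assumption: channels * bytes per element must be a multiple of 32.
        if (filter_in_ch * (bits // 8)) % 32:
            errors.append("dynamic weight: input channel not 32-byte aligned")
    return errors


print(check_conv2d(64, 64, (3, 3), (2, 2), (2, 2), (2, 2, 2, 2), 8))
# ['dilated conv requires stride 1']
```

Under this interpretation, 48 input channels pass the dynamic-weight alignment for 16-bit data (96 bytes) but not for 8-bit data (48 bytes).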

DepthwiseConv2D

DEPTHWISE_CONV_2D

  1. Input channel size

    1. For 8-bit data types, the input channel size should be in range [1, 8194].

    2. For 16-bit data types, the input channel size should be in range [1, 4095].

  2. Filter size

    1. Filter height should be in range [1, 8].

    2. Filter width should be in range [1, 8].

  3. Stride

    1. Stride height should be less than or equal to filter height and should be in {1, 2, 3, 4}.

    2. Stride width should be less than or equal to filter width and should be in {1, 2, 3, 4}.

  4. Padding should be in range [0, 15].

  5. Dilation rate

    1. The height of dilation rate should be in {1, 2, 4, 8}.

    2. The width of dilation rate should be in {1, 2, 4, 8}.

  6. Channel multiplier

    1. Should be in range [1, 255]

    2. If channel multiplier > 1

      1. Channel multiplier should be 16-aligned (i.e., 16, 32, 48, 64, …)

  7. Dynamic weight

    1. The filter channel should be 32-byte aligned.

    2. Setting -num-mdla to 2 or more is not supported when dynamic weight is enabled (due to bit-true issues).
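The channel-multiplier rule is easy to misread, so it may help to enumerate the values it admits. The sketch below is hypothetical (`check_depthwise` is not a Neuron function) and covers only filter size, stride, dilation, and channel multiplier; the input-channel limits, padding, and dynamic-weight rules are left out.

```python
# Hypothetical checker for part of the DEPTHWISE_CONV_2D rules above
# (filter size, stride, dilation, channel multiplier); not a Neuron API.
def check_depthwise(filter_hw, stride_hw, dilation_hw, channel_multiplier):
    """Return a list of violated rules; an empty list means the config passes."""
    errors = []
    fh, fw = filter_hw
    if not (1 <= fh <= 8 and 1 <= fw <= 8):
        errors.append("filter height/width not in [1, 8]")
    # Stride must be in {1, 2, 3, 4} and no larger than the filter on that axis.
    for axis, s, f in (("height", stride_hw[0], fh), ("width", stride_hw[1], fw)):
        if s not in {1, 2, 3, 4} or s > f:
            errors.append(f"stride {axis} {s} invalid for filter {axis} {f}")
    if any(d not in {1, 2, 4, 8} for d in dilation_hw):
        errors.append("dilation not in {1, 2, 4, 8}")
    m = channel_multiplier
    if not 1 <= m <= 255:
        errors.append(f"channel multiplier {m} not in [1, 255]")
    elif m > 1 and m % 16:
        errors.append(f"channel multiplier {m} not 16-aligned")
    return errors


# The admitted multipliers are 1 and 16, 32, ..., 240.
print([m for m in range(1, 256) if not check_depthwise((3, 3), (1, 1), (1, 1), m)][:4])
# [1, 16, 32, 48]
```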

DepthToSpace

DEPTH_TO_SPACE

Input and output batch must be 1.

Dequantize

DEQUANTIZE

Input cannot use per-channel quantization.

ElementWiseAdd

ADD

The hardware does not support broadcasting, except when input-1 or input-2 is a 0-D or 1-D constant.

  • For other cases where the broadcast operand is a constant, broadcasting is supported in software by enlarging the constant at compile time.

  • For all other cases, broadcasting is supported in software using multiple concat operations.
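One way to read the three ADD broadcasting paths is as a decision based on which operands actually need to be broadcast. The sketch below assumes NumPy-style broadcasting semantics and interprets the hardware exception as "every operand that must be broadcast is a 0-D or 1-D constant"; both that interpretation and the helper names `broadcast_shape` and `broadcast_path` are assumptions, not documented Neuron behavior. ElementWiseMul lists identical rules, so the same classification would apply there.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """NumPy-style result shape of broadcasting a with b (assumed semantics)."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(reversed(out))


def broadcast_path(shape1, const1, shape2, const2):
    """Classify which path ADD/MUL broadcasting would take (hypothetical helper)."""
    out = broadcast_shape(shape1, shape2)
    # Operands whose shape differs from the output shape must be broadcast.
    broadcast = [(len(s), c) for s, c in ((shape1, const1), (shape2, const2))
                 if tuple(s) != out]
    if not broadcast:
        return "no broadcasting"
    if all(c and rank <= 1 for rank, c in broadcast):
        return "hardware"
    if all(c for _, c in broadcast):
        return "software: compile-time constant enlarge"
    return "software: multiple concat"


print(broadcast_path((1, 4, 4, 8), False, (8,), True))          # hardware
print(broadcast_path((1, 4, 4, 8), False, (1, 1, 1, 8), True))  # software: compile-time constant enlarge
print(broadcast_path((1, 4, 4, 8), False, (1, 1, 1, 8), False)) # software: multiple concat
```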

ElementWiseDiv

DIV

Broadcasting is not yet supported.

ElementWiseMul

MUL

The hardware does not support broadcasting, except when input-1 or input-2 is a 0-D or 1-D constant.

  • For other cases where the broadcast operand is a constant, broadcasting is supported in software by enlarging the constant at compile time.

  • For all other cases, broadcasting is supported in software using multiple concat operations.

ElementWiseSub

SUB

  1. The scale of input1 (minuend) should be greater than or equal to the scale of input2 (subtrahend).

  2. Broadcasting is supported by software using multiple concat operations.

Elu

ELU

None

FullyConnected

FULLY_CONNECTED

  1. Input channel (or the last dimension of input)

    1. Should be 16-aligned or equal to the filter input channel.

  2. Filter input channel (i.e., the second dimension of filter)

    1. Should be 16-aligned or equal to the input channel size.

    2. Should be in range [1, 1048576).

  3. Dynamic Weight

    1. The output channel of filter should be 16-aligned.

    2. The input channel of filter should be 32-byte aligned.
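The FullyConnected rules allow the input's last dimension and the filter input channel to differ as long as both are 16-aligned, which a small check makes explicit. This is a hypothetical sketch: `check_fully_connected` is not a Neuron API, it reads the filter as [output_channels, input_channels] (the TFLite FULLY_CONNECTED weight layout, matching "the second dimension of filter" above), and it assumes "32-byte aligned" means channels times bytes per element is a multiple of 32.

```python
# Hypothetical checker for the FULLY_CONNECTED rules above (not a Neuron API).
# filter_shape = (output_channels, input_channels); bits is 8 or 16.
def check_fully_connected(input_shape, filter_shape, bits, dynamic_weight=False):
    """Return a list of violated rules; an empty list means the config passes."""
    in_ch = input_shape[-1]
    out_ch, filter_in_ch = filter_shape
    errors = []
    if in_ch % 16 and in_ch != filter_in_ch:
        errors.append("input channel not 16-aligned and != filter input channel")
    if filter_in_ch % 16 and filter_in_ch != in_ch:
        errors.append("filter input channel not 16-aligned and != input channel")
    if not 1 <= filter_in_ch < 1048576:
        errors.append("filter input channel not in [1, 1048576)")
    if dynamic_weight:
        if out_ch % 16:
            errors.append("dynamic weight: output channel not 16-aligned")
        # Assumption: channels * bytes per element must be a multiple of 32.
        if (filter_in_ch * (bits // 8)) % 32:
            errors.append("dynamic weight: input channel not 32-byte aligned")
    return errors


print(check_fully_connected((1, 4, 16), (10, 64), 8))  # []
print(check_fully_connected((1, 100), (10, 100), 8))   # []
```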

HardSwish

HARD_SWISH

For quantized models, both of the following conditions must be met to preserve precision:

  1. (input_offset - ROUND(3.0 / input_scale)) >= MIN(TYPE)

  2. ABS(6.0 - ROUND(6.0 / input_scale) * input_scale) <= 2 * (6.0 / (MAX(TYPE) - MIN(TYPE)))

where TYPE is the quantized data type:

  • If TYPE is uint8: MIN(uint8) = 0, MAX(uint8) = 255.

  • If TYPE is int8: MIN(int8) = -128, MAX(int8) = 127.
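The two HardSwish conditions can be evaluated directly from a tensor's quantization parameters. This is a minimal sketch with hypothetical names (`hard_swish_precision_ok`, `round_half_up`); it assumes ROUND means round-half-away-from-zero, which differs from Python's built-in `round()` (round-half-to-even).

```python
import math

# MIN/MAX of each quantized type, as listed above.
TYPE_RANGE = {"uint8": (0, 255), "int8": (-128, 127)}


def round_half_up(x):
    # Assumption: ROUND is round-half-away-from-zero; for the positive
    # arguments used here that equals floor(x + 0.5).
    return math.floor(x + 0.5)


def hard_swish_precision_ok(input_scale, input_offset, dtype):
    """Return (cond1, cond2) for the two HardSwish conditions above."""
    lo, hi = TYPE_RANGE[dtype]
    # Condition 1: the quantized value of -3.0 should not fall below MIN(TYPE).
    cond1 = (input_offset - round_half_up(3.0 / input_scale)) >= lo
    # Condition 2: 6.0 should be reproduced on the input grid to within
    # 2 * 6.0 / (MAX(TYPE) - MIN(TYPE)).
    cond2 = abs(6.0 - round_half_up(6.0 / input_scale) * input_scale) <= 2 * (6.0 / (hi - lo))
    return cond1, cond2


print(hard_swish_precision_ok(12 / 255, 128, "uint8"))  # (True, True)
print(hard_swish_precision_ok(0.35, 0, "int8"))         # (True, False)
```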

L2Pooling

L2_POOL_2D

  1. Filter shape 1x1 is unsupported.

  2. If this is a global pooling

    1. The filter_height x filter_width should be in range [1, 2^10].

  3. Otherwise

    1. The height and width of filter shape should be in range [1, 8].

    2. The stride height should be in range [1, filter_height].

    3. The stride width should be in range [1, filter_width].

    4. The padding should be in range [0, 15].

  4. Data type

    1. Floating point is unsupported.

MaxPooling

MAX_POOL_2D

  1. If filter shape is 1x1

    1. There should be no padding.

    2. Stride should not be 0.

  2. If this is a global pooling

    1. The filter_height x filter_width should be in range [1, 2^18].

  3. Otherwise

    1. The height and width of filter shape should be in range [1, 8].

    2. The stride height should be in range [1, filter_height].

    3. The stride width should be in range [1, filter_width].

    4. The padding should be in range [0, 15].

Maximum

MAXIMUM

Broadcasting is supported by software using multiple concat operations.

Mean

MEAN

  1. The reduction axes should be the height (H) and width (W) dimensions.

  2. The height and width of output shape should be 1.

  3. The input_height x input_width should be in range [1, 2^18].

  4. For floating point types, each of input_height and input_width must satisfy one of the following constraints to avoid accuracy issues:

    1. It is less than or equal to S, where S = 64.

    2. It can be factored in the form “2^a * 3^b * 5^c * 7^d * N”, where a, b, c, and d are non-negative integers and N is 1 or a prime number less than or equal to S.
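The floating-point rule for Mean reduces to a small factorization test per spatial dimension. The sketch below uses hypothetical helper names; it strips the factors 2, 3, 5, and 7 and checks that what remains is 1 or a prime no larger than S = 64. `mean_float_input_ok` also folds in the input_height x input_width range from rule 3.

```python
import math

S = 64


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def mean_dim_ok(n, s=S):
    """Check one of input_height / input_width against the float MEAN rule."""
    if n <= s:
        return True
    # Strip the allowed small factors 2, 3, 5 and 7.
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    # Whatever is left is N, which must be 1 or a prime <= s.
    return n == 1 or (n <= s and is_prime(n))


def mean_float_input_ok(h, w):
    return 1 <= h * w <= 2 ** 18 and mean_dim_ok(h) and mean_dim_ok(w)


# 65 = 5 * 13 and 224 = 2^5 * 7 pass; 67 (prime > 64) and 143 = 11 * 13 fail.
print([n for n in (65, 67, 143, 224) if mean_dim_ok(n)])  # [65, 224]
```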

Minimum

MINIMUM

Broadcasting is supported by software using multiple concat operations.

Neg

NEG

None

Pack

PACK

Packing along the last dimension is not supported.

Pad

PAD
PADV2

  1. For quantized types, input and output activations should have the same zero-point and scale.

Pow

POW

  1. The exponent should be a constant.

  2. The exponent should be equal to 2.0.

PRelu

PRELU

  1. Alpha should be a scalar or 1-D constant tensor.

  2. The data types of input and output should be the same.

  3. The LeakyRelu case is also covered.

QLSTM (5 inputs)

LSTM

  1. Bias scale should be smaller than 2^-10.

  2. The sum of the last dimension of the input and the last dimension of the output scratch should be:

    1. 16-aligned

    2. in range [1, 1048576)
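Because rule 2 constrains a sum rather than either dimension alone, a quick check might look like the sketch below. `check_qlstm` is a hypothetical helper; how the input and output-scratch dimensions are read from a model is outside its scope. For reference, 2^-10 is about 0.000977.

```python
# Hypothetical checker for the QLSTM (5 inputs) rules above (not a Neuron API).
def check_qlstm(bias_scale, input_last_dim, output_scratch_last_dim):
    """Return a list of violated rules; an empty list means the config passes."""
    errors = []
    if not bias_scale < 2 ** -10:
        errors.append(f"bias scale {bias_scale} not smaller than 2^-10")
    # Rule 2 applies to the combined last dimension, not each part separately.
    total = input_last_dim + output_scratch_last_dim
    if total % 16:
        errors.append(f"combined last dimension {total} not 16-aligned")
    if not 1 <= total < 1048576:
        errors.append(f"combined last dimension {total} not in [1, 1048576)")
    return errors


print(check_qlstm(2 ** -12, 60, 10))  # ['combined last dimension 70 not 16-aligned']
```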

Quantize

QUANTIZE

None

ReLU
ReLU1
ReLU6

RELU
RELU_N1_TO_1
RELU6

None

Reshape

RESHAPE

None

Resize::BILINEAR

RESIZE_BILINEAR

  1. Input Height should be less than or equal to 8192.

  2. Input Width should be less than or equal to 8192.

  3. half_pixel_centers must be false.

Resize::NEAREST

RESIZE_NEAREST_NEIGHBOR

  1. Input Height should be less than or equal to 8192.

  2. Input Width should be less than or equal to 8192.

  3. half_pixel_centers must be false.

RSqrt

RSQRT

None

Sigmoid

LOGISTIC

None

Slice

SLICE

None

SoftMax

SOFTMAX

Axis should be -1.

SpaceToBatch

SPACE_TO_BATCH_ND

Input batch must be 1.

SpaceToDepth

SPACE_TO_DEPTH

Input batch must be 1.

Split

SPLIT

None

Sqrt

SQRT

None

Square

SQUARE

None

SquaredDifference

SQUARED_DIFFERENCE

None

StridedSlice

STRIDED_SLICE

  1. Stride should be greater than or equal to 1.

  2. Stride on the last dimension is unsupported.

Tanh

TANH

For quantized types, InputScale / OutputScale should be less than 842.

Transpose

TRANSPOSE

Only transposes among the H, W, and C dimensions are supported.
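For a 4-D NHWC tensor, one reading of this rule is that the permutation must leave the batch axis in place. The sketch below encodes that interpretation; `transpose_supported` is a hypothetical helper, and the assumption that N must stay at axis 0 is not stated explicitly above.

```python
# Hypothetical check (not a Neuron API), assuming a 4-D NHWC tensor:
# a TRANSPOSE is treated as supported when it keeps N (axis 0) in place
# and only reorders H, W and C (axes 1, 2, 3).
def transpose_supported(perm):
    return sorted(perm) == [0, 1, 2, 3] and perm[0] == 0


print(transpose_supported((0, 3, 1, 2)))  # True  (NHWC -> NCHW)
print(transpose_supported((1, 0, 2, 3)))  # False (moves the batch axis)
```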

TransposeConv2D

TRANSPOSE_CONV

  1. Input channel size

    1. For 8-bit data types, the input channel size should be in range [1, 8194].

    2. For 16-bit data types, the input channel size should be in range [1, 4095].

  2. Filter size

    1. Filter height should be in range [1, 16].

    2. Filter width should be in range [1, 16].

  3. Stride

    1. Stride height should be less than or equal to filter height and should be in {1, 2, 3, 4, 8}.

    2. Stride width should be less than or equal to filter width and should be in {1, 2, 3, 4, 8}.

  4. Padding should be in range [0, 15].

Unpack

UNPACK

Unpacking along the last dimension is not supported.