TFLite(LiteRT) - Generative AI on MT8883

This page lists the generative AI models and representative performance data for MT8883 platforms. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

Note

The following symbols are used in the tables below:

  • - : To be released.

  • Not Support : The platform does not support this model.

Note

For VLM support, Qwen3-vl is scheduled for release by the end of June 2026.

The following tables summarize the supported generative models and measured performance on MT8883.

Large Language Models (LLMs)

Model                                         Prompt Mode (tok/s)   Generative Mode (tok/s)
DeepSeek-R1-Distill-Qwen-1.5B
DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Llama-8B
Qwen3-1.7B                                    834.02                25.18
Qwen2.5-1.5B-Instruct                         763.66                20.16
Qwen2.5-3B-Instruct                           502.05                19.32
Qwen2.5-7B-Instruct                           184.73                6.89
gemma3-1B (Text Only)                         1125.16               38.87
gemma3-4B (Text-Only)
gemma2-2b-it
llama3.2-1B-Instruct                          1372.57               43.47
llama3.2-3B-Instruct                          611.76                19.89
llama3-8b                                     128.36                6.53
MiniCPM-2B-sft-bf16-llama-format
llava1.5-7b-speculative-decoding              138.76                4.46
medusa_v1_0_vicuna_7b_v1.5
vicuna1.5-7b-tree-speculative-decoding-plus
baichuan-7b-int8-cache
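
The throughput figures above can be turned into a rough response-time estimate. The sketch below is illustrative only. It assumes that Prompt Mode is the rate at which input tokens are processed and that Generative Mode is the rate at which output tokens are produced. It ignores model loading, tokenization, and sampling overhead. The function name estimate_latency_s and the 512-token prompt and 128-token output are hypothetical choices for the example, not values from the measurements.

```python
def estimate_latency_s(prompt_tokens, output_tokens, prompt_tok_s, gen_tok_s):
    """Rough end-to-end latency in seconds: prompt time plus generation time.

    Assumption: the two throughput values are constant over the whole request.
    """
    if prompt_tok_s <= 0 or gen_tok_s <= 0:
        raise ValueError("throughput values must be positive")
    prompt_time_s = prompt_tokens / prompt_tok_s
    generation_time_s = output_tokens / gen_tok_s
    return prompt_time_s + generation_time_s


# (Prompt Mode tok/s, Generative Mode tok/s), copied from the LLM table.
MEASURED = {
    "Qwen3-1.7B": (834.02, 25.18),
    "llama3.2-1B-Instruct": (1372.57, 43.47),
    "Qwen2.5-7B-Instruct": (184.73, 6.89),
}

if __name__ == "__main__":
    # Hypothetical request size: 512 prompt tokens, 128 generated tokens.
    for model, (prompt_rate, gen_rate) in MEASURED.items():
        total_s = estimate_latency_s(512, 128, prompt_rate, gen_rate)
        print(f"{model}: about {total_s:.1f} s")
```

For typical chat-style requests the generation term dominates, so the Generative Mode column is usually the better predictor of perceived response time.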

Stable Diffusion and Image Generation

Model                                               Main Time (ms)   Inference Time (ms)
Stable Diffusion v.1.5 controlnet
Stable Diffusion v2.1 base model with controlnet

CLIP and Embedding Models

Model                                         Main Time (ms)   Inference Time (ms)
CLIP Image Encoder
  img_encoder_proj_clip_vit_large_dynamic
  img_encoder_proj_openclip_vit_big_g_dynamic
  img_encoder_proj_openclip_vit_h_dynamic
CLIP Text Encoder
  text_encoder_clip_vit_large
  text_encoder_openclip_vit_h