TFLite(LiteRT) - Generative AI on MT8883
This page lists the generative AI models and representative performance data for MT8883 platforms. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.
Model Support and Performance
Note
The following symbols are used in the tables below:
–: To be released.
Not Support: The platform does not support this model.
Note
For VLM, Qwen3-VL is scheduled for release by the end of June 2026.
The following tables summarize the supported generative models and measured performance on MT8883.
Model | Prompt Mode (tok/s) | Generative Mode (tok/s)
DeepSeek-R1-Distill-Qwen-1.5B | – | –
DeepSeek-R1-Distill-Qwen-7B | – | –
DeepSeek-R1-Distill-Llama-8B | – | –
Qwen3-1.7B | 834.02 | 25.18
Qwen2.5-1.5B-Instruct | 763.66 | 20.16
Qwen2.5-3B-Instruct | 502.05 | 19.32
Qwen2.5-7B-Instruct | 184.73 | 6.89
gemma3-1B (Text Only) | 1125.16 | 38.87
gemma3-4B (Text-Only) | – | –
gemma2-2b-it | – | –
llama3.2-1B-Instruct | 1372.57 | 43.47
llama3.2-3B-Instruct | 611.76 | 19.89
llama3-8b | 128.36 | 6.53
MiniCPM-2B-sft-bf16-llama-format | – | –
llava1.5-7b-speculative-decoding | 138.76 | 4.46
medusa_v1_0_vicuna_7b_v1.5 | – | –
vicuna1.5-7b-tree-speculative-decoding-plus | – | –
baichuan-7b-int8-cache | – | –
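The throughput figures above can be turned into a rough end-to-end latency estimate. The sketch below is illustrative only and is not part of the MT8883 tooling. It assumes that Prompt Mode throughput applies to processing the input prompt, that Generative Mode throughput applies to each generated token, and that both rates stay constant regardless of sequence length. The function name `estimate_latency_s` and the token counts are hypothetical.

```python
def estimate_latency_s(prompt_tokens, generated_tokens, prompt_tps, gen_tps):
    """Rough latency estimate in seconds from two throughput figures.

    Assumption: throughput is constant, so time = tokens / (tokens per second).
    Real latency also depends on context length, model load time, and thermals.
    """
    if prompt_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    prompt_time = prompt_tokens / prompt_tps      # time to process the prompt
    generate_time = generated_tokens / gen_tps    # time to emit the new tokens
    return prompt_time + generate_time


# Qwen3-1.7B figures from the table: 834.02 tok/s prompt, 25.18 tok/s generative.
# The 512-token prompt and 128-token reply are hypothetical workload sizes.
qwen3_latency = estimate_latency_s(512, 128, prompt_tps=834.02, gen_tps=25.18)
print(f"Estimated latency: {qwen3_latency:.2f} s")
```

Under these assumptions, generation time dominates for most chat-style workloads, because the Generative Mode rate is far lower than the Prompt Mode rate for every measured model.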
Model | Main Time (ms) | Inference Time (ms)
Stable Diffusion v.1.5 controlnet | – | –
Stable Diffusion v2.1 base model with controlnet | – | –
Model | Main Time (ms) | Inference Time (ms)
CLIP Image Encoder | |
img_encoder_proj_clip_vit_large_dynamic | – | –
img_encoder_proj_openclip_vit_big_g_dynamic | – | –
img_encoder_proj_openclip_vit_h_dynamic | – | –
CLIP Text Encoder | |
text_encoder_clip_vit_large | – | –
text_encoder_openclip_vit_h | – | –