TFLite(LiteRT) - Generative AI on MT8893

This page lists the generative AI models supported on MT8893 platforms, together with representative performance data. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

Note

For vision-language models (VLMs), Qwen3-VL is scheduled for release by the end of June 2026.

The following tables summarize the supported generative models and measured performance on MT8893.

Large Language Models (LLMs)
Model	Prompt Mode (tok/s)	Generative Mode (tok/s)
DeepSeek-R1-Distill-Qwen-1.5B	1057.25	25.68
DeepSeek-R1-Distill-Qwen-7B	448.17	11.69
DeepSeek-R1-Distill-Llama-8B	425.79	11.36
Qwen3-1.7B	1069.16	23.42
Qwen2.5-1.5B-Instruct	1621.85	38.57
Qwen2.5-3B-Instruct	751.06	20.87
Qwen2.5-7B-Instruct	471.95	11.74
gemma2-2b-it	891.00	21.37
llama3.2-1B-Instruct	2093.61	61.14
llama3.2-3B-Instruct	1022.95	25.05
llama3-8b	426.13	11.51
MiniCPM-2B-sft-bf16-llama-format	886.72	22.28
llava1.5-7b-speculative-decoding	267.98	6.78
medusa_v1_0_vicuna_7b_v1.5	501.05	22.79
vicuna1.5-7b-tree-speculative-decoding-plus	454.58	22.72
baichuan-7b-int8-cache	561.76	11.37
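
The two throughput columns can be combined into a rough response-time estimate. The sketch below assumes that Prompt Mode is the rate at which input tokens are processed and Generative Mode is the rate at which output tokens are produced; this page does not define the columns, so treat that reading as an assumption. The function name `estimate_latency_s` and the 512-token prompt and 128-token output are illustrative choices, not part of any MT8893 tooling. The estimate ignores model loading, tokenization, and any dependence of throughput on sequence length.

```python
def estimate_latency_s(prompt_tokens, output_tokens, prompt_tok_s, gen_tok_s):
    """Rough latency estimate in seconds (hypothetical helper).

    Assumes the prompt is processed at prompt_tok_s and every output
    token is produced at gen_tok_s, with no other overhead.
    """
    prompt_s = prompt_tokens / prompt_tok_s
    generate_s = output_tokens / gen_tok_s
    return prompt_s + generate_s


# (Prompt Mode tok/s, Generative Mode tok/s) copied from the LLM table.
RATES = {
    "llama3.2-1B-Instruct": (2093.61, 61.14),
    "Qwen2.5-7B-Instruct": (471.95, 11.74),
}

if __name__ == "__main__":
    # Illustrative workload: 512 prompt tokens, 128 generated tokens.
    for model, (prompt_rate, gen_rate) in RATES.items():
        total = estimate_latency_s(512, 128, prompt_rate, gen_rate)
        print(f"{model}: about {total:.1f} s")
```

Under these assumptions the generation phase dominates for typical chat-style workloads, so the Generative Mode column is usually the better predictor of perceived response time.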

Stable Diffusion and Image Generation
Model	Main Time (ms)	Inference Time (ms)
Stable Diffusion v1.5 controlnet	9395	8035
Stable Diffusion v2.1 base model with controlnet	6969	5451

CLIP and Embedding Models
Model	Main Time (ms)	Inference Time (ms)
CLIP Image Encoder
img_encoder_proj_clip_vit_large_dynamic	358.61	51.14
img_encoder_proj_openclip_vit_big_g_dynamic	1390.56	517.13
img_encoder_proj_openclip_vit_h_dynamic	591.93	147.47
CLIP Text Encoder
text_encoder_clip_vit_large	308.72	18.94
text_encoder_openclip_vit_h	510.92	48.49