TFLite(LiteRT) - Generative AI on MT8893
This page lists the generative AI models and representative performance data for MT8893 platforms. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.
Model Support and Performance
Note
For VLM support, Qwen3-VL is scheduled for release by the end of June 2026.
The following tables summarize the supported generative models and measured performance on MT8893.

| Model | Prompt Mode (tok/s) | Generative Mode (tok/s) |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 1057.25 | 25.68 |
| DeepSeek-R1-Distill-Qwen-7B | 448.17 | 11.69 |
| DeepSeek-R1-Distill-Llama-8B | 425.79 | 11.36 |
| Qwen3-1.7B | 1069.16 | 23.42 |
| Qwen2.5-1.5B-Instruct | 1621.85 | 38.57 |
| Qwen2.5-3B-Instruct | 751.06 | 20.87 |
| Qwen2.5-7B-Instruct | 471.95 | 11.74 |
| gemma2-2b-it | 891.00 | 21.37 |
| llama3.2-1B-Instruct | 2093.61 | 61.14 |
| llama3.2-3B-Instruct | 1022.95 | 25.05 |
| llama3-8b | 426.13 | 11.51 |
| MiniCPM-2B-sft-bf16-llama-format | 886.72 | 22.28 |
| llava1.5-7b-speculative-decoding | 267.98 | 6.78 |
| medusa_v1_0_vicuna_7b_v1.5 | 501.05 | 22.79 |
| vicuna1.5-7b-tree-speculative-decoding-plus | 454.58 | 22.72 |
| baichuan-7b-int8-cache | 561.76 | 11.37 |
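
The two throughput columns above can be combined into a rough response-time estimate. The sketch below is illustrative only: it assumes that Prompt Mode corresponds to prompt (prefill) processing and Generative Mode to token-by-token decoding, that both rates stay constant regardless of sequence length, and that model loading and tokenization are excluded. The function name `estimate_latency_s` and the 512/128 token counts are hypothetical choices for the example, not part of any MediaTek or LiteRT API.

```python
def estimate_latency_s(prompt_tokens, generated_tokens, prompt_tps, gen_tps):
    """Rough end-to-end latency in seconds from two throughput figures.

    Assumption: throughput is constant, so time = tokens / (tokens per second).
    """
    if prompt_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_s = prompt_tokens / prompt_tps    # time to process the prompt
    decode_s = generated_tokens / gen_tps     # time to generate the reply
    return prefill_s + decode_s


# (Prompt Mode tok/s, Generative Mode tok/s) copied from the table above.
MT8893_TOK_PER_S = {
    "llama3.2-1B-Instruct": (2093.61, 61.14),
    "Qwen2.5-7B-Instruct": (471.95, 11.74),
}

if __name__ == "__main__":
    # Hypothetical workload: 512 prompt tokens, 128 generated tokens.
    for name, (prompt_tps, gen_tps) in MT8893_TOK_PER_S.items():
        total = estimate_latency_s(512, 128, prompt_tps, gen_tps)
        print(f"{name}: ~{total:.2f} s")
```

Under these assumptions the decode term dominates for typical chat workloads, so the Generative Mode column is usually the better predictor of perceived response time.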

| Model | Main Time (ms) | Inference Time (ms) |
|---|---|---|
| Stable Diffusion v.1.5 controlnet | 9395 | 8035 |
| Stable Diffusion v2.1 base model with controlnet | 6969 | 5451 |

| Model | Main Time (ms) | Inference Time (ms) |
|---|---|---|
| CLIP Image Encoder | | |
| img_encoder_proj_clip_vit_large_dynamic | 358.61 | 51.14 |
| img_encoder_proj_openclip_vit_big_g_dynamic | 1390.56 | 517.13 |
| img_encoder_proj_openclip_vit_h_dynamic | 591.93 | 147.47 |
| CLIP Text Encoder | | |
| text_encoder_clip_vit_large | 308.72 | 18.94 |
| text_encoder_openclip_vit_h | 510.92 | 48.49 |