TFLite(LiteRT) - Generative AI on MT8893

This page lists the supported generative AI models and representative performance data for MT8893 platforms. For background on generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

Note

For VLMs, Qwen3-VL is scheduled for release by the end of June 2026.

The following tables summarize the supported generative models and measured performance on MT8893.

Large Language Models (LLMs)

| Model | Prompt Mode (tok/s) | Generative Mode (tok/s) |
| --- | ---: | ---: |
| DeepSeek-R1-Distill-Qwen-1.5B | 1057.25 | 25.68 |
| DeepSeek-R1-Distill-Qwen-7B | 448.17 | 11.69 |
| DeepSeek-R1-Distill-Llama-8B | 425.79 | 11.36 |
| Qwen3-1.7B | 1069.16 | 23.42 |
| Qwen2.5-1.5B-Instruct | 1621.85 | 38.57 |
| Qwen2.5-3B-Instruct | 751.06 | 20.87 |
| Qwen2.5-7B-Instruct | 471.95 | 11.74 |
| gemma2-2b-it | 891.00 | 21.37 |
| llama3.2-1B-Instruct | 2093.61 | 61.14 |
| llama3.2-3B-Instruct | 1022.95 | 25.05 |
| llama3-8b | 426.13 | 11.51 |
| MiniCPM-2B-sft-bf16-llama-format | 886.72 | 22.28 |
| llava1.5-7b-speculative-decoding | 267.98 | 6.78 |
| medusa_v1_0_vicuna_7b_v1.5 | 501.05 | 22.79 |
| vicuna1.5-7b-tree-speculative-decoding-plus | 454.58 | 22.72 |
| baichuan-7b-int8-cache | 561.76 | 11.37 |
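The two throughput columns can be combined into a rough response-time estimate. The sketch below is illustrative only. It assumes that Prompt Mode is the rate at which input tokens are processed, that Generative Mode is the rate at which output tokens are produced, and that both rates stay constant regardless of sequence length. This page does not state any of these. The function name `estimate_latency_s` and the token counts (512 prompt, 128 output) are hypothetical choices for the example.

```python
# Throughput figures copied from the LLM table on this page, in tokens per second:
# (Prompt Mode, Generative Mode).
THROUGHPUT = {
    "llama3.2-1B-Instruct": (2093.61, 61.14),
    "Qwen2.5-7B-Instruct": (471.95, 11.74),
}


def estimate_latency_s(model: str, prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end latency in seconds.

    Assumption (not stated on this page): total time is the prompt length
    divided by the Prompt Mode rate plus the output length divided by the
    Generative Mode rate, with both rates treated as constant.
    """
    prompt_tps, gen_tps = THROUGHPUT[model]
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


if __name__ == "__main__":
    # Hypothetical workload: 512 prompt tokens, 128 generated tokens.
    for name in THROUGHPUT:
        total = estimate_latency_s(name, prompt_tokens=512, output_tokens=128)
        print(f"{name}: ~{total:.1f} s")
```

Under these assumptions, generation time dominates the total for both models, so the Generative Mode column is usually the more relevant figure when sizing interactive workloads. Real latency also depends on factors this estimate ignores, such as model load time and context length.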

Stable Diffusion and Image Generation

| Model | Main Time (ms) | Inference Time (ms) |
| --- | ---: | ---: |
| Stable Diffusion v1.5 controlnet | 9395 | 8035 |
| Stable Diffusion v2.1 base model with controlnet | 6969 | 5451 |

CLIP and Embedding Models

| Encoder | Model | Main Time (ms) | Inference Time (ms) |
| --- | --- | ---: | ---: |
| CLIP Image Encoder | img_encoder_proj_clip_vit_large_dynamic | 358.61 | 51.14 |
| CLIP Image Encoder | img_encoder_proj_openclip_vit_big_g_dynamic | 1390.56 | 517.13 |
| CLIP Image Encoder | img_encoder_proj_openclip_vit_h_dynamic | 591.93 | 147.47 |
| CLIP Text Encoder | text_encoder_clip_vit_large | 308.72 | 18.94 |
| CLIP Text Encoder | text_encoder_openclip_vit_h | 510.92 | 48.49 |