TFLite(LiteRT) - Generative AI on MT8893

This page lists the generative AI models and representative performance data for MT8893 platforms. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

The following tables summarize the supported generative models and measured performance on MT8893.

Large Language Models (LLMs)

| Model | Prompt Mode (tok/s) | Generative Mode (tok/s) |
| --- | --- | --- |
| DeepSeek-R1-Distill-Llama-8B | 425.791 | 11.359 |
| DeepSeek-R1-Distill-Qwen-1.5B | 1057.25 | 25.681 |
| DeepSeek-R1-Distill-Qwen-7B | 448.167 | 11.693 |
| gemma2-2b-it | 891.004 | 21.372 |
| internlm2-chat-1_8b | 1544.7 | 42.393 |
| llama3-8b | 426.125 | 11.512 |
| llama3.2-1B-Instruct | 2093.61 | 61.144 |
| llama3.2-3B-Instruct | 1022.95 | 25.048 |
| Qwen2-0.5B-Instruct | 3010.84 | 77.871 |
| Qwen2-1.5B-Instruct | 1616.22 | 38.314 |
| Qwen2-7B-Instruct | 474.383 | 11.642 |
| Qwen1.5-1.8B-Chat | 1516.5 | 31.383 |
| Qwen2.5-1.5B-Instruct | 1621.85 | 38.574 |
| Qwen2.5-3B-Instruct | 751.056 | 20.868 |
| Qwen2.5-7B-Instruct | 471.945 | 11.739 |
| Qwen3 1.7B | 1069.16 | 23.424 |
| Phi-3-mini-4k-instruct | 828.868 | 18.869 |
| MiniCPM-2B-sft-bf16-llama-format | 886.721 | 22.275 |
| medusa_v1_0_vicuna_7b_v1.5 | 501.053 | 22.787 |
| vicuna1.5-7b-tree-speculative-decoding-plus | 454.583 | 22.722 |
| llava1.5-7b-speculative-decoding | 267.981 | 6.779 |
| baichuan-7b-int8-cache | 561.762 | 11.37 |
| baichuan-7b | 536.642 | 10.56 |
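
As a rough illustration of how the two throughput columns can be read together, the sketch below estimates the response time of a single request. It assumes that Prompt Mode is the rate at which input tokens are processed and Generative Mode is the rate at which output tokens are produced. The function name `estimate_latency_s` and the token counts (512 in, 128 out) are hypothetical, and the estimate ignores anything this page does not report, such as model load time or tokenization.

```python
def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       prompt_tok_s: float, gen_tok_s: float) -> float:
    """Approximate response time in seconds: prompt processing plus generation."""
    if prompt_tok_s <= 0 or gen_tok_s <= 0:
        raise ValueError("throughput values must be positive")
    prompt_time_s = prompt_tokens / prompt_tok_s   # time to consume the prompt
    generate_time_s = output_tokens / gen_tok_s    # time to emit the output tokens
    return prompt_time_s + generate_time_s


# Throughput values taken from the llama3.2-1B-Instruct row of the LLM table.
# The 512/128 token counts are illustrative, not from the measurements.
latency = estimate_latency_s(512, 128, prompt_tok_s=2093.61, gen_tok_s=61.144)
print(f"estimated latency: {latency:.2f} s")
```

For most rows the generation term dominates, so the Generative Mode column is usually the better guide to perceived responsiveness for long answers.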

Vision-Language Models (VLMs)

| Model | ViT Inference Time (s) | Prompt Mode (tok/s) | Generative Mode (tok/s) |
| --- | --- | --- | --- |
| Qwen2.5 VL 3B | 0.096 | 339.901 | 10.1337 |
| InternVL3-1B | 0.508 | 183.641 | 14.094 |
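
For VLMs, the image encoder adds a fixed cost before any text is produced. The sketch below estimates time to first token under the assumptions that one image is encoded per request, that the ViT time is paid once before prompt processing, and that the prompt token count already includes any tokens contributed by the image. The function name `time_to_first_token_s` and the 256-token prompt are hypothetical.

```python
def time_to_first_token_s(vit_s: float, prompt_tokens: int,
                          prompt_tok_s: float) -> float:
    """Approximate delay before the first output token, in seconds."""
    if prompt_tok_s <= 0:
        raise ValueError("prompt throughput must be positive")
    # Image encoding first, then prompt processing at the Prompt Mode rate.
    return vit_s + prompt_tokens / prompt_tok_s


# (ViT time in s, Prompt Mode tok/s) copied from the VLM table.
vlm_rows = {
    "Qwen2.5 VL 3B": (0.096, 339.901),
    "InternVL3-1B": (0.508, 183.641),
}

for name, (vit_s, prompt_tok_s) in vlm_rows.items():
    ttft = time_to_first_token_s(vit_s, 256, prompt_tok_s)
    print(f"{name}: about {ttft:.2f} s to first token")
```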

Stable Diffusion and Image Generation

| Model | Main Time (ms) | Inference Time (ms) |
| --- | --- | --- |
| Stable Diffusion v.1.5 | 7075 | 6132 |
| Stable Diffusion v.1.5 controlnet | 9395 | 8035 |
| Stable_diffusion_v1_5_controlnet_lora | 10268 | 8472 |
| Stable_diffusion_v1.5_2lora | 11487 | 10130 |
| Stable Diffusion v2.1 base model with controlnet | 6969 | 5451 |
| Stable Diffusion v1.5 LCM Ipadaptor | 2254 | 1077 |
| Stable_diffusion_lcm_multiDiffusion | 7438.723 | 6697.967 |
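
This page does not define the two timing columns. Assuming that Main Time is the wall-clock time of the whole run and Inference Time is the part spent executing the model, the difference between them gives a rough view of the time spent outside inference. The sketch below computes that difference for two rows; the helper name `overhead_ms` is hypothetical, and the interpretation should be checked against the background page before drawing conclusions.

```python
def overhead_ms(main_ms: float, inference_ms: float) -> float:
    """Time not attributed to inference, under the assumption stated above."""
    return main_ms - inference_ms


# (Main Time ms, Inference Time ms) copied from the image generation table.
sd_rows = {
    "Stable Diffusion v.1.5": (7075, 6132),
    "Stable Diffusion v1.5 LCM Ipadaptor": (2254, 1077),
}

for name, (main_ms, inference_ms) in sd_rows.items():
    extra = overhead_ms(main_ms, inference_ms)
    share = 100 * extra / main_ms  # share of Main Time outside inference
    print(f"{name}: {extra} ms outside inference ({share:.1f}% of main time)")
```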

CLIP and Embedding Models

| Model | Main Time (ms) | Inference Time (ms) |
| --- | --- | --- |
| img_encoder_proj_clip_vit_large_dynamic | 358.609 | 51.135 |
| img_encoder_proj_openclip_vit_big_g_dynamic | 1390.56 | 517.126 |
| img_encoder_proj_openclip_vit_h_dynamic | 591.931 | 147.467 |
| text_encoder_clip_vit_large | 308.718 | 18.938 |
| text_encoder_openclip_vit_h | 510.919 | 48.485 |