TFLite (LiteRT) - Generative AI

The Generative Model section provides performance and capability data for large language models (LLMs), vision-language models (VLMs), and image-generation models on MediaTek Genio platforms.

This section is intended as a reference for benchmarking and platform capability validation, not as a distribution channel for full training or deployment assets.

For analytical AI inference paths, see ONNX Runtime.

Note

For Generative AI workloads, this section provides performance data and capability information only.

Access to the full Generative AI deployment toolkit (GAI toolkit) requires a non-disclosure agreement (NDA) with MediaTek. After you sign an NDA, you can download the toolkit from NeuroPilot Document.

Model Categories

The generative models in this section are grouped into the following categories:

  • Large Language Models (LLMs) – Text-only models for tasks such as dialogue, summarization, and code generation.

  • Vision-Language Models (VLMs) – Multimodal models that process both images and text (for example, image captioning or visual question answering).

  • Image Generation and Enhancement – Models such as Stable Diffusion and other diffusion or transformer-based pipelines used for image synthesis, editing, or super-resolution.

  • Embedding and Encoder Models – Models like CLIP encoders for computing joint image-text embeddings for retrieval or ranking tasks.

Supported Models on Genio Products

Platform-specific model lists and performance data are provided on the individual platform pages.

Performance Notes and Limitations

For Generative AI workloads, measured performance on Genio 520 is consistently lower than on Genio 720; in the tables below, the gap is roughly 20%.

This gap is primarily due to DRAM bandwidth differences between the two platforms and can affect:

  • Token generation speed for LLMs.

  • End-to-end latency for diffusion-based image generation.

  • Multimodal pipelines that exchange large intermediate tensors between subsystems.

The following comparative data is provided for reference.

Important

The tables in this section provide representative numbers only. To obtain the most accurate performance for a specific use case, developers must deploy and run the workload directly on the target platform under the intended system configuration.
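As a quick consistency check on the comparative numbers, the prompt-mode table below implies a near-constant Genio 520 : Genio 720 throughput ratio. A minimal Python sketch, with values copied from the table (the helper name is illustrative, not part of any MediaTek SDK):

```python
# Sampled prompt-mode throughput (tok/s) copied from the table below.
PROMPT_TOKS = {
    # model: (Genio 720, Genio 520)
    "llama3.2-1B-Instruct": (401.288, 321.03),
    "Qwen2-7B-Instruct": (70.416, 56.333),
    "gemma2-2b-it": (193.392, 154.714),
}

def mean_520_to_720_ratio(pairs):
    """Average Genio 520 / Genio 720 throughput ratio over the sampled models."""
    ratios = [g520 / g720 for g720, g520 in pairs.values()]
    return sum(ratios) / len(ratios)

print(round(mean_520_to_720_ratio(PROMPT_TOKS), 3))  # ~0.8
```

The ratio comes out close to 0.8 for every sampled model, consistent with the roughly 20% gap attributed to DRAM bandwidth above.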

LLM Performance Comparison
Prompt Mode Comparison (Unit: tok/s)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Llama-8B | 36.653 | 29.322 | 425.791 |
| DeepSeek-R1-Distill-Qwen-1.5B | 341.686 | 273.349 | 1057.25 |
| DeepSeek-R1-Distill-Qwen-7B | 69.23 | 55.384 | 448.167 |
| gemma2-2b-it | 193.392 | 154.714 | 891.004 |
| internlm2-chat-1_8b | 276.218 | 220.974 | 1544.7 |
| llama3-8b | 56.495 | 45.196 | 426.125 |
| llama3.2-1B-Instruct | 401.288 | 321.03 | 2093.61 |
| llama3.2-3B-Instruct | 154.557 | 123.646 | 1022.95 |
| Qwen2-0.5B-Instruct | 762.455 | 609.964 | 3010.84 |
| Qwen2-1.5B-Instruct | 341.993 | 273.594 | 1616.22 |
| Qwen2-7B-Instruct | 70.416 | 56.333 | 474.383 |
| Qwen1.5-1.8B-Chat | 310.639 | 248.511 | 1516.5 |
| Qwen2.5-1.5B-Instruct | 341.418 | 273.134 | 1621.85 |
| Qwen2.5-3B-Instruct | 162.481 | 120 | 751.056 |
| Qwen2.5-7B-Instruct | 70.548 | 56.438 | 471.945 |
| Qwen3 1.7B | 233.032 | 186.426 | 1069.16 |
| Phi-3-mini-4k-instruct | 129.6 | 103.68 | 828.868 |
| MiniCPM-2B-sft-bf16-llama-format | 194.793 | 155.834 | 886.721 |
| medusa_v1_0_vicuna_7b_v1.5 | 91.821 | 73.457 | 501.053 |
| vicuna1.5-7b-tree-speculative-decoding-plus | 84.895 | 67.916 | 454.583 |
| llava1.5-7b-speculative-decoding | 73.103 | 58.482 | 267.981 |
| baichuan-7b-int8-cache | 81.184 | 64.947 | 561.762 |
| baichuan-7b | 79.745 | 63.796 | 536.642 |

Generative Mode Comparison (Unit: tok/s)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Llama-8B | 4.578 | 3.662 | 11.359 |
| DeepSeek-R1-Distill-Qwen-1.5B | 11.764 | 9.411 | 25.681 |
| DeepSeek-R1-Distill-Qwen-7B | 4.677 | 3.742 | 11.693 |
| gemma2-2b-it | 8.752 | 7.002 | 21.372 |
| internlm2-chat-1_8b | 17.549 | 14.039 | 42.393 |
| llama3-8b | 4.698 | 3.758 | 11.512 |
| llama3.2-1B-Instruct | 24.533 | 19.626 | 61.144 |
| llama3.2-3B-Instruct | 10.577 | 8.462 | 25.048 |
| Qwen2-0.5B-Instruct | 50.06 | 40.048 | 77.871 |
| Qwen2-1.5B-Instruct | 19.563 | 15.65 | 38.314 |
| Qwen2-7B-Instruct | 4.883 | 3.906 | 11.642 |
| Qwen1.5-1.8B-Chat | 9.895 | 7.916 | 31.383 |
| Qwen2.5-1.5B-Instruct | 18.427 | 14.742 | 38.574 |
| Qwen2.5-3B-Instruct | 10.31 | 7.84 | 20.868 |
| Qwen2.5-7B-Instruct | 4.892 | 3.914 | 11.739 |
| Qwen3 1.7B | 10.911 | 8.729 | 23.424 |
| Phi-3-mini-4k-instruct | 7.324 | 5.859 | 18.869 |
| MiniCPM-2B-sft-bf16-llama-format | 7.694 | 6.155 | 22.275 |
| medusa_v1_0_vicuna_7b_v1.5 | 10.564 | 8.451 | 22.787 |
| vicuna1.5-7b-tree-speculative-decoding-plus | 12.6489 | 10.119 | 22.722 |
| llava1.5-7b-speculative-decoding | 7.281 | 5.825 | 6.779 |
| baichuan-7b-int8-cache | 4.239 | 3.391 | 11.37 |
| baichuan-7b | 4.182 | 3.346 | 10.56 |
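Prompt mode and generative mode are both reported in tok/s but cover different phases of a run (processing the input prompt vs. generating output token by token). A minimal sketch of how such a rate is derived from raw measurements (the phase interpretation and example numbers are illustrative, not taken from the MediaTek tooling):

```python
def tok_per_s(num_tokens, elapsed_s):
    """Throughput: tokens processed (or generated) per wall-clock second."""
    return num_tokens / elapsed_s

# Illustrative run: a 128-token prompt processed in 0.32 s,
# then 64 tokens generated in 2.56 s.
prompt_rate = tok_per_s(128, 0.32)      # 400.0 tok/s
generative_rate = tok_per_s(64, 2.56)   # 25.0 tok/s
```

Prompt-phase rates are typically an order of magnitude higher than generative rates, which matches the ratio between the two tables above.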

VLM Performance Comparison
ViT Inference Time (Unit: s)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| Qwen2.5 VL 3B | 0.208 | 0.26 | 0.096 |
| InternVL3-1B | 1.744 | 2.18 | 0.508 |

Prompt Mode (Unit: tok/s)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| Qwen2.5 VL 3B | 100.065 | 80.052 | 339.901 |
| InternVL3-1B | 74.748 | 59.798 | 183.641 |

Generative Mode (Unit: tok/s)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| Qwen2.5 VL 3B | 4.776 | 3.821 | 10.1337 |
| InternVL3-1B | 6.157 | 4.926 | 14.094 |

Stable Diffusion Performance Comparison
Main Time Comparison (Unit: ms)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| Stable Diffusion v.1.5 | 25816 | 32270 | 7075 |
| Stable Diffusion v.1.5 controlnet | 33642 | 42053 | 9395 |
| Stable_diffusion_v1_5_controlnet_lora | 34148 | 42685 | 10268 |
| Stable_diffusion_v1.5_2lora | 35978 | 44973 | 11487 |
| Stable Diffusion v2.1 base model with controlnet | 31183 | 38979 | 6969 |
| Stable Diffusion v1.5 LCM Ipadaptor | 10645 | 13306 | 2254 |
| Stable_diffusion_lcm_multiDiffusion | 29103.565 | 36379.456 | 7438.723 |

Inference Time Comparison (Unit: ms)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| Stable Diffusion v.1.5 | 24813 | 31016 | 6132 |
| Stable Diffusion v.1.5 controlnet | 32294 | 40368 | 8035 |
| Stable_diffusion_v1_5_controlnet_lora | 32454 | 40568 | 8472 |
| Stable_diffusion_v1.5_2lora | 33195 | 41494 | 10130 |
| Stable Diffusion v2.1 base model with controlnet | 29828 | 37285 | 5451 |
| Stable Diffusion v1.5 LCM Ipadaptor | 5861 | 7326 | 1077 |
| Stable_diffusion_lcm_multiDiffusion | 28126.856 | 35158.57 | 6697.967 |
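The main-time figures are consistently larger than the inference-time figures for the same model. Assuming main time covers the full pipeline while inference time covers model execution only (an interpretation, not a documented definition), the difference approximates scheduling plus pre- and post-processing overhead:

```python
# Genio 720 values (ms) copied from the two tables above.
MAIN_MS = {
    "Stable Diffusion v.1.5": 25816,
    "Stable Diffusion v1.5 LCM Ipadaptor": 10645,
}
INFERENCE_MS = {
    "Stable Diffusion v.1.5": 24813,
    "Stable Diffusion v1.5 LCM Ipadaptor": 5861,
}

# Overhead estimate: main time minus inference time, per model.
overhead_ms = {m: MAIN_MS[m] - INFERENCE_MS[m] for m in MAIN_MS}
print(overhead_ms)
# {'Stable Diffusion v.1.5': 1003, 'Stable Diffusion v1.5 LCM Ipadaptor': 4784}
```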

CLIP Performance Comparison
Main Time Comparison (Unit: ms)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| img_encoder_proj_clip_vit_large_dynamic | 567.61 | 709.513 | 358.609 |
| img_encoder_proj_openclip_vit_big_g_dynamic | 12035.52 | 15044.4 | 1390.56 |
| img_encoder_proj_openclip_vit_h_dynamic | 1440.197 | 1800.246 | 591.931 |
| text_encoder_clip_vit_large | 455.079 | 568.849 | 308.718 |
| text_encoder_openclip_vit_h | 750.703 | 938.379 | 510.919 |

Inference Time Comparison (Unit: ms)

| Model | Genio 720 | Genio 520 | MT8893 |
| --- | --- | --- | --- |
| img_encoder_proj_clip_vit_large_dynamic | 257.388 | 321.735 | 51.135 |
| img_encoder_proj_openclip_vit_big_g_dynamic | 3142.959 | 3928.699 | 517.126 |
| img_encoder_proj_openclip_vit_h_dynamic | 881.647 | 1102.059 | 147.467 |
| text_encoder_clip_vit_large | 38.993 | 48.741 | 18.938 |
| text_encoder_openclip_vit_h | 119.77 | 149.713 | 48.485 |

Deployment and Source Models

The generative models referenced in this section are primarily intended for benchmarking and capability validation.

  • Model accuracy and qualitative output quality are not addressed.

  • MediaTek does not redistribute original training datasets or checkpoint files for third-party or open-source models.

  • For production deployment, developers must:

    • Obtain the original models from their official sources,

    • Follow the applicable licenses and usage terms, and

    • Perform any required fine-tuning or post-training optimization for their application.
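Since the tables in this section give representative numbers only, final validation should happen on the target device. A minimal, framework-agnostic timing harness for that purpose (the invoke callable is a placeholder; on-device you would pass, for example, the model runner's invoke function):

```python
import time

def mean_latency_ms(invoke, warmup=3, iters=10):
    """Run a callable several times and return its mean latency in ms."""
    for _ in range(warmup):   # warm up caches and lazy allocations first
        invoke()
    start = time.perf_counter()
    for _ in range(iters):
        invoke()
    return (time.perf_counter() - start) * 1000.0 / iters

# Stand-in workload for illustration only.
print(mean_latency_ms(lambda: sum(range(100_000))) >= 0.0)
```

Warm-up iterations matter on NPU-backed runtimes, where the first invocations can include one-time compilation or memory-mapping costs that would otherwise skew the mean.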