TFLite(LiteRT) - Generative AI on Genio 720

This page lists the generative AI models and representative performance data for Genio 720 platforms. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

Note

For vision-language models (VLMs), Qwen3-VL is scheduled for release by the end of June 2026.

The following tables summarize the supported generative models and measured performance on Genio 720.

Large Language Models (LLMs)

| Model | Prompt Mode (tok/s) | Generative Mode (tok/s) |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 341.69 | 11.76 |
| DeepSeek-R1-Distill-Qwen-7B | 69.23 | 4.68 |
| DeepSeek-R1-Distill-Llama-8B | 36.65 | 4.58 |
| Qwen3-1.7B | 233.03 | 10.91 |
| Qwen2.5-1.5B-Instruct | 341.42 | 18.43 |
| Qwen2.5-3B-Instruct | 162.48 | 10.31 |
| Qwen2.5-7B-Instruct | 70.55 | 4.89 |
| gemma3-1B (Text Only) | 603.15 | 26.49 |
| gemma3-4B (Text-Only) | 156.89 | 8.01 |
| gemma2-2b-it | 193.39 | 8.75 |
| llama3.2-1B-Instruct | 401.29 | 24.53 |
| llama3.2-3B-Instruct | 154.56 | 10.58 |
| llama3-8b | 56.5 | 4.7 |
| MiniCPM-2B-sft-bf16-llama-format | 194.79 | 7.69 |
| llava1.5-7b-speculative-decoding | 73.1 | 7.28 |
| medusa_v1_0_vicuna_7b_v1.5 | 91.82 | 10.56 |
| vicuna1.5-7b-tree-speculative-decoding-plus | 84.9 | 12.65 |
| baichuan-7b-int8-cache | 81.18 | 4.24 |
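To put the two throughput columns in context, the following minimal sketch estimates end-to-end response time for a single request. It assumes that Prompt Mode is the rate at which input tokens are processed and Generative Mode is the rate at which output tokens are produced; this page does not define either term, so treat that reading as an assumption. The function name `estimate_latency_s` and the example token counts (512 prompt tokens, 128 output tokens) are illustrative only, and the estimate ignores model loading and any other fixed overhead.

```python
def estimate_latency_s(prompt_tokens, output_tokens, prompt_tok_s, gen_tok_s):
    """Rough latency estimate in seconds: prompt processing plus generation.

    Assumes (not confirmed by this page) that the two phases run back to back
    at the constant rates reported in the LLM figures listed above.
    """
    if prompt_tok_s <= 0 or gen_tok_s <= 0:
        raise ValueError("throughput values must be positive")
    return prompt_tokens / prompt_tok_s + output_tokens / gen_tok_s


# (Prompt Mode tok/s, Generative Mode tok/s), copied from the LLM figures on this page.
MODELS = {
    "Qwen2.5-1.5B-Instruct": (341.42, 18.43),
    "Qwen2.5-7B-Instruct": (70.55, 4.89),
    "llama3.2-1B-Instruct": (401.29, 24.53),
}

if __name__ == "__main__":
    # Hypothetical request size chosen only for illustration.
    for name, (prompt_rate, gen_rate) in MODELS.items():
        seconds = estimate_latency_s(512, 128, prompt_rate, gen_rate)
        print(f"{name}: ~{seconds:.1f} s for 512 prompt + 128 output tokens")
```

Under these assumptions the generation phase dominates the total for most request sizes, so the Generative Mode column is usually the better guide to interactive responsiveness.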

Stable Diffusion and Image Generation

| Model | Main Time (ms) | Inference Time (ms) |
|---|---|---|
| Stable Diffusion v1.5 controlnet | 33642 | 32294 |
| Stable Diffusion v2.1 base model with controlnet | 31183 | 29828 |

CLIP and Embedding Models

| Encoder | Model | Main Time (ms) | Inference Time (ms) |
|---|---|---|---|
| CLIP Image Encoder | img_encoder_proj_clip_vit_large_dynamic | 567.61 | 257.39 |
| CLIP Image Encoder | img_encoder_proj_openclip_vit_big_g_dynamic | 12035.52 | 3142.96 |
| CLIP Image Encoder | img_encoder_proj_openclip_vit_h_dynamic | 1440.20 | 881.65 |
| CLIP Text Encoder | text_encoder_clip_vit_large | 455.08 | 38.99 |
| CLIP Text Encoder | text_encoder_openclip_vit_h | 750.70 | 119.77 |