TFLite(LiteRT) - Generative AI on Genio 520

This page lists the supported generative AI models and representative performance data for Genio 520 platforms. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

Note

The performance values in this section for Genio 520 are estimated by scaling corresponding Genio 720 measurements using approximate bandwidth-based factors. They are intended for early planning and comparison only. For production workloads, developers must benchmark the target models directly on Genio 520 under the intended system configuration.
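To make the idea of a bandwidth-based estimate concrete, here is a minimal sketch. It assumes a single linear factor: throughput scales with the bandwidth ratio, and latency scales with its inverse. The function names, the `bandwidth_ratio` parameter, and the example numbers are hypothetical placeholders. This page does not publish the actual factors or the Genio 720 reference values used for its estimates.

```python
def scale_throughput(reference_tok_s: float, bandwidth_ratio: float) -> float:
    """Scale a reference throughput (tok/s) by a target/reference bandwidth ratio.

    Assumption: the workload is memory-bandwidth-bound, so throughput is
    taken to be proportional to available bandwidth.
    """
    if reference_tok_s < 0 or bandwidth_ratio <= 0:
        raise ValueError("reference must be >= 0 and ratio must be > 0")
    return reference_tok_s * bandwidth_ratio


def scale_latency_ms(reference_ms: float, bandwidth_ratio: float) -> float:
    """Scale a reference latency (ms); latency moves inversely to bandwidth."""
    if reference_ms < 0 or bandwidth_ratio <= 0:
        raise ValueError("reference must be >= 0 and ratio must be > 0")
    return reference_ms / bandwidth_ratio


# Placeholder inputs only (NOT real Genio 720 data or real scaling factors).
example_ratio = 0.5
print(scale_throughput(10.0, example_ratio))   # 5.0
print(scale_latency_ms(1000.0, example_ratio)) # 2000.0
```

A single linear factor ignores compute-bound phases, cache effects, and thermal limits, which is one reason direct benchmarking on the target device remains necessary.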

The following tables summarize the supported generative models and their estimated performance on Genio 520.

Large Language Models (LLMs)

| Model | Prompt Mode (tok/s) | Generative Mode (tok/s) |
|---|---|---|
| DeepSeek-R1-Distill-Llama-8B | 29.322 | 3.662 |
| DeepSeek-R1-Distill-Qwen-1.5B | 273.349 | 9.411 |
| DeepSeek-R1-Distill-Qwen-7B | 55.384 | 3.742 |
| gemma2-2b-it | 154.714 | 7.002 |
| internlm2-chat-1_8b | 220.974 | 14.039 |
| llama3-8b | 45.196 | 3.758 |
| llama3.2-1B-Instruct | 321.03 | 19.626 |
| llama3.2-3B-Instruct | 123.646 | 8.462 |
| Qwen2-0.5B-Instruct | 609.964 | 40.048 |
| Qwen2-1.5B-Instruct | 273.594 | 15.65 |
| Qwen2-7B-Instruct | 56.333 | 3.906 |
| Qwen1.5-1.8B-Chat | 248.511 | 7.916 |
| Qwen2.5-1.5B-Instruct | 273.134 | 14.742 |
| Qwen2.5-3B-Instruct | 120 | 7.84 |
| Qwen2.5-7B-Instruct | 56.438 | 3.914 |
| Qwen3 1.7B | 186.426 | 8.729 |
| Phi-3-mini-4k-instruct | 103.68 | 5.859 |
| MiniCPM-2B-sft-bf16-llama-format | 155.834 | 6.155 |
| medusa_v1_0_vicuna_7b_v1.5 | 73.457 | 8.451 |
| vicuna1.5-7b-tree-speculative-decoding-plus | 67.916 | 10.119 |
| llava1.5-7b-speculative-decoding | 58.482 | 5.825 |
| baichuan-7b-int8-cache | 64.947 | 3.391 |
| baichuan-7b | 63.796 | 3.346 |
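As a rough planning aid, the two throughput columns can be combined into an end-to-end latency estimate: the prompt is processed at the prompt-mode rate, and the output is produced at the generative-mode rate. The sketch below assumes both rates stay constant over the request and ignores model load time and sampling overhead. The function name and the token counts are illustrative assumptions and not part of any LiteRT or Genio API. The rates come from the llama3.2-1B-Instruct entry above.

```python
def estimate_llm_latency_s(prompt_tokens: int, output_tokens: int,
                           prompt_tok_s: float, gen_tok_s: float) -> float:
    """Estimate request latency as prompt-processing time plus generation time."""
    if prompt_tok_s <= 0 or gen_tok_s <= 0:
        raise ValueError("throughput values must be positive")
    prefill_s = prompt_tokens / prompt_tok_s   # time to ingest the prompt
    decode_s = output_tokens / gen_tok_s       # time to generate the reply
    return prefill_s + decode_s


# llama3.2-1B-Instruct: 321.03 tok/s prompt mode, 19.626 tok/s generative mode.
# The 512-token prompt and 128-token reply are hypothetical request sizes.
latency = estimate_llm_latency_s(512, 128, 321.03, 19.626)
print(f"{latency:.1f} s")  # about 8.1 s
```

For interactive use the generative-mode rate usually dominates: in this example roughly 6.5 of the 8.1 seconds are spent generating the reply.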

Vision-Language Models (VLMs)

| Model | ViT Inference Time (s) | Prompt Mode (tok/s) | Generative Mode (tok/s) |
|---|---|---|---|
| Qwen2.5 VL 3B | 0.26 | 80.052 | 3.821 |
| InternVL3-1B | 2.18 | 59.798 | 4.926 |
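For a vision-language request, the image encoder adds a fixed stage before text processing. The sketch below compares the two VLM entries for the same hypothetical request by summing the ViT inference time, the prompt-processing time, and the generation time. It assumes the stages run sequentially and that the prompt token count already includes any image tokens passed to the language model. The `VlmProfile` class and the request sizes are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VlmProfile:
    """Per-model figures copied from the VLM entries above."""
    name: str
    vit_s: float          # ViT Inference Time (s)
    prompt_tok_s: float   # Prompt Mode (tok/s)
    gen_tok_s: float      # Generative Mode (tok/s)

    def latency_s(self, prompt_tokens: int, output_tokens: int) -> float:
        # Sequential stages: image encoding, prompt processing, generation.
        return (self.vit_s
                + prompt_tokens / self.prompt_tok_s
                + output_tokens / self.gen_tok_s)


profiles = [
    VlmProfile("Qwen2.5 VL 3B", 0.26, 80.052, 3.821),
    VlmProfile("InternVL3-1B", 2.18, 59.798, 4.926),
]

# Hypothetical request: 256 prompt tokens in, 64 tokens out.
for p in profiles:
    print(f"{p.name}: {p.latency_s(256, 64):.1f} s")
```

With these request sizes the generation stage accounts for most of the total for both models, so the difference in ViT time matters mainly for short replies.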

Stable Diffusion and Image Generation

| Model | Main Time (ms) | Inference Time (ms) |
|---|---|---|
| Stable Diffusion v.1.5 | 32270 | 31016 |
| Stable Diffusion v.1.5 controlnet | 42053 | 40368 |
| Stable_diffusion_v1_5_controlnet_lora | 42685 | 40568 |
| Stable_diffusion_v1.5_2lora | 44973 | 41494 |
| Stable Diffusion v2.1 base model with controlnet | 38979 | 37285 |
| Stable Diffusion v1.5 LCM Ipadaptor | 13306 | 7326 |
| Stable_diffusion_lcm_multiDiffusion | 36379.456 | 35158.57 |

CLIP and Embedding Models

| Type | Model | Main Time (ms) | Inference Time (ms) |
|---|---|---|---|
| CLIP Image Encoder | img_encoder_proj_clip_vit_large_dynamic | 709.513 | 321.735 |
| CLIP Image Encoder | img_encoder_proj_openclip_vit_big_g_dynamic | 15044.4 | 3928.699 |
| CLIP Image Encoder | img_encoder_proj_openclip_vit_h_dynamic | 1800.246 | 1102.059 |
| CLIP Text Encoder | text_encoder_clip_vit_large | 568.849 | 48.741 |
| CLIP Text Encoder | text_encoder_openclip_vit_h | 938.379 | 149.713 |