TFLite(LiteRT) - Generative AI on Genio 520

This page lists the supported generative AI models and representative performance data for the Genio 520 platform. For background on generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

Note

For vision-language models (VLMs), Qwen3-VL is scheduled to be released by the end of June 2026.

The following tables summarize the supported generative models and measured performance on Genio 520.

Large Language Models (LLMs)

Model                                        Prompt Mode (tok/s)  Generative Mode (tok/s)
-------------------------------------------  -------------------  -----------------------
DeepSeek-R1-Distill-Qwen-1.5B                276.82               9.18
DeepSeek-R1-Distill-Qwen-7B                  Not supported        Not supported
DeepSeek-R1-Distill-Llama-8B                 Not supported        Not supported
Qwen3-1.7B                                   229.63               10.53
Qwen2.5-1.5B-Instruct                        269.87               15.11
Qwen2.5-3B-Instruct                          Not supported        Not supported
Qwen2.5-7B-Instruct                          Not supported        Not supported
gemma3-1B (Text Only)                        491.53               22.13
gemma2-2b-it                                 Not supported        Not supported
llama3.2-1B-Instruct                         335.92               20.59
llama3.2-3B-Instruct                         Not supported        Not supported
llama3-8b                                    Not supported        Not supported
MiniCPM-2B-sft-bf16-llama-format             153.14               6.48
llava1.5-7b-speculative-decoding             Not supported        Not supported
medusa_v1_0_vicuna_7b_v1.5                   Not supported        Not supported
vicuna1.5-7b-tree-speculative-decoding-plus  Not supported        Not supported
baichuan-7b-int8-cache                       Not supported        Not supported
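
The throughput figures in the LLM table can be turned into a rough response-time estimate. The sketch below assumes that Prompt Mode is the rate at which input (prompt) tokens are processed and that Generative Mode is the rate at which output tokens are produced. It ignores model loading, tokenization, and sampling overhead. The names THROUGHPUT_TOK_S and estimate_latency_s are illustrative only and are not part of any LiteRT or Genio API.

```python
# Rough latency estimate from the table's throughput figures.
# Assumption: total time = prompt_tokens / prompt rate + output_tokens / generation rate.
# Values are (Prompt Mode tok/s, Generative Mode tok/s), copied from the LLM table.
THROUGHPUT_TOK_S = {
    "DeepSeek-R1-Distill-Qwen-1.5B": (276.82, 9.18),
    "Qwen3-1.7B": (229.63, 10.53),
    "Qwen2.5-1.5B-Instruct": (269.87, 15.11),
    "gemma3-1B (Text Only)": (491.53, 22.13),
    "llama3.2-1B-Instruct": (335.92, 20.59),
    "MiniCPM-2B-sft-bf16-llama-format": (153.14, 6.48),
}


def estimate_latency_s(model: str, prompt_tokens: int, output_tokens: int) -> float:
    """Return an approximate end-to-end time in seconds for one request."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prompt_rate, generation_rate = THROUGHPUT_TOK_S[model]
    return prompt_tokens / prompt_rate + output_tokens / generation_rate


if __name__ == "__main__":
    # Compare all measured models for a 256-token prompt and a 128-token reply.
    for name in THROUGHPUT_TOK_S:
        print(f"{name}: {estimate_latency_s(name, 256, 128):.2f} s")
```

Under these assumptions, a 256-token prompt with 128 generated tokens on llama3.2-1B-Instruct comes to roughly 7 seconds, almost all of it spent in generation. For chat-style workloads, the Generative Mode column is therefore usually the more important one to compare.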

Stable Diffusion and Image Generation

Model                                             Main Time (ms)  Inference Time (ms)
------------------------------------------------  --------------  -------------------
Stable Diffusion v1.5 controlnet                  47923           37581
Stable Diffusion v2.1 base model with controlnet  Not supported   Not supported

CLIP and Embedding Models

Model                                          Main Time (ms)  Inference Time (ms)
---------------------------------------------  --------------  -------------------
CLIP Image Encoder
  img_encoder_proj_clip_vit_large_dynamic      662.22          296.84
  img_encoder_proj_openclip_vit_big_g_dynamic  Not supported   Not supported
  img_encoder_proj_openclip_vit_h_dynamic      Not supported   Not supported
CLIP Text Encoder
  text_encoder_clip_vit_large                  748.45          45.56
  text_encoder_openclip_vit_h                  Not supported   Not supported