TFLite(LiteRT) - Generative AI on Genio 720

This page lists the generative AI models and representative performance data for Genio 720 platforms. For background information about generative workloads and usage notes, refer to TFLite(LiteRT) - Generative AI.

Model Support and Performance

Note

For vision-language models (VLMs), Qwen3-VL is scheduled for release by the end of June 2026.

The following tables summarize the supported generative models and measured performance on Genio 720.

Large Language Models (LLMs)
Model	Prompt Mode (tok/s)	Generative Mode (tok/s)
DeepSeek-R1-Distill-Qwen-1.5B	341.69	11.76
DeepSeek-R1-Distill-Qwen-7B	69.23	4.68
DeepSeek-R1-Distill-Llama-8B	36.65	4.58
Qwen3-1.7B	233.03	10.91
Qwen2.5-1.5B-Instruct	341.42	18.43
Qwen2.5-3B-Instruct	162.48	10.31
Qwen2.5-7B-Instruct	70.55	4.89
gemma3-1B (Text-Only)	603.15	26.49
gemma3-4B (Text-Only)	156.89	8.01
gemma2-2b-it	193.39	8.75
llama3.2-1B-Instruct	401.29	24.53
llama3.2-3B-Instruct	154.56	10.58
llama3-8b	56.50	4.70
MiniCPM-2B-sft-bf16-llama-format	194.79	7.69
llava1.5-7b-speculative-decoding	73.10	7.28
medusa_v1_0_vicuna_7b_v1.5	91.82	10.56
vicuna1.5-7b-tree-speculative-decoding-plus	84.90	12.65
baichuan-7b-int8-cache	81.18	4.24
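
The two throughput columns can be combined into a rough end-to-end latency estimate for a single request. The sketch below is illustrative only. It assumes that "Prompt Mode" is the rate at which input tokens are processed (prefill) and "Generative Mode" is the rate at which output tokens are produced (decode), and that both rates stay constant regardless of context length. The function name `estimate_latency` and the 512/128 token counts are hypothetical and are not part of any Genio or LiteRT API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyEstimate:
    prefill_s: float  # time to process the prompt, in seconds
    decode_s: float   # time to generate the output tokens, in seconds

    @property
    def total_s(self) -> float:
        return self.prefill_s + self.decode_s


def estimate_latency(prompt_tokens: int, output_tokens: int,
                     prompt_tps: float, gen_tps: float) -> LatencyEstimate:
    """Estimate latency as tokens / throughput for each phase.

    Assumption: throughput is constant across the whole request,
    which real workloads may not satisfy.
    """
    if prompt_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return LatencyEstimate(prefill_s=prompt_tokens / prompt_tps,
                           decode_s=output_tokens / gen_tps)


# Throughput values taken from the Qwen2.5-1.5B-Instruct row above.
# The token counts (512 in, 128 out) are hypothetical.
est = estimate_latency(512, 128, prompt_tps=341.42, gen_tps=18.43)
print(f"prefill ~{est.prefill_s:.2f} s, "
      f"decode ~{est.decode_s:.2f} s, "
      f"total ~{est.total_s:.2f} s")
```

Under these assumptions the decode phase dominates the total for most rows, so the Generative Mode column is usually the better indicator of interactive response time.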

Stable Diffusion and Image Generation
Model	Main Time (ms)	Inference Time (ms)
Stable Diffusion v1.5 controlnet	33642	32294
Stable Diffusion v2.1 base model with controlnet	31183	29828

CLIP and Embedding Models
Model	Main Time (ms)	Inference Time (ms)
CLIP Image Encoder
img_encoder_proj_clip_vit_large_dynamic	567.61	257.39
img_encoder_proj_openclip_vit_big_g_dynamic	12035.52	3142.96
img_encoder_proj_openclip_vit_h_dynamic	1440.20	881.65
CLIP Text Encoder
text_encoder_clip_vit_large	455.08	38.99
text_encoder_openclip_vit_h	750.70	119.77
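
To compare the two timing columns, it can help to look at what fraction of Main Time is accounted for by Inference Time. The sketch below is a minimal illustration. This page does not define the two columns, so the reading that Main Time covers the whole run (including steps such as model loading or pre/post-processing) while Inference Time covers only the inference call is an assumption. The helper name `inference_share` is hypothetical.

```python
# (main_ms, inference_ms) pairs copied from the CLIP table above.
CLIP_TIMINGS_MS = {
    "img_encoder_proj_clip_vit_large_dynamic": (567.61, 257.39),
    "img_encoder_proj_openclip_vit_big_g_dynamic": (12035.52, 3142.96),
    "img_encoder_proj_openclip_vit_h_dynamic": (1440.20, 881.65),
    "text_encoder_clip_vit_large": (455.08, 38.99),
    "text_encoder_openclip_vit_h": (750.70, 119.77),
}


def inference_share(main_ms: float, inference_ms: float) -> float:
    """Return inference time as a fraction of main time.

    Assumption: main time includes inference time, so the
    result is expected to fall between 0 and 1.
    """
    if main_ms <= 0:
        raise ValueError("main_ms must be positive")
    return inference_ms / main_ms


for name, (main_ms, inference_ms) in CLIP_TIMINGS_MS.items():
    share = inference_share(main_ms, inference_ms)
    other_ms = main_ms - inference_ms  # time outside the inference call
    print(f"{name}: inference {share:.1%} of main, "
          f"other {other_ms:.2f} ms")
```

If that reading holds, the text encoders spend most of their Main Time outside the inference call, so reducing non-inference overhead would matter more for them than speeding up inference itself.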