ONNX Runtime - Performance Comparison
This page provides a comprehensive performance comparison of validated ONNX models across all MediaTek Genio platforms. Measurements are obtained using ONNX Runtime with hardware acceleration (where available).
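This page does not describe the measurement procedure, so the following is only a minimal sketch of how a per-inference latency in milliseconds could be collected. It assumes a zero-argument callable wrapping one inference; the function name `measure_latency_ms` and the `warmup` and `repeats` parameters are hypothetical and are not taken from the benchmark tooling behind these tables.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(run_once: Callable[[], object],
                       warmup: int = 5,
                       repeats: int = 50) -> Dict[str, float]:
    """Time a zero-argument inference callable and summarize latency in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up runs are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        # perf_counter() returns seconds; convert to milliseconds.
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload. With ONNX Runtime this would typically be
    # `lambda: session.run(None, feeds)` for an existing InferenceSession.
    stats = measure_latency_ms(lambda: sum(range(10_000)), warmup=2, repeats=10)
    print({key: round(value, 3) for key, value in stats.items()})
```

Reporting the median alongside the mean helps when a few slow runs (for example, from thermal throttling or background load) distort the average.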
Note
G520 / G720: Leverage the Neuron Execution Provider (EP) for high-speed NPU acceleration.
G350 / G510 / G700 / G1200: These platforms currently execute ONNX models via the CPU EP.
All latency values are reported in milliseconds (ms).
Cells marked as `-` indicate data is currently being measured, while `N/A` indicates the model is not supported on that specific hardware configuration.
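The conventions in this note can be captured in a small helper, sketched below. `CPUExecutionProvider` is the standard ONNX Runtime CPU provider name; the Neuron provider string is an assumption that should be checked with `onnxruntime.get_available_providers()` on the target board. The helper names `providers_for` and `parse_cell` are hypothetical, and listing the CPU EP as a fallback after the Neuron EP is an illustrative choice, not something this page states.

```python
from typing import List, Optional

# Platforms the note lists as using the Neuron EP (NPU) versus the CPU EP.
NPU_PLATFORMS = {"G520", "G720"}
CPU_ONLY_PLATFORMS = {"G350", "G510", "G700", "G1200"}

# Standard ONNX Runtime CPU provider name.
CPU_EP = "CPUExecutionProvider"
# ASSUMPTION: provider string for the Neuron EP; not confirmed by this page.
NEURON_EP = "NeuronExecutionProvider"


def providers_for(platform: str) -> List[str]:
    """Return an ordered provider list for a Genio platform, CPU EP last."""
    name = platform.strip().upper()
    if name in NPU_PLATFORMS:
        return [NEURON_EP, CPU_EP]
    if name in CPU_ONLY_PLATFORMS:
        return [CPU_EP]
    raise ValueError(f"unknown platform: {platform!r}")


def parse_cell(cell: str) -> Optional[float]:
    """Convert a table cell to milliseconds.

    Returns None for '-' (still being measured), 'N/A' (unsupported),
    and empty cells, so callers can skip them in calculations.
    """
    text = cell.strip()
    if text in {"", "-", "N/A"}:
        return None
    return float(text)
```

The list returned by `providers_for` could be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`, and `parse_cell` could be applied to each latency cell before computing averages or ratios.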
TAO Related Models
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Object Detection | PeopleNet (ResNet34) | Float32 | 114 | 3917 | | | | | | | |
| Object Detection | PeopleNet (ResNet34) | Quant8 | 42 | 1092 | | | | | | | |
| Recognition | Action Recognition Net (ResNet18) | Float32 | 27 | 388 | | | | | | | |
| Pose Estimation | BodyPoseNet | Float32 | 72 | 2000 | | | | | | | |
| Object Detection | LPDNet (USA Pruned) | Float32 | 6.5 | 127 | | | | | | | |
| Segmentation | PeopleSemSegNet_AMR | Float32 | 932 | 7764 | | | | | | | |
| Segmentation | PeopleSemSegNet_AMR (Rel) | Float32 | 54 | 207 | | | | | | | |
| Segmentation | PeopleSemSegNet (ShuffleSeg) | Float32 | 56 | 205 | | | | | | | |
| Segmentation | PeopleSemSegNet (Vanilla Unet) | Float32 | 1138 | 7680 | | | | | | | |
| Re-Identification | ReIdentificationNet (ResNet50) | Float32 | 11 | 315 | | | | | | | |
| Classification | Retail Object Recognition | Float32 | 34 | 713 | | | | | | | |
| OCR | Ocrnet_resnet50 | Float32 | 39 | 417 | | | | | | | |
| OCR | Ocrnet_resnet50 (Pruned) | Float32 | 34 | 263 | | | | | | | |
| OCR | ocd_resnet50 | Float32 | 700 | 7161 | | | | | | | |
| OCR | ocd_resnet50 | Float32 | 323 | 3274 | | | | | | | |
| OCR | ocdnet_mixnet | Float32 | 1116 | 17960 | | | | | | | |
| Classification | Pose Classification (ST-GCN) | Float32 | 352 | 968 | | | | | | | |
| Pose Estimation | Centerpose (Chair DLA34) | Float32 | 777 | 3427 | | | | | | | |
| Pose Estimation | Centerpose (Camera FAN) | Float32 | 5741 | 9137 | | | | | | | |
| Object Detection | LPDNet (CCPD Pruned) | Float32 | 12 | 206 | | | | | | | |
| Pose Estimation | Foundation Pose (Refiner) | Float32 | 68.7 | 411 | | | | | | | |
| Pose Estimation | Foundation Pose (Score) | Float32 | 37.1 | 373 | | | | | | | |
| Pose Estimation | Multi 3D Centerpose | Float32 | 442.7 | 1686 | | | | | | | |
Legacy Analytical Models
Detection
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Object Detection | YOLOv5s | Quant8 | 83.512 | 83.647 | | | | | | | |
| Object Detection | YOLOv5s | Float32 | 35.344 | 364.161 | | | | | | | |
| Object Detection | YOLOv8s | Quant8 | 120.081 | 120.178 | | | | | | | |
| Object Detection | YOLO11s | Quant8 | 110.876 | 110.882 | | | | | | | |
Classification
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Classification | ConvNeXt | Quant8 | 353.309 | 353.774 | | | | | | | |
| Classification | ConvNeXt | Float32 | 591.345 | 687.457 | | | | | | | |
| Classification | DenseNet | Quant8 | 54.058 | 54.243 | | | | | | | |
| Classification | DenseNet | Float32 | 13.917 | 132.924 | | | | | | | |
| Classification | EfficientNet | Quant8 | 13.456 | 13.439 | | | | | | | |
| Classification | EfficientNet | Float32 | 2.977 | 48.327 | | | | | | | |
| Classification | MobileNetV2 | Quant8 | 5.734 | 5.731 | | | | | | | |
| Classification | MobileNetV2 | Float32 | 1.909 | 24.492 | | | | | | | |
| Classification | MobileNetV3 | Quant8 | 4.176 | 4.186 | | | | | | | |
| Classification | MobileNetV3 | Float32 | 9.393 | 9.191 | | | | | | | |
| Classification | ResNet | Quant8 | 17.595 | 17.731 | | | | | | | |
| Classification | ResNet | Float32 | 3.659 | 80.046 | | | | | | | |
| Classification | SqueezeNet | Quant8 | 17.676 | 17.739 | | | | | | | |
| Classification | SqueezeNet | Float32 | 15.313 | 35.054 | | | | | | | |
| Classification | VGG | Quant8 | 146.614 | 145.84 | | | | | | | |
| Classification | VGG | Float32 | 33.738 | 525.109 | | | | | | | |
Recognition
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Recognition | VGGFace | Quant8 | 146.057 | 146.218 | | | | | | | |
| Recognition | VGGFace | Float32 | 33.762 | 527.26 | | | | | | | |
Robotic Models
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Omni6DPose | scale_policy | Float32 | 0.029 | 0.034 | | | | | | | |
| Dino | dino_vitb8 | Float32 | 59.66 | 494.27 | | | | | | | |
| Diffusion Policy | model_diffusion_sampling | Float32 | 15.16 | 15.11 | | | | | | | |
| MobileSam | mobilesam_decoder | Float32 | 12.2 | 13.59 | | | | | | | |
| MobileSam | mobilesam_encoder | Float32 | 57.81 | 224.04 | | | | | | | |
| RegionNormalizedGrasp | anchornet | Float32 | 25.46 | 60.34 | | | | | | | |
| RegionNormalizedGrasp | localnet | Float32 | 6.23 | 6.4 | | | | | | | |
| YoloWorld | yoloworld_xl | Float32 | 263.01 | 3696.49 | | | | | | | |