ONNX Runtime - Performance Comparison
This page provides a comprehensive performance comparison of validated ONNX models across all MediaTek Genio platforms. Measurements are obtained using ONNX Runtime with hardware acceleration (where available).
Note
G520 / G720: Leverage the Neuron Execution Provider (EP) for high-speed NPU acceleration.
G350 / G510 / G700 / G1200: These platforms currently execute ONNX models via the CPU EP.
All values are represented in milliseconds (ms).
Cells marked as `-` indicate data that is still being measured, while `N/A` indicates that the model is not supported on that specific hardware configuration (shown as `Not Support` in the tables below).
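This page does not describe how the latencies were collected. As a rough illustration only, a millisecond figure of this kind could be produced with a timing loop like the one below. The helper name `measure_latency_ms`, the warm-up and run counts, and the stand-in workload `fake_inference` are all assumptions for the sketch, not the procedure used for the published numbers.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=3, runs=10):
    """Return the mean wall-clock latency of run_once() in milliseconds."""
    # Warm-up calls are discarded so one-time setup cost does not skew the mean.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)  # seconds -> ms
    return statistics.mean(samples)


# Hypothetical stand-in workload that sleeps for about 2 ms.
def fake_inference():
    time.sleep(0.002)


latency = measure_latency_ms(fake_inference, warmup=1, runs=5)
print(f"mean latency: {latency:.2f} ms")
```

With ONNX Runtime, `run_once` would typically wrap a call such as `session.run(None, inputs)` on an already created `InferenceSession`, so that session creation is excluded from the timed region.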
TAO Related Models
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Object Detection | PeopleNet (ResNet34) | Float32 | 3x544x960 | 83.51 | 3460.66 | 77.63 | 3422.34 | 4423.77 | 4017.74 | 4004.48 | 23720.38 |
| Object Detection | PeopleNet (ResNet34) | Quant8 | 3x544x960 | 20.49 | 911.40 | 19.44 | 901.13 | 1158.12 | 1049.7 | 1046.67 | 5885.96 |
| Recognition | Action Recognition Net (ResNet18) | Float32 | 96x224x224 | 16.83 | 353.46 | 14.75 | 350.78 | 440.36 | 402.82 | 400.42 | 2136.85 |
| Pose Estimation | BodyPoseNet | Float32 | 224x320x3 | 43.99 | 1845.36 | 40.56 | 1825.80 | 2349.49 | 2130.42 | 2133.04 | 12669.66 |
| Object Detection | LPDNet (USA Pruned) | Float32 | 3x480x640 | 3.68 | 107.06 | 3.42 | 106.47 | 137.91 | 124.04 | 123.63 | 610.1 |
| Segmentation | PeopleSemSegNet_AMR | Float32 | 3x576x960 | Not Support | 7685.97 | Not Support | 7749.61 | 9521.3 | 8841.33 | 8534.15 | 52374.03 |
| Segmentation | PeopleSemSegNet_AMR (Rel) | Float32 | 3x544x960 | 15.05 | 146.83 | 13.29 | 136.69 | 175.09 | 151.94 | 150.49 | 951.19 |
| Segmentation | PeopleSemSegNet (ShuffleSeg) | Float32 | 3x544x960 | 15.08 | 140.38 | 13.41 | 136.57 | 174.29 | 151.97 | 152.75 | 954.22 |
| Segmentation | PeopleSemSegNet (Vanilla Unet) | Float32 | 3x544x960 | 178.51 | 7510.24 | 163.45 | 7346.95 | 9421.51 | 8472.84 | 8456.73 | 49743.08 |
| Re-Identification | ReIdentificationNet (ResNet50) | Float32 | 3x256x128 | 8.59 | 237.53 | 6.87 | 234.10 | 301.5 | 274.25 | 274.02 | 1642.93 |
| OCR | Ocrnet_resnet50 | Float32 | 1x32x100 | 20.64 | 300.25 | 18.16 | 296.89 | 384.41 | 349.53 | 349.21 | 2158.96 |
| OCR | Ocrnet_resnet50 (Pruned) | Float32 | 1x32x100 | 14.93 | 179.51 | 13.94 | 175.70 | 227.94 | 206.21 | 205.27 | 1346.59 |
| OCR | ocd_resnet50 | Float32 | 3x736x1280 | 169.41 | 6520.78 | 149.85 | 6340.07 | 8154.15 | 7298.74 | 7276.45 | 43412.58 |
| OCR | ocd_resnet50 | Float32 | 3x640x640 | 76.18 | 2809.26 | 68.10 | 2748.75 | 3519.27 | 3167.32 | 3159.16 | 18802.24 |
| OCR | ocdnet_mixnet | Float32 | 3x640x640 | 362.87 | 17742.48 | 340.09 | 17436.33 | 22130.68 | 20066.59 | 20011.74 | 124121.02 |
| Classification | Pose Classification (ST-GCN) | Float32 | 3x300x34x1 | 223.89 | 787.40 | 207.00 | 772.19 | 997.47 | 895.62 | 892.57 | 5119.71 |
| Pose Estimation | Centerpose (Chair DLA34) | Float32 | 3x512x512 | Not Support | 3035.51 | Not Support | 2946.96 | 3765.47 | 3404.64 | 3387.54 | 19636.09 |
| Pose Estimation | Centerpose (Camera FAN) | Float32 | 3x512x512 | Not Support | 7689.45 | Not Support | 7568.34 | 9644.41 | 8784.32 | 8752.96 | 55091.83 |
| Object Detection | LPDNet (CCPD Pruned) | Float32 | 3x1168x720 | 7.82 | 190.42 | 6.73 | 186.26 | 237.99 | 215.45 | 214.39 | 1030.08 |
| Pose Estimation | Foundation Pose (Refiner) | Float32 | 6x160x160 | 64.91 | 682.93 | 60.97 | 674.01 | 870.5 | 789.8 | 788.3 | 4656.35 |
| Pose Estimation | Foundation Pose (Score) | Float32 | 6x160x160 | 37.37 | 622.71 | 34.63 | 615.33 | 796.95 | 722.71 | 722.26 | 4288.7 |
| Pose Estimation | Multi 3D Centerpose | Float32 | 3x512x512 | Not Support | 3024.04 | Not Support | 3005.77 | 3752.66 | 3411.14 | 3391.78 | 20066.16 |
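Because every cell is a latency in milliseconds, two derived figures are often easier to compare: the NPU speedup over the CPU on the same board, and the approximate throughput. The sketch below works through the PeopleNet (ResNet34) Float32 row for the G520. The function names are illustrative, and the throughput figure assumes one inference at a time with no batching or pipelining, which this page does not state.

```python
def speedup(cpu_ms, npu_ms):
    """How many times faster the NPU run is than the CPU run."""
    return cpu_ms / npu_ms


def throughput_fps(latency_ms):
    """Inferences per second, assuming strictly sequential single-input runs."""
    return 1000.0 / latency_ms


# PeopleNet (ResNet34), Float32, G520 columns from the TAO table.
g520_npu_ms = 83.51
g520_cpu_ms = 3460.66

peoplenet_speedup = speedup(g520_cpu_ms, g520_npu_ms)
peoplenet_fps = throughput_fps(g520_npu_ms)

print(f"NPU speedup over CPU: {peoplenet_speedup:.1f}x")
print(f"Approximate NPU throughput: {peoplenet_fps:.1f} inferences/s")
```

For this row the NPU is roughly 41 times faster than the CPU on the same platform, which corresponds to about 12 inferences per second under the sequential assumption.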
Legacy Analytical Models
Detection
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Object Detection | YOLOv5s | Quant8 | 640x640 | Not Support | 225.74 | Not Support | 221.29 | 216.01 | 196.04 | 262.93 | 1821.42 |
| Object Detection | YOLOv5s | Float32 | 640x640 | 36.50 | 607.68 | 32.37 | 586.80 | 756.4 | 683.36 | 681.36 | 3884.26 |
| Object Detection | YOLOv8s | Quant8 | 640x640 | 90.11 | 353.19 | 80.57 | 346.58 | 325.35 | 295.93 | 415.35 | 3064.62 |
| Object Detection | YOLO11s | Quant8 | 640x640 | 102.15 | 301.50 | 90.99 | 295.32 | 287.8 | 260.63 | 352.71 | 2428.58 |
Classification
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Classification | ConvNeXt | Quant8 | 224x224 | Not Support | 516.21 | Not Support | 1115.18 | 657.72 | 595.72 | 599.75 | 4515.67 |
| Classification | ConvNeXt | Float32 | 224x224 | Not Support | 1117.20 | Not Support | 510.37 | 1403.15 | 1285.51 | 1274.93 | 7645.24 |
| Classification | DenseNet | Quant8 | 224x224 | Not Support | 104.51 | Not Support | 103.30 | 105.41 | 95.23 | 118.54 | 819.5 |
| Classification | DenseNet | Float32 | 224x224 | 8.46 | 205.29 | 7.49 | 200.32 | 254.01 | 231.61 | 227.96 | 1288.14 |
| Classification | EfficientNet | Quant8 | 224x224 | 33.33 | 24.07 | 30.52 | 23.94 | 27.47 | 25.12 | 27.98 | 156.53 |
| Classification | EfficientNet | Float32 | 224x224 | 3.15 | 66.64 | 2.81 | 65.57 | 83.76 | 76.32 | 75.52 | 444.61 |
| Classification | MobileNetV2 | Quant8 | 224x224 | 1.43 | 12.36 | 1.26 | 12.23 | 13.37 | 12.17 | 14.64 | 88.46 |
| Classification | MobileNetV2 | Float32 | 224x224 | 1.75 | 31.69 | 1.47 | 30.41 | 38.04 | 34.4 | 34.84 | 213.82 |
| Classification | MobileNetV3 | Quant8 | 224x224 | Not Support | 6.30 | Not Support | 6.16 | 7.49 | 6.77 | 7.17 | 42.96 |
| Classification | MobileNetV3 | Float32 | 224x224 | 13.72 | 10.74 | 12.81 | 10.45 | 13.26 | 12.06 | 11.97 | 79.27 |
| Classification | ResNet | Quant8 | 224x224 | 2.04 | 45.87 | 1.78 | 45.08 | 42.72 | 39.09 | 52.49 | 408.36 |
| Classification | ResNet | Float32 | 224x224 | 3.81 | 112.00 | 3.49 | 111.24 | 142.27 | 129.33 | 128.9 | 750.18 |
| Classification | SqueezeNet | Quant8 | 224x224 | 9.36 | 33.08 | 8.38 | 31.96 | 33.49 | 29.98 | 37.07 | 279.61 |
| Classification | SqueezeNet | Float32 | 224x224 | 9.86 | 53.15 | 8.81 | 52.00 | 67.04 | 60.54 | 60.5 | 358.19 |
| Classification | VGG | Quant8 | 224x224 | 13.79 | 366.54 | 11.62 | 366.31 | 347.8 | 322.24 | 423.41 | 3205.02 |
| Classification | VGG | Float32 | 224x224 | 37.17 | 902.03 | 32.24 | 889.14 | 1151.38 | 1036.48 | 1034.19 | 6363.06 |
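The Classification results show that the NPU is not the faster option in every row: on the G520, EfficientNet Quant8 takes 33.33 ms on the NPU against 24.07 ms on the CPU, and some models are marked `Not Support` on the NPU altogether. A small helper along the following lines could pick the faster backend per model from such cells. The names `parse_ms` and `faster_backend` are hypothetical, and treating `-` and `N/A` like `Not Support` is an assumption based on the note at the top of this page.

```python
def parse_ms(cell):
    """Convert a table cell to milliseconds, or None if no measurement exists."""
    text = cell.strip()
    if text in ("Not Support", "N/A", "-"):
        return None
    return float(text)


def faster_backend(npu_cell, cpu_cell):
    """Return 'NPU', 'CPU', or None depending on which latency is lower."""
    npu = parse_ms(npu_cell)
    cpu = parse_ms(cpu_cell)
    if npu is None and cpu is None:
        return None
    if npu is None:
        return "CPU"
    if cpu is None:
        return "NPU"
    return "NPU" if npu < cpu else "CPU"


# (G520 NPU, G520 CPU) cells copied from the Classification table.
rows = {
    "MobileNetV2 Quant8": ("1.43", "12.36"),
    "EfficientNet Quant8": ("33.33", "24.07"),
    "MobileNetV3 Quant8": ("Not Support", "6.30"),
}

choices = {name: faster_backend(*cells) for name, cells in rows.items()}
for name, backend in choices.items():
    print(f"{name}: {backend}")
```

This only compares single-inference latency; power consumption and CPU availability for other workloads are outside what the tables report.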
Recognition
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Recognition | VGGFace | Quant8 | 224x224 | 291.44 | 366.43 | 291.22 | 367.59 | 348.55 | 323.49 | 425.91 | 3198.08 |
| Recognition | VGGFace | Float32 | 224x224 | 37.98 | 904.24 | 32.84 | 891.02 | 1152.1 | 1037.26 | 1038.44 | 6389.7 |
Robotic Models
| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Omni6DPose | scale_policy | Float32 | 1x3x3 | Not Support | 0.18 | Not Support | 0.17 | 0.18 | 0.16 | 0.17 | 1.28 |
| Diffusion Policy | model_diffusion_sampling | Float32 | trajectory:1x16x12, global_cond:1x800 | 60.66 | 44.71 | 57.68 | 41.96 | 57.15 | 48.58 | 48.17 | 588.95 |
| MobileSam | mobilesam_encoder | Float32 | 3x448x448 | Not Support | 705.14 | Not Support | 694.74 | 895.22 | 805.51 | 801.28 | 4328.02 |
| RegionNormalizedGrasp | anchornet | Float32 | 4x640x360 | 13.42 | 187.66 | 12.13 | 183.82 | 229.25 | 206.52 | 207.72 | 1103.65 |
| RegionNormalizedGrasp | localnet | Float32 | 64x64x6 | 20.02 | 19.65 | 19.78 | 20.06 | 24.85 | 22.57 | 22.62 | 128.98 |
| YoloWorld | yoloworld_xl | Float32 | 3x640x640 | 465.61 | 11466.42 | 403.15 | 11214.33 | 14432.8 | 13138.01 | 13070.84 | 82227.54 |