ONNX Runtime - Performance Comparison

This page compares the performance of validated ONNX models across the MediaTek Genio platforms listed below. Measurements are obtained using ONNX Runtime, with hardware acceleration where available.
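The page does not describe how its measurements were collected, so the following is only a minimal sketch of how a latency figure in milliseconds could be gathered for comparison. The function name `measure_latency_ms` and the warm-up and run counts are illustrative assumptions, not the procedure used for the tables on this page.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(run_once: Callable[[], object],
                       warmup: int = 5,
                       runs: int = 50) -> Dict[str, float]:
    """Time a zero-argument callable and report latency statistics in ms.

    The warm-up and run counts are illustrative defaults (assumptions).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up iterations are executed but not timed, so one-time costs
    # such as lazy initialisation do not skew the statistics.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

With ONNX Runtime's Python API, `run_once` could be `lambda: session.run(None, feeds)`, where `session` is an `onnxruntime.InferenceSession` and `feeds` maps input names to NumPy arrays of the input size listed for the model.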

Note

  • G520 / G720: These platforms use the Neuron Execution Provider (EP) for NPU acceleration. CPU EP results are also listed for comparison.

  • G350 / G510 / G700 / G1200: These platforms currently execute ONNX models via the CPU EP.

  • All values are represented in milliseconds (ms).

  • Cells marked - indicate data that is still being measured. Cells marked Not Support (N/A) indicate that the model is not supported on that hardware configuration.
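The split between NPU and CPU platforms described in the note can be sketched as a provider-selection helper. This is a hedged illustration: `CPUExecutionProvider` is ONNX Runtime's standard CPU EP name, but the string `NeuronExecutionProvider` is an assumption about how the Neuron EP is registered, and `choose_providers` is a hypothetical helper. Check `onnxruntime.get_available_providers()` on the target board for the actual name.

```python
from typing import List, Sequence

# Assumption: the Neuron EP registers under this name. Confirm it with
# onnxruntime.get_available_providers() on the target board.
NEURON_EP = "NeuronExecutionProvider"
# Standard ONNX Runtime name for the CPU execution provider.
CPU_EP = "CPUExecutionProvider"


def choose_providers(available: Sequence[str],
                     prefer_npu: bool = True) -> List[str]:
    """Return an ordered provider list that always ends with the CPU EP."""
    providers = []
    if prefer_npu and NEURON_EP in available:
        providers.append(NEURON_EP)
    # The CPU EP is kept last as a fallback for unsupported operators.
    providers.append(CPU_EP)
    return providers
```

The result can be passed as the `providers` argument of `onnxruntime.InferenceSession`, for example `InferenceSession("model.onnx", providers=choose_providers(onnxruntime.get_available_providers()))`. On CPU-only platforms the helper returns just the CPU EP.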

TAO Related Models
TAO Related Models Performance (Unit: ms)

| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Object Detection | PeopleNet (ResNet34) | Float32 | 3x544x960 | 83.51 | 3460.66 | 77.63 | 3422.34 | 4423.77 | 4017.74 | 4004.48 | 23720.38 |
| Object Detection | PeopleNet (ResNet34) | Quant8 | 3x544x960 | 20.49 | 911.40 | 19.44 | 901.13 | 1158.12 | 1049.7 | 1046.67 | 5885.96 |
| Recognition | Action Recognition Net (ResNet18) | Float32 | 96x224x224 | 16.83 | 353.46 | 14.75 | 350.78 | 440.36 | 402.82 | 400.42 | 2136.85 |
| Pose Estimation | BodyPoseNet | Float32 | 224x320x3 | 43.99 | 1845.36 | 40.56 | 1825.80 | 2349.49 | 2130.42 | 2133.04 | 12669.66 |
| Object Detection | LPDNet (USA Pruned) | Float32 | 3x480x640 | 3.68 | 107.06 | 3.42 | 106.47 | 137.91 | 124.04 | 123.63 | 610.1 |
| Segmentation | PeopleSemSegNet_AMR | Float32 | 3x576x960 | Not Support | 7685.97 | Not Support | 7749.61 | 9521.3 | 8841.33 | 8534.15 | 52374.03 |
| Segmentation | PeopleSemSegNet_AMR (Rel) | Float32 | 3x544x960 | 15.05 | 146.83 | 13.29 | 136.69 | 175.09 | 151.94 | 150.49 | 951.19 |
| Segmentation | PeopleSemSegNet (ShuffleSeg) | Float32 | 3x544x960 | 15.08 | 140.38 | 13.41 | 136.57 | 174.29 | 151.97 | 152.75 | 954.22 |
| Segmentation | PeopleSemSegNet (Vanilla Unet) | Float32 | 3x544x960 | 178.51 | 7510.24 | 163.45 | 7346.95 | 9421.51 | 8472.84 | 8456.73 | 49743.08 |
| Re-Identification | ReIdentificationNet (ResNet50) | Float32 | 3x256x128 | 8.59 | 237.53 | 6.87 | 234.10 | 301.5 | 274.25 | 274.02 | 1642.93 |
| OCR | Ocrnet_resnet50 | Float32 | 1x32x100 | 20.64 | 300.25 | 18.16 | 296.89 | 384.41 | 349.53 | 349.21 | 2158.96 |
| OCR | Ocrnet_resnet50 (Pruned) | Float32 | 1x32x100 | 14.93 | 179.51 | 13.94 | 175.70 | 227.94 | 206.21 | 205.27 | 1346.59 |
| OCR | ocd_resnet50 | Float32 | 3x736x1280 | 169.41 | 6520.78 | 149.85 | 6340.07 | 8154.15 | 7298.74 | 7276.45 | 43412.58 |
| OCR | ocd_resnet50 | Float32 | 3x640x640 | 76.18 | 2809.26 | 68.10 | 2748.75 | 3519.27 | 3167.32 | 3159.16 | 18802.24 |
| OCR | ocdnet_mixnet | Float32 | 3x640x640 | 362.87 | 17742.48 | 340.09 | 17436.33 | 22130.68 | 20066.59 | 20011.74 | 124121.02 |
| Classification | Pose Classification (ST-GCN) | Float32 | 3x300x34x1 | 223.89 | 787.40 | 207.00 | 772.19 | 997.47 | 895.62 | 892.57 | 5119.71 |
| Pose Estimation | Centerpose (Chair DLA34) | Float32 | 3x512x512 | Not Support | 3035.51 | Not Support | 2946.96 | 3765.47 | 3404.64 | 3387.54 | 19636.09 |
| Pose Estimation | Centerpose (Camera FAN) | Float32 | 3x512x512 | Not Support | 7689.45 | Not Support | 7568.34 | 9644.41 | 8784.32 | 8752.96 | 55091.83 |
| Object Detection | LPDNet (CCPD Pruned) | Float32 | 3x1168x720 | 7.82 | 190.42 | 6.73 | 186.26 | 237.99 | 215.45 | 214.39 | 1030.08 |
| Pose Estimation | Foundation Pose (Refiner) | Float32 | 6x160x160 | 64.91 | 682.93 | 60.97 | 674.01 | 870.5 | 789.8 | 788.3 | 4656.35 |
| Pose Estimation | Foundation Pose (Score) | Float32 | 6x160x160 | 37.37 | 622.71 | 34.63 | 615.33 | 796.95 | 722.71 | 722.26 | 4288.7 |
| Pose Estimation | Multi 3D Centerpose | Float32 | 3x512x512 | Not Support | 3024.04 | Not Support | 3005.77 | 3752.66 | 3411.14 | 3391.78 | 20066.16 |
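To compare columns, the latencies listed above can be turned into a speedup ratio (CPU time divided by NPU time) and an approximate throughput (1000 divided by the latency in ms). The sketch below uses the G520 PeopleNet values from this section. The helper names are illustrative, and the throughput figure assumes back-to-back single inferences with no pre- or post-processing, which the page does not state.

```python
def speedup(cpu_ms: float, npu_ms: float) -> float:
    """How many times faster the NPU run is than the CPU run."""
    return cpu_ms / npu_ms


def throughput_per_second(latency_ms: float) -> float:
    """Approximate inferences per second for back-to-back single runs."""
    return 1000.0 / latency_ms


# G520 values for PeopleNet (ResNet34), copied from the results above (ms).
peoplenet_g520 = {
    "Float32": {"npu": 83.51, "cpu": 3460.66},
    "Quant8": {"npu": 20.49, "cpu": 911.40},
}

for dtype, row in peoplenet_g520.items():
    ratio = speedup(row["cpu"], row["npu"])
    rate = throughput_per_second(row["npu"])
    print(f"PeopleNet {dtype}: {ratio:.1f}x faster on NPU, "
          f"about {rate:.1f} inferences/s")
```

For these two rows the NPU is roughly 41 to 44 times faster than the CPU EP on the same board, and the Quant8 model runs about four times faster than the Float32 model on the NPU.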

Legacy Analytical Models

Detection

Detection Models Performance (Unit: ms)

| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Object Detection | YOLOv5s | Quant8 | 640x640 | Not Support | 225.74 | Not Support | 221.29 | 216.01 | 196.04 | 262.93 | 1821.42 |
| Object Detection | YOLOv5s | Float32 | 640x640 | 36.50 | 607.68 | 32.37 | 586.80 | 756.4 | 683.36 | 681.36 | 3884.26 |
| Object Detection | YOLOv8s | Quant8 | 640x640 | 90.11 | 353.19 | 80.57 | 346.58 | 325.35 | 295.93 | 415.35 | 3064.62 |
| Object Detection | YOLO11s | Quant8 | 640x640 | 102.15 | 301.50 | 90.99 | 295.32 | 287.8 | 260.63 | 352.71 | 2428.58 |

Classification

Classification Models Performance (Unit: ms)

| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Classification | ConvNeXt | Quant8 | 224x224 | Not Support | 516.21 | Not Support | 1115.18 | 657.72 | 595.72 | 599.75 | 4515.67 |
| Classification | ConvNeXt | Float32 | 224x224 | Not Support | 1117.20 | Not Support | 510.37 | 1403.15 | 1285.51 | 1274.93 | 7645.24 |
| Classification | DenseNet | Quant8 | 224x224 | Not Support | 104.51 | Not Support | 103.30 | 105.41 | 95.23 | 118.54 | 819.5 |
| Classification | DenseNet | Float32 | 224x224 | 8.46 | 205.29 | 7.49 | 200.32 | 254.01 | 231.61 | 227.96 | 1288.14 |
| Classification | EfficientNet | Quant8 | 224x224 | 33.33 | 24.07 | 30.52 | 23.94 | 27.47 | 25.12 | 27.98 | 156.53 |
| Classification | EfficientNet | Float32 | 224x224 | 3.15 | 66.64 | 2.81 | 65.57 | 83.76 | 76.32 | 75.52 | 444.61 |
| Classification | MobileNetV2 | Quant8 | 224x224 | 1.43 | 12.36 | 1.26 | 12.23 | 13.37 | 12.17 | 14.64 | 88.46 |
| Classification | MobileNetV2 | Float32 | 224x224 | 1.75 | 31.69 | 1.47 | 30.41 | 38.04 | 34.4 | 34.84 | 213.82 |
| Classification | MobileNetV3 | Quant8 | 224x224 | Not Support | 6.30 | Not Support | 6.16 | 7.49 | 6.77 | 7.17 | 42.96 |
| Classification | MobileNetV3 | Float32 | 224x224 | 13.72 | 10.74 | 12.81 | 10.45 | 13.26 | 12.06 | 11.97 | 79.27 |
| Classification | ResNet | Quant8 | 224x224 | 2.04 | 45.87 | 1.78 | 45.08 | 42.72 | 39.09 | 52.49 | 408.36 |
| Classification | ResNet | Float32 | 224x224 | 3.81 | 112.00 | 3.49 | 111.24 | 142.27 | 129.33 | 128.9 | 750.18 |
| Classification | SqueezeNet | Quant8 | 224x224 | 9.36 | 33.08 | 8.38 | 31.96 | 33.49 | 29.98 | 37.07 | 279.61 |
| Classification | SqueezeNet | Float32 | 224x224 | 9.86 | 53.15 | 8.81 | 52.00 | 67.04 | 60.54 | 60.5 | 358.19 |
| Classification | VGG | Quant8 | 224x224 | 13.79 | 366.54 | 11.62 | 366.31 | 347.8 | 322.24 | 423.41 | 3205.02 |
| Classification | VGG | Float32 | 224x224 | 37.17 | 902.03 | 32.24 | 889.14 | 1151.38 | 1036.48 | 1034.19 | 6363.06 |
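NPU acceleration does not lower latency for every model in the classification results above: for some entries the G520 (NPU) value is higher than the G520 (CPU) value. A small filter such as the following sketch can flag those rows before choosing an execution provider. The `Row` type and function name are illustrative, and only a subset of rows is copied in.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    model: str
    dtype: str
    npu_ms: float
    cpu_ms: float


# G520 (NPU) and G520 (CPU) values copied from the results above (ms).
rows = [
    Row("EfficientNet", "Quant8", 33.33, 24.07),
    Row("EfficientNet", "Float32", 3.15, 66.64),
    Row("MobileNetV2", "Quant8", 1.43, 12.36),
    Row("MobileNetV3", "Float32", 13.72, 10.74),
    Row("ResNet", "Quant8", 2.04, 45.87),
]


def npu_slower_than_cpu(table: List[Row]) -> List[Row]:
    """Return the rows whose NPU latency exceeds the CPU latency."""
    return [row for row in table if row.npu_ms > row.cpu_ms]


for row in npu_slower_than_cpu(rows):
    print(f"{row.model} ({row.dtype}): "
          f"NPU {row.npu_ms} ms vs CPU {row.cpu_ms} ms")
```

For this subset the filter reports EfficientNet (Quant8) and MobileNetV3 (Float32), so the CPU EP may be the better choice for those two models on G520.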

Recognition

Recognition Models Performance (Unit: ms)

| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Recognition | VGGFace | Quant8 | 224x224 | 291.44 | 366.43 | 291.22 | 367.59 | 348.55 | 323.49 | 425.91 | 3198.08 |
| Recognition | VGGFace | Float32 | 224x224 | 37.98 | 904.24 | 32.84 | 891.02 | 1152.1 | 1037.26 | 1038.44 | 6389.7 |

Robotic Models
Robotic Models Performance (Unit: ms)

| Task | Model Name | Data Type | Input Size | G520 (NPU) | G520 (CPU) | G720 (NPU) | G720 (CPU) | G510 (CPU) | G700 (CPU) | G1200 (CPU) | G350 (CPU) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Omni6DPose | scale_policy | Float32 | 1x3x3 | Not Support | 0.18 | Not Support | 0.17 | 0.18 | 0.16 | 0.17 | 1.28 |
| Diffusion Policy | model_diffusion_sampling | Float32 | trajectory:1x16x12, global_cond:1x800 | 60.66 | 44.71 | 57.68 | 41.96 | 57.15 | 48.58 | 48.17 | 588.95 |
| MobileSam | mobilesam_encoder | Float32 | 3x448x448 | Not Support | 705.14 | Not Support | 694.74 | 895.22 | 805.51 | 801.28 | 4328.02 |
| RegionNormalizedGrasp | anchornet | Float32 | 4x640x360 | 13.42 | 187.66 | 12.13 | 183.82 | 229.25 | 206.52 | 207.72 | 1103.65 |
| RegionNormalizedGrasp | localnet | Float32 | 64x64x6 | 20.02 | 19.65 | 19.78 | 20.06 | 24.85 | 22.57 | 22.62 | 128.98 |
| YoloWorld | yoloworld_xl | Float32 | 3x640x640 | 465.61 | 11466.42 | 403.15 | 11214.33 | 14432.8 | 13138.01 | 13070.84 | 82227.54 |
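When these results are processed programmatically, cells without a measurement need to be separated from numeric latencies. The following is a minimal sketch, assuming the three markers that appear on this page (-, N/A, and Not Support). The names `MISSING_MARKERS` and `parse_cell` are illustrative.

```python
from typing import Optional

# Markers this page uses for cells that carry no measurement.
MISSING_MARKERS = {"-", "N/A", "Not Support"}


def parse_cell(text: str) -> Optional[float]:
    """Convert a cell to milliseconds, or None when no value is reported."""
    cleaned = text.strip()
    if cleaned in MISSING_MARKERS:
        return None
    return float(cleaned)


# G520 (NPU) and G520 (CPU) cells copied from the results above.
cells = {
    "mobilesam_encoder": ("Not Support", "705.14"),
    "anchornet": ("13.42", "187.66"),
}

parsed = {
    name: tuple(parse_cell(cell) for cell in pair)
    for name, pair in cells.items()
}
print(parsed)
```

Returning `None` rather than 0 keeps unsupported configurations out of averages and speedup ratios.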