ONNX FP32 vs PyTorch FP32 on Raspberry Pi 5: 48 models

Verdict

Across 48 models on Raspberry Pi 5, ONNX Runtime FP32 gives a median 1.57x speedup over PyTorch FP32, with per-model results ranging from 0.92x to 2.49x. The biggest gain is YOLOX-Nano at 2.49x, from 8.6 to 21.3 FPS. The speed is not free: 9 models lose more than half a point of mAP, and one model, PicoDet-L, runs slower on ONNX Runtime than on PyTorch.

PyTorch FP32 is the reference runtime. ONNX Runtime FP32 is the common first step off PyTorch on a CPU device. This page compares the two runtimes on the same 48 models, with the same COCO protocol and the same board: a Raspberry Pi 5 running on bare CPU.
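As an illustration of what an FPS figure means in a comparison like this, here is a minimal timing sketch. It is not the harness used for this page: the `measure_fps` helper, its warm-up count, and its iteration count are all assumptions, and `run_once` stands in for whatever single-image inference call a runtime exposes.

```python
import time
from typing import Callable


def measure_fps(run_once: Callable[[], object], warmup: int = 5, iters: int = 50) -> float:
    """Return frames per second for a callable that processes one image.

    Hypothetical helper: warmup and iters are illustrative defaults,
    not the values used to produce the numbers on this page.
    """
    # Warm-up calls are not timed, so one-time setup costs do not skew the result.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse clocks.
    return iters / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Stand-in workload: sleeping 10 ms per "frame" cannot exceed 100 FPS.
    print(f"{measure_fps(lambda: time.sleep(0.01), warmup=2, iters=20):.1f} FPS")
```

The same callable-based shape works for either runtime: wrap the PyTorch forward pass in one closure and the ONNX Runtime session call in another, then time both under identical settings.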

Model	PyTorch FP32 FPS	ONNX Runtime FP32 FPS	Speedup	mAP delta (pts)
YOLOX-Nano	8.6	21.3	2.49x	-1.09
DEIMv2-Atto	12.7	30.3	2.39x	+0.01
DEIMv2-Femto	7.4	16.7	2.26x	+0.01
DEIMv2-Pico	2.9	6.2	2.16x	+0.01
RT-DETR-R50m	0.4	0.8	2.08x	+0.01
RT-DETRv2-R50m	0.4	0.8	2.08x	+0.01
YOLOv9-T	2.9	5.9	2.03x	-0.68
DEIM-N	2.3	4.3	1.88x	+0.00
DEIMv2-N	2.4	4.4	1.87x	+0.00
D-FINE-N	2.3	4.3	1.87x	+0.00
RT-DETR-R50	0.3	0.6	1.87x	+0.00
RT-DETRv2-R50	0.3	0.6	1.87x	+0.01
YOLOX-Tiny	5.5	9.4	1.72x	-0.77
YOLOv9-S	1.4	2.4	1.68x	-0.91
D-FINE-L	0.4	0.7	1.65x	+0.00

Per-model FPS under PyTorch FP32 and ONNX Runtime FP32 on Raspberry Pi 5, with speedup and mAP delta. Top 15 by speedup.
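The speedup column is the ratio of the two FPS columns. The sketch below recomputes it for five rows of the table and takes a median over that subset. The `speedup` helper is illustrative, and because the table shows FPS rounded to one decimal, recomputed ratios can differ from the published ones in the last digit (YOLOX-Nano comes out near 2.48 against the published 2.49x). The subset median is not the 1.57x median, which covers all 48 models.

```python
from statistics import median

# (model, PyTorch FP32 FPS, ONNX Runtime FP32 FPS), copied from the table.
ROWS = [
    ("YOLOX-Nano", 8.6, 21.3),
    ("DEIMv2-Atto", 12.7, 30.3),
    ("DEIMv2-Femto", 7.4, 16.7),
    ("DEIMv2-Pico", 2.9, 6.2),
    ("YOLOv9-T", 2.9, 5.9),
]


def speedup(pytorch_fps: float, onnx_fps: float) -> float:
    """Ratio of ONNX Runtime FPS to PyTorch FPS; below 1.0 means a slowdown."""
    return onnx_fps / pytorch_fps


speedups = {model: speedup(pt, ort) for model, pt, ort in ROWS}
subset_median = median(speedups.values())

if __name__ == "__main__":
    for model, value in speedups.items():
        print(f"{model}: {value:.2f}x")
    print(f"median of these five rows: {subset_median:.2f}x")
```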

Speedup varies by family

Conversion gain depends on the model family. DEIMv2 gains a median 1.87x, the most of any family. ECDet gains a median 1.07x, the least. On a CPU board the spread is narrow: most families cluster near the overall 1.57x median.
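A per-family median can be sketched by grouping models on their name prefix. The `family_of` rule below (everything before the last hyphen) is an assumption made for illustration, not the grouping used for this page, and the rows are only the ones visible in the top-15 table. The medians it prints therefore cover partial families and will not match the full-family figures quoted above.

```python
from collections import defaultdict
from statistics import median

# (model, published speedup), copied from the top-15 table.
ROWS = [
    ("DEIMv2-Atto", 2.39), ("DEIMv2-Femto", 2.26),
    ("DEIMv2-Pico", 2.16), ("DEIMv2-N", 1.87),
    ("RT-DETR-R50m", 2.08), ("RT-DETR-R50", 1.87),
    ("YOLOX-Nano", 2.49), ("YOLOX-Tiny", 1.72),
]


def family_of(model: str) -> str:
    """Assumed rule: the family is the model name up to its last hyphen."""
    return model.rsplit("-", 1)[0]


def family_medians(rows):
    """Group speedups by family and return the median of each group."""
    groups = defaultdict(list)
    for model, value in rows:
        groups[family_of(model)].append(value)
    return {family: median(values) for family, values in groups.items()}


if __name__ == "__main__":
    for family, value in sorted(family_medians(ROWS).items()):
        print(f"{family}: {value:.2f}x")
```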

Accuracy

mAP is reported in percentage points. ONNX conversion costs accuracy on this board: 9 of the 48 models lose more than half a point of mAP. YOLOv9-M drops the most at 1.7 points, from 55.3 to 53.6. The losses land on the YOLOX, YOLOv9, and PicoDet families; every other family holds its accuracy.
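The half-point criterion is a simple threshold on the difference between the two mAP values. A minimal sketch, using the two before-and-after pairs quoted on this page; the helper names and the rounding to one decimal are assumptions for illustration.

```python
# (model, PyTorch FP32 mAP, ONNX Runtime FP32 mAP), in mAP points as quoted on this page.
PAIRS = [
    ("YOLOv9-M", 55.3, 53.6),
    ("PicoDet-L", 44.2, 42.7),
]

THRESHOLD = 0.5  # half a point of mAP


def map_delta(before: float, after: float) -> float:
    """Change in mAP points; rounded to one decimal to avoid float noise."""
    return round(after - before, 1)


def loses_accuracy(before: float, after: float, threshold: float = THRESHOLD) -> bool:
    """True when the conversion costs more than `threshold` points of mAP."""
    return map_delta(before, after) < -threshold


if __name__ == "__main__":
    for model, before, after in PAIRS:
        flag = "loss" if loses_accuracy(before, after) else "held"
        print(f"{model}: {map_delta(before, after):+.1f} pts ({flag})")
```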

The one regression

One model ran slower on ONNX Runtime than on PyTorch: PicoDet-L, at 0.92x. It dropped from 1.9 to 1.7 FPS and also lost 1.5 points of mAP, from 44.2 to 42.7. It is the only conversion on this board that costs both speed and accuracy. We did not determine the cause; the row is measured, not modeled.

Every number on this page comes from the verified dataset: the same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, pycocotools mAP, and an identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open the compare page. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.
