ncnn FP32 vs ONNX FP32 on Raspberry Pi 5: 13 models

Verdict

On a Raspberry Pi 5 CPU, ncnn FP32 runs a median 2.09x faster than ONNX Runtime FP32 across 13 models, with per-model speedups ranging from 1.25x to 2.87x. The biggest gain is YOLOX-X at 2.87x. The catch is accuracy: five of the 13 models, all YOLOX, lose about a point of mAP or more in the ncnn conversion. Take the speedup, but re-check mAP if you run a YOLOX model. mAP values are given in percent, and mAP deltas in percentage points.

The Raspberry Pi 5 has no GPU for these models, so both runtimes are CPU inference. ONNX Runtime is the common baseline; ncnn is the CPU-tuned alternative. This page compares the two on the same 13 models, with the same COCO protocol, on the same Raspberry Pi 5.

Model	ONNX Runtime FP32 FPS	ncnn FP32 FPS	Speedup	mAP delta (pts)
YOLOX-X	0.3	0.9	2.87x	-0.94
YOLOX-L	0.5	1.5	2.85x	-1.41
YOLOX-M	1.1	2.9	2.72x	-1.06
YOLOv9-C	0.7	1.9	2.61x	+0.18
YOLOv9-M	1.0	2.3	2.44x	-0.02
YOLOX-S	2.5	5.7	2.26x	-1.25
YOLOv9-S	2.4	4.9	2.09x	-0.12
YOLOX-Tiny	9.4	18.4	1.97x	-0.28
YOLOv9-T	5.9	11.0	1.87x	-0.16
YOLOX-Nano	21.3	30.7	1.44x	-0.91
PicoDet-M	6.2	8.5	1.36x	-0.07
PicoDet-L	1.7	2.3	1.33x	+0.10
PicoDet-S	15.0	18.8	1.25x	-0.47

Per-model FPS under ONNX Runtime FP32 and ncnn FP32 on Raspberry Pi 5, with speedup and mAP delta. A negative mAP delta means the ncnn model scored lower than the ONNX Runtime model. Sorted by speedup, largest first.
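The headline figures can be re-derived from the Speedup column. The sketch below is a minimal check, not the page's own tooling: it copies the speedups as printed in the table, since the FPS columns are rounded to one decimal and dividing them does not reproduce the speedups exactly (0.9 / 0.3 gives 3.0, not 2.87).

```python
from statistics import median

# Speedup column copied from the table above (ncnn FP32 relative to ONNX Runtime FP32).
speedups = {
    "YOLOX-X": 2.87, "YOLOX-L": 2.85, "YOLOX-M": 2.72, "YOLOv9-C": 2.61,
    "YOLOv9-M": 2.44, "YOLOX-S": 2.26, "YOLOv9-S": 2.09, "YOLOX-Tiny": 1.97,
    "YOLOv9-T": 1.87, "YOLOX-Nano": 1.44, "PicoDet-M": 1.36, "PicoDet-L": 1.33,
    "PicoDet-S": 1.25,
}

overall_median = median(speedups.values())       # middle of 13 values
largest_gain = max(speedups, key=speedups.get)   # model with the biggest speedup
smallest_gain = min(speedups, key=speedups.get)  # model with the smallest speedup
no_regressions = all(s > 1.0 for s in speedups.values())

print(f"median {overall_median}x, "
      f"max {largest_gain} ({speedups[largest_gain]}x), "
      f"min {smallest_gain} ({speedups[smallest_gain]}x), "
      f"no regressions: {no_regressions}")
```

With 13 models the median is simply the seventh value in sorted order, so no averaging of two middle values is involved.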

Speedup varies by family

Conversion gain depends on the model family. Taking the middle model of each family (the upper of the two middle values when a family has an even number of models), YOLOX gains 2.72x, the most of the three families here. YOLOv9 gains 2.44x. PicoDet gains only 1.33x. On a bare Pi CPU that difference is large: the fastest-gaining models more than double their frame rate, while no PicoDet model gains more than 1.36x.
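The per-family figures can be reproduced from the table with Python's `statistics` module. This is a sketch under one assumption: the grouping by name prefix is ours, not something the dataset states. For families with an even number of models, the quoted figures match `median_high` (the upper of the two middle values); the conventional `median`, which averages the two, comes out somewhat lower for YOLOX and YOLOv9.

```python
from statistics import median, median_high

# Speedups from the table, grouped by model family (grouping by name prefix is an assumption).
families = {
    "YOLOX": [2.87, 2.85, 2.72, 2.26, 1.97, 1.44],
    "YOLOv9": [2.61, 2.44, 2.09, 1.87],
    "PicoDet": [1.36, 1.33, 1.25],
}

# median_high picks an actual data point; median averages the two middle values.
upper = {name: median_high(values) for name, values in families.items()}
averaged = {name: median(values) for name, values in families.items()}

for name in families:
    print(f"{name}: upper-middle {upper[name]}x, averaged median {averaged[name]:.3f}x")
```

Either convention gives the same ranking: YOLOX first, YOLOv9 second, PicoDet last.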

The YOLOX accuracy cost

The speedup is not free for YOLOX. Five of the six YOLOX models lose about a point of mAP or more in the ncnn conversion, and they show the largest accuracy losses in the table. YOLOX-L falls from 54.3 to 52.9, the largest drop. YOLOX-S falls from 43.1 to 41.9, YOLOX-M from 50.8 to 49.7, YOLOX-X from 54.9 to 53.9, and YOLOX-Nano from 27.7 to 26.8. YOLOX-Tiny is the exception, with a much smaller loss. The YOLOv9 and PicoDet models change far less through the same conversion, and two of them gain slightly.
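The drops follow directly from the before and after values quoted above. The sketch below works them out and flags models for re-validation; the 0.5-point tolerance is a hypothetical threshold chosen for illustration, not part of the benchmark protocol.

```python
# (ONNX Runtime mAP, ncnn mAP) in percent, as quoted in the paragraph above.
yolox_map = {
    "YOLOX-X": (54.9, 53.9),
    "YOLOX-L": (54.3, 52.9),
    "YOLOX-M": (50.8, 49.7),
    "YOLOX-S": (43.1, 41.9),
    "YOLOX-Nano": (27.7, 26.8),
}

# Hypothetical tolerance: re-check any model that loses more than this many points.
TOLERANCE_PTS = 0.5

# Drop in percentage points, rounded to the one decimal the inputs carry.
drops = {model: round(onnx - ncnn, 1) for model, (onnx, ncnn) in yolox_map.items()}
worst = max(drops, key=drops.get)
needs_recheck = [model for model, drop in drops.items() if drop > TOLERANCE_PTS]

for model, drop in drops.items():
    print(f"{model}: -{drop} pts")
print(f"largest drop: {worst}")
```

Because the inputs are rounded to one decimal, these drops can differ in the second decimal from the deltas in the table.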

No regressions

No model ran slower under ncnn. The smallest gain was 1.25x, on PicoDet-S. On Raspberry Pi 5, ncnn is the faster CPU runtime across the board. The only question is whether your model is a YOLOX, in which case you are likely to trade roughly a point of mAP for the speed.

Every number on this page comes from the verified dataset: same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, pycocotools mAP, identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open compare. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.

