ONNX FP32 vs PyTorch FP32 on Jetson Orin: 55 models

Verdict

On Jetson Orin, moving from PyTorch FP32 to ONNX Runtime FP32 buys almost nothing: the median speedup is 1.02x across 55 models, with a range from 0.39x to 2.85x. Many models end up slower, not faster. Real gains are limited to small models, led by the smallest DEIMv2 variants. If you want speed on this device, convert to TensorRT instead. mAP values are percentages, and mAP deltas are in percentage points.

PyTorch FP32 is the reference runtime. ONNX Runtime FP32 keeps the same precision and runs the exported graph. This page compares the two on the same 55 models, with the same COCO protocol, on the same device: an NVIDIA Jetson Orin Nano Super 8GB.

Model	PyTorch FP32 FPS	ONNX Runtime FP32 FPS	Speedup	mAP delta (pts)
DEIMv2-Atto	15.4	43.9	2.85x	+0.01
DEIMv2-Femto	14.8	37.1	2.51x	+0.01
YOLOv9-T	9.9	18.6	1.88x	+0.01
YOLOX-Nano	18.4	34.4	1.87x	+0.00
DEIMv2-N	11.1	17.6	1.58x	+0.00
DEIM-N	11.3	17.5	1.55x	+0.00
YOLOX-Tiny	20.3	31.4	1.55x	+0.00
DEIMv2-Pico	14.1	21.3	1.51x	+0.01
D-FINE-N	11.7	17.5	1.50x	+0.00
YOLOv9-S	9.9	12.5	1.27x	-0.01
PicoDet-S	13.8	16.4	1.18x	-0.74
RT-DETR-R50m	5.0	5.4	1.08x	+0.00
RT-DETRv2-R50m	5.1	5.4	1.07x	-0.01
DEIM-X	2.9	3.0	1.06x	+0.00
D-FINE-X	2.9	3.0	1.06x	+0.00

Per-model FPS under PyTorch FP32 and ONNX Runtime FP32 on Jetson Orin, with speedup and mAP delta. Top 15 by speedup.
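
The Speedup column is the ratio of the two FPS columns. The sketch below recomputes it for a few rows copied from the table. Because the published FPS values are rounded to one decimal, the recomputed ratio can differ from the listed speedup by a few hundredths. The `rows` list and the `speedup` helper are illustrative names, not part of any benchmark tooling.

```python
# (model, PyTorch FP32 FPS, ONNX Runtime FP32 FPS, listed speedup), copied from the table
rows = [
    ("DEIMv2-Atto", 15.4, 43.9, 2.85),
    ("YOLOX-Nano", 18.4, 34.4, 1.87),
    ("PicoDet-S", 13.8, 16.4, 1.18),
    ("DEIM-X", 2.9, 3.0, 1.06),
]


def speedup(pytorch_fps: float, onnx_fps: float) -> float:
    """Ratio of ONNX Runtime FPS to PyTorch FPS; above 1.0 means ONNX Runtime is faster."""
    return onnx_fps / pytorch_fps


for model, pt_fps, onnx_fps, listed in rows:
    ratio = speedup(pt_fps, onnx_fps)
    print(f"{model:12s} recomputed {ratio:.2f}x  listed {listed:.2f}x")
```

A ratio below 1.0 means the model ran slower under ONNX Runtime than under PyTorch.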

The gain is concentrated in small models

Only a few families gain anything. DEIMv2 leads at a median 1.51x, and its smallest member, DEIMv2-Atto, is the single biggest gainer at 2.85x, from 15.4 to 43.9 FPS. ECDet is the worst at a median 0.83x, meaning at least half of the ECDet models ran slower after conversion. Set your expectation by family, and expect little or no gain for most of the field.
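
A family median summarizes the middle of the family. On its own it does not say whether every member moved in the same direction. The sketch below uses made-up speedups (hypothetical values, not benchmark data) to show the difference between a median below 1.0x and every model being slower.

```python
from statistics import median

# Hypothetical per-model speedups for one family; NOT taken from the benchmark.
family_speedups = [0.70, 0.80, 0.86, 1.05]

family_median = median(family_speedups)  # midpoint of the sorted speedups
all_slower = all(s < 1.0 for s in family_speedups)
share_slower = sum(s < 1.0 for s in family_speedups) / len(family_speedups)

print(f"median {family_median:.2f}x, all slower: {all_slower}, share slower: {share_slower:.0%}")
```

Reporting the share of models below 1.0x alongside the median makes the two cases easy to tell apart.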

Many models run slower

This is the reason to skip ONNX FP32 on Jetson. A median of 1.02x means roughly half of the 55 models gained 2% or less, and many of those ran slower than their PyTorch baseline. The worst case is PicoDet-L, which collapses to 0.39x, dropping from 11.1 to 4.3 FPS. The large transformer detectors mostly land near parity, gaining nothing worth the export step.
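
To see what a slowdown costs per frame, convert FPS to latency. The sketch assumes frames are processed one at a time, so latency is simply the inverse of throughput. That is an assumption of this example, not a statement about how the benchmark was timed. Under that assumption, PicoDet-L goes from roughly 90 ms to roughly 233 ms per frame.

```python
def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds, assuming one frame is processed at a time."""
    return 1000.0 / fps


pytorch_ms = latency_ms(11.1)  # PicoDet-L under PyTorch FP32
onnx_ms = latency_ms(4.3)      # PicoDet-L under ONNX Runtime FP32

print(f"{pytorch_ms:.0f} ms -> {onnx_ms:.0f} ms per frame (+{onnx_ms - pytorch_ms:.0f} ms)")
```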

Accuracy cost

Three of the 55 models also lose half a point of mAP or more through the ONNX export, all of them PicoDet: PicoDet-S drops from 30.4 to 29.6, PicoDet-M from 37.9 to 37.3, and PicoDet-L from 44.1 to 43.6. So the one family that loses accuracy also contains the single worst slowdown.
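
The half-point check can be reproduced from the quoted mAP values. This sketch computes each delta in percentage points and flags losses of 0.5 or more. The deltas come from the one-decimal mAP figures in the text, so they can differ slightly from deltas computed on unrounded scores. `THRESHOLD_PTS` is an illustrative name.

```python
# (model, PyTorch FP32 mAP, ONNX Runtime FP32 mAP) as quoted in the text
picodet = [
    ("PicoDet-S", 30.4, 29.6),
    ("PicoDet-M", 37.9, 37.3),
    ("PicoDet-L", 44.1, 43.6),
]
THRESHOLD_PTS = -0.5  # a loss of half a point or more

# Round to one decimal to remove floating-point noise from the subtraction.
deltas = {name: round(onnx_map - pt_map, 1) for name, pt_map, onnx_map in picodet}
flagged = [name for name, delta in deltas.items() if delta <= THRESHOLD_PTS]

print(deltas)
print(flagged)
```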

ONNX Runtime FP32 is not a speed win on Jetson Orin. Use it only as a conversion step toward TensorRT, which delivers real gains on this device.

Every number on this page comes from the verified dataset: the same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, pycocotools mAP, and an identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open the compare page. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.
