On Jetson Orin, moving from PyTorch FP32 to ONNX Runtime FP32 buys almost nothing: the median speedup is 1.02x across 55 models, and the range runs from 0.39x to 2.85x. Many models end up slower, not faster. Real gains are confined to the smallest models, led by the smallest DEIMv2 variants. If you want speed on this device, convert to TensorRT instead. All mAP values are given in percent.
PyTorch FP32 is the reference runtime. ONNX Runtime FP32 keeps the same precision and runs the exported graph. This page compares both on the same 55 models, same COCO protocol, same device: an NVIDIA Jetson Orin Nano Super 8GB.
| Model | PyTorch FP32 FPS | ONNX Runtime FP32 FPS | Speedup | mAP delta (pts) |
|---|---|---|---|---|
| DEIMv2-Atto | 15.4 | 43.9 | 2.85x | +0.01 |
| DEIMv2-Femto | 14.8 | 37.1 | 2.51x | +0.01 |
| YOLOv9-T | 9.9 | 18.6 | 1.88x | +0.01 |
| YOLOX-Nano | 18.4 | 34.4 | 1.87x | +0.00 |
| DEIMv2-N | 11.1 | 17.6 | 1.58x | +0.00 |
| DEIM-N | 11.3 | 17.5 | 1.55x | +0.00 |
| YOLOX-Tiny | 20.3 | 31.4 | 1.55x | +0.00 |
| DEIMv2-Pico | 14.1 | 21.3 | 1.51x | +0.01 |
| D-FINE-N | 11.7 | 17.5 | 1.50x | +0.00 |
| YOLOv9-S | 9.9 | 12.5 | 1.27x | -0.01 |
| PicoDet-S | 13.8 | 16.4 | 1.18x | -0.74 |
| RT-DETR-R50m | 5.0 | 5.4 | 1.08x | +0.00 |
| RT-DETRv2-R50m | 5.1 | 5.4 | 1.07x | -0.01 |
| DEIM-X | 2.9 | 3.0 | 1.06x | +0.00 |
| D-FINE-X | 2.9 | 3.0 | 1.06x | +0.00 |
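
The Speedup column appears to be the ratio of ONNX Runtime FPS to PyTorch FPS. A minimal sketch of that arithmetic on three rows from the table follows; the function name `speedup` is illustrative only. Because the FPS values shown are rounded to one decimal, a ratio recomputed from them can differ from the published column by a few hundredths, as the DEIM-X row shows.

```python
def speedup(pytorch_fps: float, onnx_fps: float) -> float:
    """Throughput ratio: values above 1.0 mean ONNX Runtime is faster."""
    return onnx_fps / pytorch_fps


# (model, PyTorch FP32 FPS, ONNX Runtime FP32 FPS, published speedup)
rows = [
    ("DEIMv2-Atto", 15.4, 43.9, 2.85),
    ("YOLOX-Nano", 18.4, 34.4, 1.87),
    ("DEIM-X", 2.9, 3.0, 1.06),
]

# Recompute each ratio from the rounded FPS values shown in the table.
recomputed = {name: round(speedup(pt, ort), 2) for name, pt, ort, _ in rows}

for name, _pt, _ort, published in rows:
    print(f"{name}: recomputed {recomputed[name]:.2f}x, published {published:.2f}x")
```

For low-FPS models the rounding matters most: a 0.05 FPS rounding error on a 2.9 FPS model is already close to 2% of the value.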
The gain is concentrated in small models
Only a few families gain anything. DEIMv2 leads at a median 1.51x, and its smallest member, DEIMv2-Atto, is the single biggest gainer at 2.85x, from 15.4 to 43.9 FPS. ECDet is the worst at a median 0.83x, meaning the typical ECDet model lost about 17% of its throughput after conversion. Set your expectation by family, and expect no gain at all for most of the field.
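A per-family median like the ones quoted here can be computed by grouping speedups by family. The sketch below uses a few rows from the table; the family labels and the helper name `median_speedup_by_family` are assumptions made for illustration. The table is only an excerpt of the 55 models, so these medians will not match the full-family figures on this page.

```python
from collections import defaultdict
from statistics import median

# (family, model, speedup) for a handful of rows from the table.
# Family labels are assumed here, not taken from the dataset.
rows = [
    ("DEIMv2", "DEIMv2-Atto", 2.85),
    ("DEIMv2", "DEIMv2-Femto", 2.51),
    ("DEIMv2", "DEIMv2-N", 1.58),
    ("DEIMv2", "DEIMv2-Pico", 1.51),
    ("YOLOX", "YOLOX-Nano", 1.87),
    ("YOLOX", "YOLOX-Tiny", 1.55),
    ("YOLOv9", "YOLOv9-T", 1.88),
    ("YOLOv9", "YOLOv9-S", 1.27),
]


def median_speedup_by_family(rows):
    """Group speedups by family and return the median of each group."""
    by_family = defaultdict(list)
    for family, _model, value in rows:
        by_family[family].append(value)
    return {family: median(values) for family, values in by_family.items()}


family_medians = median_speedup_by_family(rows)
for family, value in sorted(family_medians.items()):
    print(f"{family}: median {value:.2f}x over the listed rows")
```

The median is used instead of the mean so that one outlier, such as a single 2.85x model, does not set the expectation for the whole family.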
Many models run slower
This is the reason to skip ONNX FP32 on Jetson. A large share of the 55 models ran slower than their PyTorch baseline, which is why the median lands at just 1.02x. The worst case is PicoDet-L, which collapses to 0.39x, dropping from 11.1 to 4.3 FPS. The large transformer detectors mostly land near parity, gaining nothing worth the export step.
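To see what a 0.39x result means per frame, FPS can be converted to time per frame. This is a small sketch using the PicoDet-L numbers above; the helper names are illustrative, and it assumes FPS is a plain frames-per-second throughput figure.

```python
def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def fps_change_percent(before_fps: float, after_fps: float) -> float:
    """Relative throughput change; negative values mean a slowdown."""
    return (after_fps - before_fps) / before_fps * 100.0


# PicoDet-L: PyTorch FP32 vs ONNX Runtime FP32 on the Jetson.
before, after = 11.1, 4.3

print(f"PyTorch: {frame_time_ms(before):.1f} ms per frame")
print(f"ONNX Runtime: {frame_time_ms(after):.1f} ms per frame")
print(f"Throughput change: {fps_change_percent(before, after):.1f}%")
```

In these terms the export adds roughly 140 ms to every frame for this model.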
Accuracy cost
Three of the 55 models also lose half a point of mAP or more through the ONNX export, all of them PicoDet: PicoDet-S drops from 30.4 to 29.6, PicoDet-M from 37.9 to 37.3, and PicoDet-L from 44.1 to 43.6. So the one family that loses accuracy also contains the single worst slowdown, PicoDet-L at 0.39x.
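The half-point check can be reproduced from the mAP values quoted above. The sketch uses `Decimal` so that a drop of exactly 0.5 is not lost to binary floating-point error; the threshold constant and the names are illustrative.

```python
from decimal import Decimal

# (model, PyTorch FP32 mAP, ONNX Runtime FP32 mAP), in percent.
scores = [
    ("PicoDet-S", "30.4", "29.6"),
    ("PicoDet-M", "37.9", "37.3"),
    ("PicoDet-L", "44.1", "43.6"),
]

# "Half a point of mAP or more" expressed as a signed delta.
THRESHOLD = Decimal("-0.5")


def map_delta(pytorch_map: str, onnx_map: str) -> Decimal:
    """ONNX mAP minus PyTorch mAP, in points; negative means a loss."""
    return Decimal(onnx_map) - Decimal(pytorch_map)


deltas = {name: map_delta(pt, ort) for name, pt, ort in scores}
flagged = [name for name, delta in deltas.items() if delta <= THRESHOLD]

for name, delta in deltas.items():
    print(f"{name}: {delta:+} pts")
```

All three models are flagged, with PicoDet-L sitting exactly on the threshold.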
ONNX Runtime FP32 is not a speed win on Jetson Orin. Use it only as a conversion step toward TensorRT, which delivers real gains on this device.
Every number on this page comes from the verified dataset: same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, pycocotools mAP, identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open the compare page. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.
