Verdict

Across 55 models on Jetson Orin, TensorRT FP32 gives a median 2.2x speedup over PyTorch FP32, ranging from 1.48x to 5.28x. The biggest gain is DEIMv2-Atto at 5.28x, from 15.4 to 81.4 FPS. Because the conversion stays at full FP32 precision, no model loses more than half a point of mAP, and every model measured ran faster. It is the safe conversion when you want speed without touching accuracy. mAP values are given as percentages.

PyTorch FP32 is the reference runtime. TensorRT FP32 keeps the same numeric precision but compiles the graph for the device. This page compares the two on the same 55 models, with the same COCO protocol, on the same device: an NVIDIA Jetson Orin Nano Super 8GB.

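| Model | PyTorch FP32 FPS | TensorRT FP32 FPS | Speedup | mAP delta (pts) |
|---|---|---|---|---|
| DEIMv2-Atto | 15.4 | 81.4 | 5.28x | +5.0 |
| DEIMv2-Femto | 14.8 | 60.0 | 4.05x | -3.0 |
| RT-DETR-R50m | 5.0 | 15.7 | 3.12x | +4.0 |
| RT-DETRv2-R50m | 5.1 | 15.8 | 3.11x | +1.0 |
| RT-DETR-R101 | 2.8 | 8.4 | 3.04x | +0.0 |
| RT-DETRv2-R101 | 2.8 | 8.5 | 3.03x | -1.0 |
| RT-DETRv2-R50 | 4.3 | 12.7 | 2.97x | -2.0 |
| RT-DETR-R50 | 4.3 | 12.6 | 2.97x | -2.0 |
| DEIMv2-N | 11.1 | 33.0 | 2.97x | -1.0 |
| DEIM-N | 11.3 | 33.4 | 2.95x | +1.0 |
| RT-DETR-X | 3.0 | 8.6 | 2.92x | +1.0 |
| D-FINE-N | 11.7 | 33.5 | 2.86x | +1.0 |
| YOLOv9-T | 9.9 | 27.8 | 2.81x | +2.0 |
| D-FINE-X | 2.9 | 8.0 | 2.80x | +2.0 |
| RT-DETR-L | 4.9 | 13.6 | 2.80x | +2.0 |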
Per-model FPS under PyTorch FP32 and TensorRT FP32 on Jetson Orin, with speedup and mAP delta. Top 15 by speedup.
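The speedup column can be reproduced, approximately, by dividing TensorRT FPS by PyTorch FPS. The sketch below does that for the 15 rows in the table. The names `ROWS`, `speedup`, and `top15_median` are illustrative, not part of the dataset tooling. Because the FPS values in the table are rounded to one decimal, recomputed ratios can differ from the published column in the second decimal.

```python
from statistics import median

# (model, PyTorch FP32 FPS, TensorRT FP32 FPS), copied from the table above.
ROWS = [
    ("DEIMv2-Atto", 15.4, 81.4),
    ("DEIMv2-Femto", 14.8, 60.0),
    ("RT-DETR-R50m", 5.0, 15.7),
    ("RT-DETRv2-R50m", 5.1, 15.8),
    ("RT-DETR-R101", 2.8, 8.4),
    ("RT-DETRv2-R101", 2.8, 8.5),
    ("RT-DETRv2-R50", 4.3, 12.7),
    ("RT-DETR-R50", 4.3, 12.6),
    ("DEIMv2-N", 11.1, 33.0),
    ("DEIM-N", 11.3, 33.4),
    ("RT-DETR-X", 3.0, 8.6),
    ("D-FINE-N", 11.7, 33.5),
    ("YOLOv9-T", 9.9, 27.8),
    ("D-FINE-X", 2.9, 8.0),
    ("RT-DETR-L", 4.9, 13.6),
]


def speedup(pytorch_fps: float, tensorrt_fps: float) -> float:
    """Throughput ratio: values above 1.0 mean TensorRT is faster."""
    return tensorrt_fps / pytorch_fps


speedups = {name: speedup(pt, trt) for name, pt, trt in ROWS}

# Median over these 15 rows only. This is NOT the 2.2x median quoted for
# all 55 models, because the table lists just the top 15 by speedup.
top15_median = median(speedups.values())

for name, value in speedups.items():
    print(f"{name:16s} {value:.2f}x")
print(f"median of listed rows: {top15_median:.2f}x")
```

Reproducing the page-wide 2.2x median would need the FPS pairs for all 55 models, which this page does not list.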

Speedup varies by family

Conversion gain depends on the model family. RT-DETRv2 gains a median 2.97x, the most of any family here. ECDet gains only 1.8x, the least. The smallest DEIMv2 models set the top of the range, with DEIMv2-Atto at 5.28x. Set your speedup expectation by family, not by the single global median.
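A per-family median can be sketched by grouping models by family and taking the median speedup of each group. The example below uses published speedups for three families from the top-15 table. The rule in `family_of` (everything before the last hyphen in the model name) is an assumption made for illustration, and the names `PUBLISHED`, `family_of`, and `median_by_family` are hypothetical. Because only top-15 rows are included, the results will not match the page's family medians, which cover every size in each family.

```python
from collections import defaultdict
from statistics import median

# (model, published speedup) for a subset of the top-15 table.
PUBLISHED = [
    ("DEIMv2-Atto", 5.28),
    ("DEIMv2-Femto", 4.05),
    ("DEIMv2-N", 2.97),
    ("RT-DETRv2-R50m", 3.11),
    ("RT-DETRv2-R101", 3.03),
    ("RT-DETRv2-R50", 2.97),
    ("RT-DETR-R50m", 3.12),
    ("RT-DETR-R101", 3.04),
    ("RT-DETR-R50", 2.97),
    ("RT-DETR-X", 2.92),
    ("RT-DETR-L", 2.80),
]


def family_of(model: str) -> str:
    """Assumed naming rule: the family is the name minus its size suffix."""
    return model.rsplit("-", 1)[0]


def median_by_family(rows):
    """Group speedups by family and return the median of each group."""
    groups = defaultdict(list)
    for model, value in rows:
        groups[family_of(model)].append(value)
    return {family: median(values) for family, values in groups.items()}


family_medians = median_by_family(PUBLISHED)
for family, value in sorted(family_medians.items()):
    print(f"{family:10s} median {value:.2f}x")
```

Grouping before summarizing is what keeps a few very small models, such as DEIMv2-Atto, from distorting the expectation for an unrelated family.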

Accuracy holds

This is the reason to pick FP32 over FP16 when accuracy must not change: no model loses more than half a point of mAP when converted from PyTorch. The precision is identical, so the numbers match the baseline within measurement noise. D-FINE-X stays at 61.4 mAP, and DEIMv2-Atto holds 27.5 mAP while running more than five times faster.

No regressions

No model on this device ran slower under TensorRT FP32. The smallest gain was 1.48x. On Jetson Orin, TensorRT FP32 is the conservative default: a real speedup on every model, with accuracy left untouched. Move to FP16 only when you have measured that the extra speed is worth its accuracy cost.
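The two conditions behind this verdict, faster on every model and no mAP loss beyond half a point, can be written as a small acceptance check. This is a minimal sketch, not the site's tooling: the `Result` record and `is_safe_conversion` are hypothetical names. The FPS and mAP values for D-FINE-X and DEIMv2-Atto come from this page, with the converted mAP taken as equal to the baseline because the text says it holds. The third record is invented purely to show a failing case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    pytorch_fps: float
    tensorrt_fps: float
    pytorch_map: float   # percent
    tensorrt_map: float  # percent


def is_safe_conversion(r: Result, max_map_drop: float = 0.5) -> bool:
    """True when TensorRT is faster and mAP drops by at most max_map_drop points."""
    faster = r.tensorrt_fps > r.pytorch_fps
    accuracy_held = (r.pytorch_map - r.tensorrt_map) <= max_map_drop
    return faster and accuracy_held


results = [
    Result("D-FINE-X", 2.9, 8.0, 61.4, 61.4),
    Result("DEIMv2-Atto", 15.4, 81.4, 27.5, 27.5),
    # Hypothetical record, not from the dataset: faster but loses 1.2 points.
    Result("example-regression", 10.0, 20.0, 40.0, 38.8),
]

for r in results:
    print(r.model, "OK" if is_safe_conversion(r) else "REVIEW")
```

The same check with a looser `max_map_drop` is one way to decide whether an FP16 build is acceptable once its accuracy cost has been measured.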

Every number on this page comes from the verified dataset: the same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, and pycocotools mAP, with an identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open the compare page. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.
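When reproducing these numbers, it can help to pin the protocol settings in one place and refuse to compare runs that differ. The sketch below records the values stated above in a frozen dataclass. The class name, field names, and `comparable` helper are assumptions for illustration and do not mirror any LibreYOLO or pycocotools API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalProtocol:
    dataset: str = "COCO val2017"
    num_images: int = 500          # size of the fixed evaluation slice
    conf_threshold: float = 0.001  # minimum detection confidence
    iou_threshold: float = 0.6     # IoU setting quoted in the protocol
    max_detections: int = 300      # detections kept per image
    scorer: str = "pycocotools"    # library used to compute mAP


def comparable(a: EvalProtocol, b: EvalProtocol) -> bool:
    """Two runs are comparable only if every protocol field matches."""
    return a == b


reference = EvalProtocol()
other_run = EvalProtocol(conf_threshold=0.25)  # hypothetical mismatched run

print(comparable(reference, EvalProtocol()))  # identical settings
print(comparable(reference, other_run))       # different confidence threshold
```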