Across 55 models on Jetson Orin, TensorRT FP16 delivers a median 3.39x speedup over PyTorch FP32, ranging from 1.68x to 6.21x. The biggest gain is RT-DETRv2-R101 at 6.21x, from 2.8 to 17.3 FPS. Accuracy holds: only DEIMv2-X loses more than half a point of mAP, dropping 0.6 (61.3 to 60.7). Every model measured ran faster.
PyTorch FP32 is the reference runtime; TensorRT FP16 is the standard deployment path on Jetson hardware. This page compares the two on the same 55 models, the same COCO protocol, and the same device: an NVIDIA Jetson Orin Nano Super. The table lists the 15 models with the largest speedups.
| Model | PyTorch FP32 FPS | TensorRT FP16 FPS | Speedup | mAP delta (0.01 pts) |
|---|---|---|---|---|
| RT-DETRv2-R101 | 2.8 | 17.3 | 6.21x | +1.0 |
| RT-DETR-R101 | 2.8 | 17.0 | 6.12x | +2.0 |
| RT-DETRv2-R50m | 5.1 | 29.7 | 5.85x | +0.0 |
| DEIMv2-Atto | 15.4 | 89.1 | 5.78x | +1.0 |
| RT-DETR-R50m | 5.0 | 29.1 | 5.78x | +0.0 |
| RT-DETR-X | 3.0 | 17.0 | 5.74x | -9.0 |
| RT-DETRv2-R50 | 4.3 | 24.2 | 5.66x | -1.0 |
| RT-DETR-R50 | 4.3 | 23.6 | 5.54x | -17.0 |
| D-FINE-X | 2.9 | 14.2 | 4.96x | +1.0 |
| RT-DETR-L | 4.9 | 24.1 | 4.96x | +1.0 |
| DEIM-X | 2.9 | 14.1 | 4.93x | -2.0 |
| RT-DETRv4-X | 2.8 | 13.8 | 4.85x | -3.0 |
| DEIMv2-Femto | 14.8 | 65.7 | 4.43x | +1.0 |
| YOLOX-X | 3.5 | 14.6 | 4.22x | -3.0 |
| RT-DETRv2-R34 | 8.1 | 32.7 | 4.03x | +3.0 |
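The Speedup column is the ratio of TensorRT FP16 FPS to PyTorch FP32 FPS. The sketch below recomputes that ratio from the rounded FPS values in the table. It assumes the listed speedups were derived from unrounded FPS, which is why the recomputed ratios agree only to within about 2%. The `ROWS` list and `speedup` helper are illustrative names, not part of any published tool.

```python
# Rows copied from the table above:
# (model, PyTorch FP32 FPS, TensorRT FP16 FPS, listed speedup)
ROWS = [
    ("RT-DETRv2-R101", 2.8, 17.3, 6.21),
    ("RT-DETR-R101", 2.8, 17.0, 6.12),
    ("RT-DETRv2-R50m", 5.1, 29.7, 5.85),
    ("DEIMv2-Atto", 15.4, 89.1, 5.78),
    ("RT-DETR-R50m", 5.0, 29.1, 5.78),
    ("RT-DETR-X", 3.0, 17.0, 5.74),
    ("RT-DETRv2-R50", 4.3, 24.2, 5.66),
    ("RT-DETR-R50", 4.3, 23.6, 5.54),
    ("D-FINE-X", 2.9, 14.2, 4.96),
    ("RT-DETR-L", 4.9, 24.1, 4.96),
    ("DEIM-X", 2.9, 14.1, 4.93),
    ("RT-DETRv4-X", 2.8, 13.8, 4.85),
    ("DEIMv2-Femto", 14.8, 65.7, 4.43),
    ("YOLOX-X", 3.5, 14.6, 4.22),
    ("RT-DETRv2-R34", 8.1, 32.7, 4.03),
]


def speedup(pytorch_fps: float, tensorrt_fps: float) -> float:
    """Speedup = TensorRT FP16 throughput divided by PyTorch FP32 throughput."""
    return tensorrt_fps / pytorch_fps


if __name__ == "__main__":
    for model, pt_fps, trt_fps, listed in ROWS:
        recomputed = speedup(pt_fps, trt_fps)
        rel_err = abs(recomputed - listed) / listed
        print(f"{model:16s} listed {listed:.2f}x  recomputed {recomputed:.2f}x  ({rel_err:.1%} off)")
```

The gap is largest for the slowest PyTorch baselines: at 2.8 FPS, rounding to one decimal alone moves the ratio by up to about 1.8%.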
Speedup varies by family
Conversion gain depends on the model family. RT-DETRv2 gains a median 5.66x, the most of any family; ECDet gains a median 1.83x, the least. Set your speedup expectation by family, not by a single global number.
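A family figure is the median speedup over that family's variants. The sketch below computes it for the 15 rows in the table only, so its medians will not match the page's full 55-model figures: it gives 5.755x for RT-DETRv2 rather than 5.66x. Taking the family to be everything before the last hyphen in the model name is an assumption that happens to work for these names; `family_of` and `family_medians` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import median

# Listed speedups from the table (TensorRT FP16 FPS / PyTorch FP32 FPS).
SPEEDUPS = {
    "RT-DETRv2-R101": 6.21, "RT-DETR-R101": 6.12, "RT-DETRv2-R50m": 5.85,
    "DEIMv2-Atto": 5.78, "RT-DETR-R50m": 5.78, "RT-DETR-X": 5.74,
    "RT-DETRv2-R50": 5.66, "RT-DETR-R50": 5.54, "D-FINE-X": 4.96,
    "RT-DETR-L": 4.96, "DEIM-X": 4.93, "RT-DETRv4-X": 4.85,
    "DEIMv2-Femto": 4.43, "YOLOX-X": 4.22, "RT-DETRv2-R34": 4.03,
}


def family_of(model: str) -> str:
    """Hypothetical rule: the family is everything before the last hyphen,
    e.g. "RT-DETRv2-R101" -> "RT-DETRv2", "D-FINE-X" -> "D-FINE"."""
    return model.rsplit("-", 1)[0]


def family_medians(speedups: dict[str, float]) -> dict[str, float]:
    """Group speedups by family and return the median of each group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for model, s in speedups.items():
        groups[family_of(model)].append(s)
    return {family: median(values) for family, values in groups.items()}


if __name__ == "__main__":
    ranked = sorted(family_medians(SPEEDUPS).items(), key=lambda kv: -kv[1])
    for family, m in ranked:
        print(f"{family:10s} median {m:.3f}x")
```

The median rather than the mean keeps one unusually fast or slow variant from dominating a family's figure.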
Accuracy
mAP is reported in percent, so one point is one percentage point of mAP. FP16 conversion preserves accuracy on this device. Only one of the 55 models loses more than half a point: DEIMv2-X drops 0.6, from 61.3 to 60.7. Every other model stays within a couple tenths of a point of its PyTorch baseline.
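The accuracy check is a threshold on the FP16-minus-FP32 mAP difference, in percentage points. A minimal sketch, assuming mAP values already in percent (pycocotools reports AP on a 0 to 1 scale, so multiply by 100 first). The half-point threshold comes from the text; `to_percent` and `flag_drops` are hypothetical helpers. DEIMv2-X is the only model whose raw mAP values this page quotes, so it is the only entry in the example.

```python
def to_percent(ap: float) -> float:
    """Convert a 0-1 AP value (as pycocotools reports it) to percent form."""
    return ap * 100.0


def flag_drops(results: dict[str, tuple[float, float]], max_drop: float = 0.5) -> dict[str, float]:
    """Return {model: delta} for models whose FP16 mAP falls more than
    `max_drop` points below the FP32 baseline.

    results[model] = (pytorch_fp32_map, tensorrt_fp16_map), both in percent.
    """
    flagged = {}
    for model, (fp32_map, fp16_map) in results.items():
        # Round to remove float noise, e.g. 60.7 - 61.3 is not exactly -0.6.
        delta = round(fp16_map - fp32_map, 2)
        if delta < -max_drop:
            flagged[model] = delta
    return flagged


if __name__ == "__main__":
    print(flag_drops({"DEIMv2-X": (61.3, 60.7)}))  # -> {'DEIMv2-X': -0.6}
```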
No regressions
No model on this device ran slower under TensorRT FP16. The smallest gain was 1.68x. On Jetson Orin, TensorRT FP16 is a safe default: real speedups with negligible accuracy cost.
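The "safe default" conclusion rests on two conditions: the FP16 engine is faster than the FP32 baseline, and it costs at most half a point of mAP. The sketch below encodes both as a gate plus a regression scan. The thresholds mirror the text, but treating them as a hard accept/reject rule is an assumption, and `ConversionResult`, `is_safe_conversion`, and `regressions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ConversionResult:
    model: str
    speedup: float    # TensorRT FP16 FPS / PyTorch FP32 FPS
    map_delta: float  # FP16 mAP minus FP32 mAP, in percentage points


def is_safe_conversion(r: ConversionResult, min_speedup: float = 1.0, max_drop: float = 0.5) -> bool:
    """Hypothetical gate: strictly faster than the baseline and no more
    than `max_drop` points of mAP lost."""
    return r.speedup > min_speedup and r.map_delta >= -max_drop


def regressions(results: list[ConversionResult]) -> list[str]:
    """Models that ran slower under TensorRT FP16 (speedup below 1x)."""
    return [r.model for r in results if r.speedup < 1.0]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = ConversionResult("model-a", speedup=1.68, map_delta=-0.2)
    print(is_safe_conversion(example))  # -> True
```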
Every number on this page comes from the verified dataset: the same 500-image COCO val2017 slice, confidence threshold 0.001, IoU threshold 0.6, at most 300 detections per image, mAP computed with pycocotools, and an identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, use the compare page. Accuracy is measured on LibreYOLO retrained checkpoints; weights from other sources can yield different values.
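The confidence threshold and the 300-detection cap bound what each image contributes to the pycocotools evaluation. A minimal sketch of those two steps, assuming detections arrive as (score, box) pairs. `filter_detections` is a hypothetical helper, the exact order of operations in the verified pipeline is not stated here, and the IoU 0.6 setting is not modeled.

```python
import heapq

CONF_THRESHOLD = 0.001  # from the protocol above
MAX_DETECTIONS = 300    # per image, from the protocol above


def filter_detections(dets, conf=CONF_THRESHOLD, max_dets=MAX_DETECTIONS):
    """Keep detections scoring at least `conf`, then return the `max_dets`
    highest-scoring ones, sorted by score (descending).

    `dets` is a list of (score, box) tuples; box is (x, y, w, h),
    the COCO results bbox layout.
    """
    kept = [d for d in dets if d[0] >= conf]
    return heapq.nlargest(max_dets, kept, key=lambda d: d[0])


if __name__ == "__main__":
    dets = [(0.0005, (0, 0, 1, 1)), (0.9, (0, 0, 2, 2)), (0.01, (1, 1, 2, 2))]
    print([score for score, _ in filter_detections(dets)])  # -> [0.9, 0.01]
```

A threshold as low as 0.001 keeps nearly every candidate, which is the usual choice for mAP evaluation: low-confidence detections can still raise recall at the tail of the precision-recall curve.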
