TensorRT FP16 vs PyTorch FP32 on Jetson Orin: 55 models

Verdict

Across 55 models on Jetson Orin, TensorRT FP16 gives a median 3.39x speedup over PyTorch FP32, with individual models ranging from 1.68x to 6.21x. The biggest gain is RT-DETRv2-R101 at 6.21x, from 2.8 to 17.3 FPS. Accuracy holds: only DEIMv2-X loses more than half a point of mAP, dropping 0.6 (61.3 to 60.7). Every model measured ran faster.

PyTorch FP32 is the reference runtime. TensorRT FP16 is the standard deployment path on Jetson hardware. This page compares the two on the same 55 models, with the same COCO protocol, on the same device: an NVIDIA Jetson Orin Nano Super.

Model	PyTorch FP32 FPS	TensorRT FP16 FPS	Speedup	mAP delta (hundredths of a pt)
RT-DETRv2-R101	2.8	17.3	6.21x	+1.0
RT-DETR-R101	2.8	17.0	6.12x	+2.0
RT-DETRv2-R50m	5.1	29.7	5.85x	+0.0
DEIMv2-Atto	15.4	89.1	5.78x	+1.0
RT-DETR-R50m	5.0	29.1	5.78x	+0.0
RT-DETR-X	3.0	17.0	5.74x	-9.0
RT-DETRv2-R50	4.3	24.2	5.66x	-1.0
RT-DETR-R50	4.3	23.6	5.54x	-17.0
D-FINE-X	2.9	14.2	4.96x	+1.0
RT-DETR-L	4.9	24.1	4.96x	+1.0
DEIM-X	2.9	14.1	4.93x	-2.0
RT-DETRv4-X	2.8	13.8	4.85x	-3.0
DEIMv2-Femto	14.8	65.7	4.43x	+1.0
YOLOX-X	3.5	14.6	4.22x	-3.0
RT-DETRv2-R34	8.1	32.7	4.03x	+3.0

Per-model FPS under PyTorch FP32 and TensorRT FP16 on Jetson Orin, with speedup and mAP delta. Top 15 by speedup.
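The speedup column is the ratio of the two throughput figures. A minimal Python sketch of that arithmetic, using three rows from the table, is shown here. The function name `speedup` is illustrative, not part of any tool described on this page. Ratios recomputed from the rounded FPS values land close to, but not exactly on, the published speedups; this sketch assumes the published figures were derived from unrounded measurements.

```python
# (model, PyTorch FP32 FPS, TensorRT FP16 FPS, published speedup), copied from the table.
ROWS = [
    ("RT-DETRv2-R101", 2.8, 17.3, 6.21),
    ("DEIMv2-Atto", 15.4, 89.1, 5.78),
    ("YOLOX-X", 3.5, 14.6, 4.22),
]


def speedup(fp32_fps: float, fp16_fps: float) -> float:
    """Ratio of TensorRT FP16 throughput to PyTorch FP32 throughput."""
    if fp32_fps <= 0:
        raise ValueError("baseline FPS must be positive")
    return fp16_fps / fp32_fps


for model, fp32, fp16, published in ROWS:
    ratio = speedup(fp32, fp16)
    # The table's FPS values are rounded to one decimal, so expect small gaps.
    print(f"{model}: {ratio:.2f}x from rounded FPS, {published:.2f}x published")
```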

Speedup varies by family

Conversion gain depends on the model family. RT-DETRv2 gains the most of any family, a median 5.66x. ECDet gains the least, at 1.83x. Set your speedup expectation by family, not by a single global number.
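A per-family median can be sketched in Python as follows, using the published speedups of the 15 models in the table. The rule for deriving a family from a model name (everything before the last hyphen) is an assumption made for this sketch, not the grouping the dataset itself uses. Because only 15 of the 55 models are listed, the medians it prints will not match the full-dataset figures quoted above.

```python
from collections import defaultdict
from statistics import median

# Published speedups for the 15 models listed in the table (not the full 55).
SPEEDUPS = {
    "RT-DETRv2-R101": 6.21, "RT-DETR-R101": 6.12, "RT-DETRv2-R50m": 5.85,
    "DEIMv2-Atto": 5.78, "RT-DETR-R50m": 5.78, "RT-DETR-X": 5.74,
    "RT-DETRv2-R50": 5.66, "RT-DETR-R50": 5.54, "D-FINE-X": 4.96,
    "RT-DETR-L": 4.96, "DEIM-X": 4.93, "RT-DETRv4-X": 4.85,
    "DEIMv2-Femto": 4.43, "YOLOX-X": 4.22, "RT-DETRv2-R34": 4.03,
}


def family_of(model: str) -> str:
    # Assumption: the family is the model name up to its last hyphen,
    # e.g. "RT-DETRv2-R101" -> "RT-DETRv2".
    return model.rsplit("-", 1)[0]


def median_by_family(speedups: dict) -> dict:
    """Group speedups by family and take the median of each group."""
    groups = defaultdict(list)
    for model, value in speedups.items():
        groups[family_of(model)].append(value)
    return {family: median(values) for family, values in groups.items()}


for family, value in sorted(median_by_family(SPEEDUPS).items(), key=lambda kv: -kv[1]):
    print(f"{family}: median {value:.2f}x")
```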

Accuracy

mAP is shown in percent form. FP16 conversion holds accuracy on this device. Only one of the 55 models loses more than half a point: DEIMv2-X drops 0.6, from 61.3 to 60.7. Every other model stays within a couple tenths of a point of its PyTorch baseline.
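The half-point check can be written as a small Python sketch. The helper names `map_delta` and `exceeds_drop` are illustrative, and rounding to one decimal is an assumption chosen to match how mAP is printed on this page. The only values used are the DEIMv2-X figures quoted above.

```python
def map_delta(fp32_map: float, fp16_map: float) -> float:
    """mAP change in percentage points, rounded to one decimal place."""
    return round(fp16_map - fp32_map, 1)


def exceeds_drop(fp32_map: float, fp16_map: float, max_drop: float = 0.5) -> bool:
    """True when the FP16 model loses more than `max_drop` points of mAP."""
    return map_delta(fp32_map, fp16_map) < -max_drop


# DEIMv2-X: 61.3 under PyTorch FP32, 60.7 under TensorRT FP16.
delta = map_delta(61.3, 60.7)
print(f"DEIMv2-X: {delta:+.1f} pts, flagged={exceeds_drop(61.3, 60.7)}")
```

A drop of exactly half a point is not flagged, which matches the "more than half a point" wording.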

No regressions

No model on this device ran slower under TensorRT FP16. The smallest gain was 1.68x. On Jetson Orin, TensorRT FP16 is a safe default: real speedups with negligible accuracy cost.

Every number on this page comes from the verified dataset: the same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, and pycocotools mAP, with an identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open the compare page. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.
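For readers who want to approximate the scoring step, here is a hedged Python sketch of how the confidence threshold and detection cap could be applied before calling pycocotools. It is not the project's evaluation code. The helper names, the `>=` comparison, and the per-image top-300 cut are assumptions. The IoU 0.6 setting is assumed to belong to the model's own post-processing and is not reproduced here. Note also that pycocotools' default summary scores at most 100 detections per image, so the cap below only bounds what is submitted.

```python
from collections import defaultdict

CONF_THRESHOLD = 0.001  # protocol: confidence threshold
MAX_DETECTIONS = 300    # protocol: maximum detections per image


def prepare_detections(detections, conf=CONF_THRESHOLD, max_det=MAX_DETECTIONS):
    """Drop low-confidence boxes, then keep the top `max_det` per image by score."""
    per_image = defaultdict(list)
    for det in detections:
        if det["score"] >= conf:
            per_image[det["image_id"]].append(det)
    kept = []
    for image_id in sorted(per_image):
        ranked = sorted(per_image[image_id], key=lambda d: d["score"], reverse=True)
        kept.extend(ranked[:max_det])
    return kept


def coco_map(annotation_file, detections, image_ids):
    """Return COCO bbox mAP (IoU 0.50:0.95) in percent for the given image ids."""
    # Imported lazily so the filtering helper works without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    gt = COCO(annotation_file)
    # Each detection dict needs image_id, category_id, bbox [x, y, w, h], and score.
    dt = gt.loadRes(prepare_detections(detections))
    evaluator = COCOeval(gt, dt, "bbox")
    evaluator.params.imgIds = list(image_ids)  # e.g. the ids of a 500-image slice
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()
    return 100.0 * evaluator.stats[0]
```

Calling `coco_map` requires a COCO annotation file and detections in the COCO results format; which 500 images make up the slice is defined on the methodology page, not here.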

Run any model with one line