DEIM-N wins on desktop GPU: 46.8 vs 41.8 mAP@50-95 and 37.0 vs 31.9 FPS against YOLOv9-T on an RTX 5070 Ti in PyTorch. YOLOv9-T is the lighter model: 2.04M vs 3.78M parameters and 48 vs 70 MB peak VRAM. The speed order flips on two targets, TensorRT FP32 and Raspberry Pi 5, where YOLOv9-T is faster. Pick by deployment target.
DEIM-N (3.78M parameters, Apache-2.0) and YOLOv9-T (2.04M parameters, MIT) are both nano-class detectors evaluated at 640 px input, so input resolution does not skew the comparison. Both have verified rows on desktop GPU, Jetson Orin, and Raspberry Pi 5, so the comparison covers the deployment targets where the answer changes.
| Metric | DEIM-N | YOLOv9-T |
|---|---|---|
| mAP@50-95 | 46.79 | 41.78 |
| mAP@50 | 63.85 | 56.61 |
| mAP small | 28.22 | 22.26 |
| FPS (mean) | 37.0 | 31.9 |
| Total ms/image | 27.03 | 31.31 |
| Inference ms | 19.96 | 25.15 |
| Peak VRAM (MB) | 70 | 48 |
| Params (M) | 3.8 | 2.0 |
| GFLOPs | 7.0 | 4.0 |
| Input size | 640 | 640 |
| License | Apache-2.0 | MIT |
Accuracy
mAP is shown in percent form. DEIM-N measures 46.8 mAP@50-95 to YOLOv9-T's 41.8, a 5.0 point lead. The gap is wider on small objects: 28.2 vs 22.3 mAP_small, a 26.8% relative difference. DEIM-N leads at every object size.
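The point gap and the relative gap quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the two-decimal values 46.79, 41.78, 28.22, and 22.26 behind the rounded figures; the helper name `gaps` is hypothetical.

```python
def gaps(a: float, b: float) -> tuple[float, float]:
    """Return (absolute gap in points, relative gap in percent) of a over b."""
    absolute = a - b
    relative = (a - b) / b * 100.0
    return absolute, relative


# mAP values in percent (assumed two-decimal forms of the rounded figures).
abs_all, rel_all = gaps(46.79, 41.78)        # mAP@50-95
abs_small, rel_small = gaps(28.22, 22.26)    # mAP on small objects

print(f"mAP@50-95: +{abs_all:.1f} points ({rel_all:.1f}% relative)")
print(f"mAP small: +{abs_small:.1f} points ({rel_small:.1f}% relative)")
```

The relative gap is computed against the YOLOv9-T value, which is why a similar point gap looks larger on small objects, where the baseline is lower.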
Speed
On the RTX 5070 Ti, DEIM-N runs at 37.0 FPS to YOLOv9-T's 31.9 in PyTorch, a 15.84% edge, and the lead widens on ONNX Runtime: 75.9 vs 52.1 FPS. DEIM-N also leads on Jetson Orin: 11.3 vs 9.9 FPS.
The speed verdict does not hold everywhere. On TensorRT FP32, YOLOv9-T runs 72.0 FPS to DEIM-N's 67.5; on Raspberry Pi 5, 2.9 to 2.3. On those two targets the order reverses.
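To see where the order reverses, the FPS pairs quoted above can be put in one table and compared per target. The sketch below also converts the table's total ms/image into FPS. The names `FPS`, `faster`, and `fps_from_ms` are illustrative, not part of any library.

```python
# FPS pairs (DEIM-N, YOLOv9-T) copied from the text above.
FPS = {
    "PyTorch": (37.0, 31.9),
    "ONNX Runtime": (75.9, 52.1),
    "Jetson Orin": (11.3, 9.9),
    "TensorRT FP32": (67.5, 72.0),
    "Raspberry Pi 5": (2.3, 2.9),
}


def faster(deim_fps: float, yolo_fps: float) -> str:
    """Name the model with the higher FPS (a tie would go to YOLOv9-T)."""
    return "DEIM-N" if deim_fps > yolo_fps else "YOLOv9-T"


def fps_from_ms(ms_per_image: float) -> float:
    """Convert total milliseconds per image into frames per second."""
    return 1000.0 / ms_per_image


leaders = {target: faster(*pair) for target, pair in FPS.items()}
for target, leader in leaders.items():
    print(f"{target}: {leader}")

# Total ms/image from the table maps back to the reported mean FPS.
print(round(fps_from_ms(27.03), 1), round(fps_from_ms(31.31), 1))
```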
- DEIM-N license: Apache-2.0
- YOLOv9-T license: MIT
- DEIM-N release: 2024-12-05
- YOLOv9-T release: 2024-02-21
- Evaluated weights: LibreYOLO retrained checkpoints
When to pick which
Pick DEIM-N for desktop, ONNX, or Jetson deployment and for small-object accuracy: it leads mAP at every object size and is faster on those runtimes. Pick YOLOv9-T when the model must be small, when memory is tight, or when you deploy on a Raspberry Pi or TensorRT FP32, where it is the faster detector and carries the MIT license. Both are permissive, so licensing does not force the choice.
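The rule of thumb above can be written as a small lookup. This is a sketch of this page's recommendation only: the function `pick_detector` and the target labels are hypothetical, and it assumes a memory or size constraint outweighs the accuracy lead, which the text does not settle.

```python
# Targets where each model was the faster one in the results above.
YOLO_FASTER_TARGETS = {"tensorrt-fp32", "raspberry-pi-5"}
DEIM_FASTER_TARGETS = {"pytorch-gpu", "onnxruntime", "jetson-orin"}


def pick_detector(target: str, memory_tight: bool = False) -> str:
    """Suggest a model for a deployment target (hypothetical labels)."""
    if target not in YOLO_FASTER_TARGETS | DEIM_FASTER_TARGETS:
        raise ValueError(f"no verified result for target: {target!r}")
    # Assumption: a tight memory or size budget decides the choice first.
    if memory_tight or target in YOLO_FASTER_TARGETS:
        return "YOLOv9-T"
    return "DEIM-N"


print(pick_detector("jetson-orin"))                     # DEIM-N
print(pick_detector("raspberry-pi-5"))                  # YOLOv9-T
print(pick_detector("onnxruntime", memory_tight=True))  # YOLOv9-T
```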
Try both with LibreYOLO
```python
# pip install libreyolo
from libreyolo import LibreYOLO

deim = LibreYOLO("LibreDEIMn.pt")    # 3.78M params, 46.8 mAP@50-95
yolo9 = LibreYOLO("LibreYOLO9t.pt")  # 2.04M params, 41.8 mAP@50-95

results = deim("image.jpg")
```
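For a rough speed check on your own hardware, a generic timing loop is enough to compare the two models side by side. The sketch below is not this page's benchmark protocol. `measure_fps` is a hypothetical helper, and a stand-in function replaces the real model so the example runs without weights.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(predict: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 3) -> float:
    """Time predict() over inputs and return images per second."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    # Warm-up calls are run but not timed.
    for item in inputs[:warmup]:
        predict(item)
    start = time.perf_counter()
    for item in inputs:
        predict(item)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")


def fake_detector(path: str) -> list:
    """Stand-in for a real model call; sleeps about 1 ms per image."""
    time.sleep(0.001)
    return []


fps = measure_fps(fake_detector, ["image.jpg"] * 20)
print(f"{fps:.1f} images/s")
```

With the models loaded as above, passing `deim` or `yolo9` in place of `fake_detector` should work the same way, since both are called with an image path.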
Every number on this page comes from the verified dataset: the same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, pycocotools mAP, and an identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open the compare page. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.
