Verdict

DEIM-S wins on desktop GPU: 52.1 vs 50.4 mAP@50-95 and 33.4 vs 30.8 FPS against YOLOv9-S on an RTX 5070 Ti in PyTorch. YOLOv9-S is the lighter model: 7.23M vs 10.32M parameters and 92 vs 142 MB peak VRAM. The speed order flips on Raspberry Pi 5, where YOLOv9-S runs at 1.4 FPS to DEIM-S's 1.1. Pick by deployment target.

DEIM-S (10.32M parameters, Apache-2.0) and YOLOv9-S (7.23M, MIT) sit in the same small-detector class. Both are evaluated at 640 px input, so input resolution does not skew the comparison. Both have verified rows on desktop GPU, Jetson Orin, and Raspberry Pi 5, so the comparison covers the deployment targets where the answer changes.

Metric             DEIM-S        YOLOv9-S
mAP@50-95 (%)      52.10         50.45
mAP@50 (%)         68.33         67.20
mAP small (%)      36.02         30.74
FPS (mean)         33.4          30.8
Total ms/image     29.98         32.51
Inference ms       22.91         26.45
Peak VRAM (MB)     142           92
Params (M)         10.3          7.2
GFLOPs             25.0          13.5
Input size         640           640
License            Apache-2.0    MIT

DEIM-S vs YOLOv9-S on NVIDIA RTX 5070 Ti, PyTorch FP32, batch 1. mAP shown in percent form.
Chart: Accuracy vs parameters on COCO val2017. DEIM-S and YOLOv9-S highlighted against the full set.
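
The FPS and total-latency rows in the table describe the same measurement from two directions. As a minimal sketch, assuming the reported FPS is the reciprocal of the mean total time per image (the page does not say whether it averages per-image FPS instead), the conversion looks like this. The function name is illustrative only.

```python
def fps_from_latency_ms(total_ms: float) -> float:
    """Convert mean end-to-end latency per image (ms) to frames per second."""
    if total_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / total_ms


# Total ms/image from the table (RTX 5070 Ti, PyTorch FP32, batch 1).
total_ms = {"DEIM-S": 29.98, "YOLOv9-S": 32.51}

for name, ms in total_ms.items():
    print(f"{name}: {fps_from_latency_ms(ms):.1f} FPS")
```

Under that assumption the latencies reproduce the table's 33.4 and 30.8 FPS after rounding to one decimal.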

Accuracy

DEIM-S measures 52.1 mAP@50-95 to YOLOv9-S's 50.4, a 1.7 point lead. The gap is wider on small objects: 36.0 vs 30.7 mAP_small, a 17.17% relative lead for DEIM-S. If small instances dominate your scenes, that row decides the comparison.
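
The paragraph above mixes two kinds of gap: an absolute difference in mAP points and a relative difference in percent. A short sketch of both calculations follows. The helper names are illustrative, and the inputs are the rounded scores quoted above, so the relative result lands near the page's 17.17% but not exactly on it; the page's figure presumably comes from unrounded scores.

```python
def absolute_gap(a: float, b: float) -> float:
    """Difference in mAP points (both inputs in percent form)."""
    return a - b


def relative_gap_percent(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    if b == 0:
        raise ValueError("baseline must be non-zero")
    return (a - b) / b * 100.0


# Rounded scores quoted in the text: (DEIM-S, YOLOv9-S).
overall = (52.1, 50.4)
small = (36.0, 30.7)

print(f"mAP@50-95 gap: {absolute_gap(*overall):.1f} points")
print(f"mAP_small gap: {absolute_gap(*small):.1f} points, "
      f"{relative_gap_percent(*small):.1f}% relative")
```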

Speed

On the RTX 5070 Ti, DEIM-S runs 33.4 FPS to YOLOv9-S's 30.8 in PyTorch, an 8.42% edge, and the gap widens under conversion: 67.5 vs 54.5 FPS on ONNX Runtime, 97.6 vs 75.1 FPS on TensorRT FP32. On Jetson Orin the two are close: 10.2 vs 9.9 FPS.

The speed verdict does not travel to CPU. On Raspberry Pi 5, YOLOv9-S runs 1.4 FPS to DEIM-S's 1.1. If your target is a bare Pi, the desktop numbers point the wrong way.
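
To see where the ordering flips, it helps to put every quoted FPS pair on one scale. The sketch below computes the DEIM-S to YOLOv9-S speed ratio for each target named in this section; a ratio above 1 means DEIM-S is faster. The dictionary keys and function names are illustrative labels, not identifiers from the benchmark data.

```python
# (DEIM-S FPS, YOLOv9-S FPS) as quoted in the Speed section.
fps_by_target = {
    "RTX 5070 Ti, PyTorch": (33.4, 30.8),
    "RTX 5070 Ti, ONNX Runtime": (67.5, 54.5),
    "RTX 5070 Ti, TensorRT FP32": (97.6, 75.1),
    "Jetson Orin": (10.2, 9.9),
    "Raspberry Pi 5": (1.1, 1.4),
}


def speed_ratio(deim_fps: float, yolo_fps: float) -> float:
    """DEIM-S throughput relative to YOLOv9-S (above 1 means DEIM-S is faster)."""
    return deim_fps / yolo_fps


def faster_model(deim_fps: float, yolo_fps: float) -> str:
    """Name the model with the higher FPS for one target."""
    return "DEIM-S" if deim_fps > yolo_fps else "YOLOv9-S"


for target, (deim_fps, yolo_fps) in fps_by_target.items():
    ratio = speed_ratio(deim_fps, yolo_fps)
    print(f"{target}: ratio {ratio:.2f}, faster: {faster_model(deim_fps, yolo_fps)}")
```

The ratio grows from PyTorch to ONNX Runtime to TensorRT on the desktop GPU, stays close to 1 on Jetson Orin, and drops below 1 only on the Raspberry Pi 5.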

License and provenance

DEIM-S license: Apache-2.0
YOLOv9-S license: MIT
DEIM-S release: 2024-12-05
YOLOv9-S release: 2024-02-21
Evaluated weights: LibreYOLO retrained checkpoints

When to pick which

Pick DEIM-S for desktop, server, or Jetson deployment and for small-object accuracy: it is more accurate at every object size and faster on every GPU runtime measured, at the cost of 142 vs 92 MB peak VRAM. Pick YOLOv9-S for a Raspberry Pi, where it is the faster detector, or when memory is the constraint. Both ship under permissive licenses, so licensing does not force the choice.
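
The rule above can be written down as a small decision helper. This is a sketch of the recommendation in this section only: the function name, the target labels, and the `memory_constrained` flag are hypothetical, and a real choice should weigh your own measurements.

```python
GPU_TARGETS = {"desktop", "server", "jetson"}
CPU_TARGETS = {"raspberry-pi"}


def pick_detector(target: str, memory_constrained: bool = False) -> str:
    """Return the model this page recommends for a deployment target."""
    target = target.strip().lower()
    if target not in GPU_TARGETS | CPU_TARGETS:
        raise ValueError(f"no measurements on this page for target: {target!r}")
    # YOLOv9-S is the faster detector on the Pi and uses less peak VRAM.
    if target in CPU_TARGETS or memory_constrained:
        return "YOLOv9-S"
    # DEIM-S leads on accuracy and on every GPU runtime measured here.
    return "DEIM-S"


print(pick_detector("jetson"))
print(pick_detector("desktop", memory_constrained=True))
print(pick_detector("raspberry-pi"))
```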

Try both with LibreYOLO

# pip install libreyolo
from libreyolo import LibreYOLO

deim = LibreYOLO("LibreDEIMs.pt")    # 10.32M params, 52.1 mAP@50-95
yolo9 = LibreYOLO("LibreYOLO9s.pt")  # 7.23M params, 50.4 mAP@50-95

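# Run both detectors on the same image to compare their outputs.
results_deim = deim("image.jpg")
results_yolo9 = yolo9("image.jpg")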
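
To check throughput on your own hardware, a rough timing loop is enough for a first look. The sketch below is a generic helper, not part of LibreYOLO: `measure_fps` and its parameters are hypothetical names, and it times any zero-argument callable. It does not reproduce the page's benchmark protocol, and on a GPU, where execution can be asynchronous, a synchronization step may be needed before the timings are trustworthy.

```python
import time
from typing import Callable


def measure_fps(run_once: Callable[[], object], warmup: int = 5, iters: int = 50) -> float:
    """Call run_once repeatedly and return mean calls per second."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Warm-up calls are excluded so one-time setup cost does not skew the mean.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return iters / elapsed


# Stand-in workload so the sketch runs without any model installed.
def dummy_inference() -> int:
    return sum(range(10_000))


print(f"dummy workload: {measure_fps(dummy_inference):.0f} calls/s")
```

With the models loaded as in the snippet above, the same helper could be called as `measure_fps(lambda: deim("image.jpg"))` and `measure_fps(lambda: yolo9("image.jpg"))`.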

Every number on this page comes from the verified dataset: the same 500-image COCO val2017 slice, conf 0.001, IoU 0.6, max 300 detections, and pycocotools mAP, with an identical protocol across all hardware and runtimes. The full protocol is on the methodology page. To rerun this comparison with your own filters, open compare. Accuracy is measured on LibreYOLO retrained checkpoints; other weight sources can yield different values.
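
The protocol settings quoted above can be captured as a small config object. The sketch below is an illustration under stated assumptions, not the benchmark's actual code: the class, the function, and the detection dict layout with a `score` key are hypothetical. It shows only the confidence filter and the per-image detection cap. The IoU value is carried as a field without being applied, since this page does not say how it is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalProtocol:
    """Settings quoted on this page; how the pipeline applies them is assumed."""
    num_images: int = 500
    conf_threshold: float = 0.001
    iou_threshold: float = 0.6  # role not specified on this page, unused below
    max_detections: int = 300


def limit_detections(detections: list[dict], protocol: EvalProtocol = EvalProtocol()) -> list[dict]:
    """Drop detections below the confidence threshold, keep the top-scoring ones."""
    kept = [d for d in detections if d["score"] >= protocol.conf_threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[: protocol.max_detections]


# Synthetic detections for one image: scores 0.000, 0.001, ..., 0.399.
raw = [{"score": i / 1000} for i in range(400)]
filtered = limit_detections(raw)
print(len(raw), "raw detections,", len(filtered), "kept")
```

A very low confidence threshold such as 0.001 keeps nearly all candidate boxes, so the mAP calculation sees the full precision-recall curve rather than a pre-filtered slice of it.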