DETR: Overview and Inference

“Models such as YOLO (You Only Look Once), Faster R-CNN, and SSD typically use multiple convolutional layers followed by specialized layers for object detection. DETR also uses a CNN backbone for feature extraction, but it then passes these features to Transformer encoder and decoder layers.

Previous models required hand-designed priors, such as anchor boxes in YOLO and region proposals in R-CNN. DETR eliminates the need for any such hand-designed priors.

The DETR model doesn’t require NMS (non-maximum suppression) as a post-processing step to remove redundant bounding boxes, which CNN-based models required…”

Source: https://learnopencv.com/detr-overview-and-inference/

October 17, 2024
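
To make the NMS-free claim in the excerpt concrete, here is a minimal sketch of what DETR-style post-processing can look like. It is not code from the source article. It assumes the usual DETR output convention: each object query yields class logits whose last entry is a "no object" class, plus a box in normalized (cx, cy, w, h) form. The helper names, the toy values, and the 0.7 threshold are illustrative assumptions. The sketch uses only the Python standard library so it runs without a model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def postprocess(class_logits, boxes, img_w, img_h, threshold=0.7):
    """Keep confident queries. Note there is no NMS step: each query is
    judged independently, with no pairwise overlap check between boxes."""
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)
        object_probs = probs[:-1]  # assumption: last index is "no object"
        score = max(object_probs)
        label = object_probs.index(score)
        if score >= threshold:  # hypothetical threshold for illustration
            detections.append((label, score, cxcywh_to_xyxy(box, img_w, img_h)))
    return detections


# Toy outputs for three object queries and two real classes (made-up values).
toy_logits = [
    [4.0, 0.0, 0.0],  # confident in class 0
    [0.0, 0.0, 5.0],  # predicts "no object", so it is dropped
    [0.0, 3.5, 0.0],  # confident in class 1
]
toy_boxes = [
    (0.5, 0.5, 0.2, 0.4),
    (0.1, 0.1, 0.1, 0.1),
    (0.25, 0.75, 0.5, 0.5),
]

detections = postprocess(toy_logits, toy_boxes, img_w=640, img_h=480)
for label, score, box in detections:
    print(label, round(score, 3), [round(v, 1) for v in box])
```

In an anchor-based detector, the same stage would typically be followed by an overlap-based suppression pass. Here a confidence filter is the only step, which is the simplification the excerpt describes.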