DETR: Overview and Inference

“Architecture of models like YOLO (You Only Look Once), Faster R-CNN, SSD typically uses multi Convolutional layers, followed by specialized layers for object detection. DETR also uses CNN in backbone for feature extraction, but these features are then passed to Transformer encoder and decoder layers.

Previous models required some sort of hand designed priors, like anchor boxes in YOLO, region proposals in R-CNN. DETR eliminates the need for any such hand designed priors.

The DETR model doesn’t require NMS (Non-maximum suppression) as a post processing technique to remove irrelevant bounding boxes, which was required in CNN based models…”

Request a Quote

Log In

DETR: Overview and Inference

DETR: Overview and Inference