From: Object detection using convolutional neural networks and transformer-based models: a review
| Reference | Task | Model Used | Dataset | Results |
|---|---|---|---|---|
| [1] | Object detection | RCNN | PASCAL VOC | mAP: 58.50% |
| [2] | Object detection | Fast RCNN | PASCAL VOC | mAP: 70% |
|  |  |  | COCO 2017 test-dev | Box mAP: 19.7 |
|  |  |  |  | mAP: 19.70% |
| [3] | Object detection | Faster RCNN | PASCAL VOC | mAP: 73.20% |
|  |  |  | COCO 2017 test-dev | Frames per second: 46.7 |
|  |  |  |  | mAP: 21.90% |
|  |  |  |  | Average mAP: 16.4 |
| [28] | Object detection | DETR | COCO 2017 | Average precision (AP): 43.0 |
|  |  |  |  | Average mAP: 17.7 |
| [29] | Object detection | D-DETR | COCO 2017 | Average precision (AP): 46.9 |
|  |  |  |  | Average mAP: 18.5 |
|  |  |  |  | mAP: 52.30% |
| [18] | Object detection | ViT-B/16-FRCNN | COCO 2017 | Average precision (AP): 37.8 |
|  |  |  | OBJECTNET-D | Average precision (AP): 22.9 |
| [6] | Object detection | YOLO | PASCAL VOC | mAP: 63.40% |
|  |  |  |  | Frames per second: 46.7 |
|  |  |  | COCO 2017 | Average mAP: 32 |
|  |  |  |  | mAP: 43.50% |
| [30] | Object detection | YOLOS (ViT-B) | COCO 2017 | Average mAP: 20 |
| [8] | Object detection | Mask RCNN | COCO 2017 | Average mAP: 17.6 |
|  |  |  |  | Box mAP (real time): 45.7 |
| [69] | Object detection | ResNet-101 | PASCAL VOC | Average mAP: 63.7 |
| [70] | Object detection | Rank-DETR (ResNet50) | COCO 2017 | Average mAP: 50.2 |
| [46] | Image classification | ViT-H | ImageNet | Accuracy (Top 1): 88.55% |
|  |  |  | CIFAR-10 | Percentage correct: 99.9% |
| [46] | Image classification | ViT-B | ImageNet | Accuracy (Top 1): 85.2% |
| [46] | Image classification | ViT-L | ImageNet | Accuracy (Top 1): 87.76% |
|  |  |  | CIFAR-10 | Percentage correct: 99.42% |
|  |  |  | PASCAL VOC | mIoU: 68 |
| [57] | Semantic segmentation | ViT segmenter | PASCAL VOC | mIoU: 59 |
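
The table reports three families of metrics: mAP/AP for detection, Top-1 accuracy for classification, and mIoU for segmentation. The following is a minimal Python sketch of the quantities underlying them, not the evaluation code used by any of the cited works. The function names (`box_iou`, `mean_iou`, `top1_accuracy`) and the input formats (corner-format boxes, flat label lists) are assumptions made for illustration. Detection mAP additionally averages precision over recall levels after matching predictions to ground truth by box IoU, and that step is omitted here.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred, target, num_classes):
    """Mean per-class IoU over flat label lists (hypothetical input format).

    Classes absent from both pred and target are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def top1_accuracy(pred, target):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7, since the boxes overlap in a unit square and cover seven units of area in total. Benchmarks differ in how they turn such IoU values into a final score (for instance, which IoU thresholds count as a match), so figures in the table are comparable only within the same dataset and protocol.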