Table 2 Comprehensive evaluation metrics and dataset comparative analysis for CV models

From: Object detection using convolutional neural networks and transformer-based models: a review

| Reference | Task | Model Used | Dataset | Results |
|---|---|---|---|---|
| [1] | Object detection | RCNN | PASCAL VOC | mAP: 58.50% |
| [2] | Object detection | Fast RCNN | PASCAL VOC | mAP: 70% |
| | | | COCO 2017 test-dev | Box mAP: 19.7 |
| | | | | mAP: 19.70% |
| [3] | Object detection | Faster RCNN | PASCAL VOC | mAP: 73.20% |
| | | | COCO 2017 test-dev | Frames per second: 46.7 |
| | | | | mAP: 21.90% |
| | | | | Average mAP: 16.4 |
| [28] | Object detection | DETR | COCO 2017 | Average precision (AP): 43.0 |
| | | | | Average mAP: 17.7 |
| [29] | Object detection | D-DETR | COCO 2017 | Average precision (AP): 46.9 |
| | | | | Average mAP: 18.5 |
| | | | | mAP: 52.30% |
| [18] | Object detection | ViT-B/16-FRCNN | COCO 2017 | Average precision (AP): 37.8 |
| | | | OBJECTNET-D | Average precision (AP): 22.9 |
| [6] | Object detection | YOLO | PASCAL VOC | mAP: 63.40% |
| | | | | Frames per second: 46.7 |
| | | | COCO 2017 | Average mAP: 32 |
| | | | | mAP: 43.50% |
| [30] | Object detection | YOLOS (ViT-B) | COCO 2017 | Average mAP: 20 |
| [8] | Object detection | Mask RCNN | COCO 2017 | Average mAP: 17.6 |
| | | | | Box mAP (real time): 45.7 |
| [69] | Object detection | ResNet-101 | PASCAL VOC | Average mAP: 63.7 |
| [70] | Object detection | Rank-DETR (ResNet50) | COCO 2017 | Average mAP: 50.2 |
| [46] | Image classification | ViT-H | ImageNet | Accuracy (Top 1): 88.55% |
| | | | CIFAR-10 | Percentage correct: 99.9% |
| [46] | Image classification | ViT-B | ImageNet | Accuracy (Top 1): 85.2% |
| [46] | Image classification | ViT-L | ImageNet | Accuracy (Top 1): 87.76% |
| | | | CIFAR-10 | Percentage correct: 99.42% |
| | | | PASCAL VOC | mIoU: 68 |
| [57] | Semantic segmentation | ViT segmenter | PASCAL VOC | mIoU: 59 |