Object detection using convolutional neural networks and transformer-based models: a review

Journal of Electrical Systems and Information Technology

Table 2 Comparative analysis of evaluation metrics and datasets for computer vision (CV) models

Reference	Task	Model Used	Dataset	Results
[1]	Object detection	RCNN	PASCAL VOC	mAP—58.50%
[2]	Object detection	Fast RCNN	PASCAL VOC	mAP—70%
			COCO 2017 test-dev	Box mAP—19.7
				mAP—19.70%
[3]	Object detection	Faster RCNN	PASCAL VOC	mAP—73.20%
			COCO 2017 test-dev	Frames per second (FPS)—46.7
				mAP—21.90%
				Average mAP—16.4
[28]	Object detection	DETR	COCO 2017	Average precision (AP)—43.0
				Average mAP—17.7
[29]	Object detection	D-DETR	COCO 2017	Average precision (AP)—46.9
				Average mAP—18.5
				mAP—52.30%
[18]	Object detection	ViT-B/16-FRCNN	COCO 2017	Average precision (AP)—37.8
			OBJECTNET-D	Average precision (AP)—22.9
[6]	Object detection	YOLO	PASCAL VOC	mAP—63.40%
				Frames per second (FPS)—46.7
			COCO 2017	Average mAP—32
				mAP—43.50%
[30]	Object detection	YOLOS (ViT-B)	COCO 2017	Average mAP—20
[8]	Object detection	Mask RCNN	COCO 2017	Average mAP—17.6
				Box mAP (real time)—45.7
[69]	Object detection	ResNet-101	PASCAL VOC	Average mAP—63.7
[70]	Object detection	Rank-DETR (ResNet50)	COCO 2017	Average mAP—50.2
[46]	Image classification	ViT-H	ImageNet	Accuracy (Top 1)—88.55%
			CIFAR-10	Percentage correct—99.9%
[46]	Image classification	ViT-B	ImageNet	Accuracy (Top 1)—85.2%
[46]	Image classification	ViT-L	ImageNet	Accuracy (Top 1)—87.76%
			CIFAR-10	Percentage correct—99.42%
			PASCAL VOC	mIoU—68
[57]	Semantic segmentation	ViT segmenter	PASCAL VOC	mIoU—59
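
The detection rows of Table 2 report average precision (AP) and mean average precision (mAP), both of which rest on the intersection over union (IoU) between predicted and ground-truth boxes. The following is a minimal sketch of how such figures are commonly computed, not the evaluation protocol of any cited work. The function names (box_iou, average_precision, mean_average_precision) and the input layout are illustrative assumptions, and the AP here uses all-point interpolation at a single IoU threshold; benchmark toolkits for PASCAL VOC and COCO apply their own interpolation and threshold-averaging rules, so numbers produced this way are not directly comparable with the table.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(scored_hits, num_gt):
    """AP for one class (assumed input layout, for illustration only).

    scored_hits: list of (confidence, is_true_positive) per detection,
                 where a detection counts as a true positive if it was
                 matched to an unclaimed ground-truth box at the chosen
                 IoU threshold.
    num_gt:      number of ground-truth boxes for the class.
    """
    if num_gt == 0:
        return 0.0
    hits = sorted(scored_hits, key=lambda h: h[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(hits, start=1):
        tp += 1 if is_tp else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Make precision monotonically non-increasing (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

As a usage example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 region, so box_iou((0, 0, 2, 2), (1, 1, 3, 3)) evaluates to 1/7. Averaging the per-class AP over several IoU thresholds, as the COCO benchmark does, would be a further step on top of this sketch.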