
Table 5. A comprehensive comparative study of the previous works

From: Healthcare predictive analytics using machine learning and deep learning techniques: a survey

| Disease | Study | Methodology | Dataset | Approaches | Findings |
|---|---|---|---|---|---|
| Diabetes | [97] | A framework to develop and evaluate ML classification models for predicting diabetes in patients | PIDD | Logistic regression, KNN, SVM, and RF | Logistic regression achieved the highest accuracy (83%). Other factors, such as family history of diabetes, smoking habits, and physical inactivity, should be considered for diabetes prediction |
| | [98] | A diagnosis system created to predict diabetes | Data from Frankfurt Hospital, Germany, and PIDD from the UCI ML repository | RF, SVM, NB, and DT | SVM achieved the highest accuracy (83.1%). Using a DL approach to predict diabetes may lead to better results |
| | [100] | A new model to predict type 2 diabetes | Australian CBHS health fund dataset: 18,700,000 hospital admission records from 1995 to 2018 for 124,000 de-identified patients | Logistic regression, SVM, NB, KNN, DT, RF, and ANN | RF achieved the highest accuracy (84.95%) among the models. The study relies only on hospital admission and discharge summaries from one insurance company |
| | [102] | An ML model to predict type 2 diabetes (T2D) occurrence in the following year (Y + 1) from variables in the current year (Y) | Electronic health records collected at a private medical institute from 2013 to 2018 | Logistic regression, RF, SVM, and ensemble ML | RF achieved the highest accuracy (73%). Additional data sources should be used to verify the models, and tests in addition to fasting plasma glucose (FPG) should be taken into consideration |
| | [103] | ML algorithms to predict diabetes | PIDD from the UCI repository | SVM and NB | NB achieved the highest accuracy (91%). The authors acknowledge the need to extend the work to a more recent dataset containing additional attributes and rows |
| | [105] | A predictive model for classifying diabetes | Participants' demographic information, medical history, and contact information, collected through a questionnaire-style data collection form | Logistic regression | Logistic regression achieved an accuracy of 92%. The authors did not compare the model with other diabetes prediction algorithms |
| | [117] | A chronic disease risk prediction framework to predict type 2 diabetes risk | Private health funds based in Australia (749,000 patients), covering a span of 6 years between September 2009 and March 2015 | Regression, parameter optimization, and tree classification | The binary tree classifier achieved the highest accuracy (86.22%). Because the data come from hospital admission and discharge summaries, they contain no general practitioner (GP) visit information or subsequent diagnoses |
| | [120] | A cuckoo search-based deep LSTM classifier for diabetes prediction | PIMA dataset | Deep convLSTM | The model achieved a maximum accuracy of 97.59%. The authors note that more datasets, as well as new approaches, are needed to improve the classifier's effectiveness |
| | [127] | An RNN-based method for predicting the blood glucose levels of diabetics over a one-hour period | OhioT1DM dataset for blood glucose level prediction | RNN | The authors point out that they can evaluate only prediction targets with enough glucose history; thus, they cannot predict the initial levels after a gap in the data, which limits prediction quality |
| | [130] | A DL approach that delivers 30-min predictions of future glucose levels | Electronic health record datasets: OhioT1DM from clinical trials and an in silico dataset from the UVA-Padova simulator | NNs, SVR, and ARX | The DRNN model achieved the best performance, with the smallest RMSE, MARD, and time lag. Clinical datasets are few and often restricted, and because certain data fields are entered manually, they are occasionally incorrect |
| | [132] | GluNet, an approach to glucose forecasting | OhioT1DM datasets | CNN | The authors point out that the model does not incorporate physiological knowledge and that GluNet still needs to be tested with longer prediction horizons and used to predict overnight hypoglycemia |
| | [133] | A short-term blood glucose prediction model (VMD-IPSO-LSTM) | Data from 56 participants, selected as experimental data from 451 patients with diabetes mellitus | LSTM | The experiments showed improved prediction accuracy at 30, 45, and 60 min. The time needed to estimate short-term glucose levels will be reduced |
| | [136] | A new DL method to increase the reliability and precision of type 1 diabetes predictions | Data from 759 people with type 1 diabetes who visited Sheffield Teaching Hospitals between 2013 and 2015 | CNNs | The authors point out that prediction accuracy deteriorates when data are insufficient and in the presence of certain physiological specificities |
| | [137] | A framework for predicting and diagnosing diabetes | PIMA dataset | ANN, NB, DT, and DL | DL was the most effective method for analyzing diabetes, with an accuracy of 98.07%. The technique uses several classifiers to predict the disease accurately but fails to diagnose it at an early stage |
| COVID-19 | [99] | An ML method for predicting COVID-19 | OpenData resources from Mexico and Brazil | Logistic regression, DT, and boosted RF | The Mexico model achieved 93% accuracy and an F1 score of 79%; the Brazil model achieved 69% accuracy and an F1 score of 75%. The authors should address authentication and privacy management of the generated data |
| | [119] | A DL approach that uses chest radiography images to differentiate between patients with mild, pneumonia, and COVID-19 infections | COV-PEN dataset | DNNs (ResNet-50) | The authors emphasize that tests on a large, challenging dataset covering many COVID-19 cases are needed to establish the efficacy of the proposed system |
| | [121] | A wavelet-based CNN to handle data limitations during the fast emergence of COVID-19 | Two open-source datasets from the National Institutes of Health (North America) | CNN | The authors plan to investigate the effects of wavelet functions other than the Haar wavelet |
| | [122] | A CNN framework for COVID-19 identification | Public CT dataset of 2482 CT images from patients of both classes | CNN | The CNN achieved an accuracy of 96.16% and a recall of 95.41%. The authors state that the framework should be extended to multimodal medical images in the future |
| | [126] | Detecting diseases in people whose X-rays had been selected as potential COVID-19 candidates | 657 chest X-ray images for COVID-19 diagnosis | CNN and RNN | The VGG19 model was the most successful, with an accuracy of 95%. According to the authors, the success rate can be improved through better data collection, by using lung tomography in addition to chest radiography, and by building multiple DL models |
| | [128] | A new deep anomaly detection model for fast, reliable COVID-19 screening | X-ray dataset containing 100 images from 70 COVID-19 patients and 1431 images from 1008 non-COVID-19 pneumonia subjects | CNN | Sensitivity of 90.00% with a specificity of 87.84%, or sensitivity of 96.00% with a specificity of 70.65%. The authors note that the model still has flaws, such as missing 4% of COVID-19 cases and a 30% false positive rate, and that more clinical data are required to confirm and improve its usefulness |
| | [129] | COVIDX-Net, a framework to diagnose COVID-19 automatically in X-ray images | Small dataset of 50 images | MobileNetV2, ResNetV2, VGG19, DenseNet201, InceptionV3, Inception, and Xception | The F1 scores of the VGG19 and DenseNet models were 89% and 91%, respectively. The InceptionV3 model had the weakest classification performance, with an F1 score of 67% |
| | [134] | A new paradigm for primary COVID-19 detection based on radiological review of chest radiography (chest X-ray) | X-rays of confirmed COVID-19 patients (408 images), confirmed pneumonia patients (4273 images), and healthy people (1590 images), used for three-class image classification; 6271 people in total | CNN | Accuracy of 93.90%. The authors face a limitation, particularly when adopting such a model at large scale for practical use |
| | [135] | DL models for predicting the number of COVID-19-positive cases in Indian states | Ministry of Health and Family Welfare dataset: time series of confirmed COVID-19 cases for 32 regions (28 states and 4 union territories) since March 14, 2020 | RNN-based LSTMs | Bidirectional LSTM gave the lowest prediction errors, while convolutional LSTM performed worst. Daily and weekly forecasts were calculated, and Bi-LSTM produced accurate short-term predictions (1–3 days) with errors below 3% |
| Heart disease | [104] | A framework for detecting heart disease at its earliest stages | UCI heart disease dataset | K-means clustering | K-means clustering achieved an accuracy of 94.06%. The authors should apply the proposed technique with more than one algorithm and more than one dataset |
| | [107] | A decision-making system that supports automated predictions about the condition of the patient's heart | Cleveland heart disease dataset | KNN, RF, DT, and NB | KNN achieved the highest accuracy (94%). The authors should extend the technique to more than one dataset and to forecasting other diseases |
| | [108] | A model for predicting heart disease at the earliest stage | Cleveland dataset | NB, SVM, KNN, and DT | KNN achieved the highest accuracy (90.79%) |
| | [109] | An ML model to predict heart disease | Cardiovascular dataset | DT, NB, logistic regression, RF, SVM, and KNN | DT achieved the highest accuracy (73%). The authors highlight that ensemble ML techniques applied to the CVD dataset could produce a better prediction model |
| | [110] | A framework to improve prediction accuracy for heart disease | Cardiovascular study of residents of Framingham, Massachusetts, with variables such as age, sex, chest pain, slope, and target | Logistic regression (Scikit-Learn) | Accuracy of 87%. The time complexity of the models needs to be optimized |
| | [112] | Predicting the risk factors that cause heart disease | Cleveland heart disease dataset | K-means clustering | Age, maximum heart rate, and chest pain type play a vital role in predicting heart disease. The dataset is too small |
| | [113] | A prediction model for heart disease survivability using various ML techniques | Cleveland heart disease dataset | DT, RF, logistic regression, SVM, and NB | RF achieved the highest accuracy (87%); RF gives better accuracy on low-dimensional datasets. The model could be extended to a distributed environment such as MapReduce, Apache Mahout, and HBase |
| | [114] | A single hybrid model that combines several algorithms to predict heart disease | Cleveland heart disease dataset | NB, SVM, KNN, NN, J4.8, RF, and GA; NB and SVM | The proposed model achieved an accuracy of 89.2%. The authors note that the dataset is small, so the system could not be trained adequately and the method's accuracy suffered |
| | [116] | Coronary heart disease prediction based on ML techniques | Sample of males from a high-risk heart disease region of the Western Cape, South Africa (462 instances) | NB, SVM, and DT | SVM and DT (J48) outperformed NB with a specificity of 82% but had an unacceptable sensitivity below 50% |
| | [118] | A system for predicting patients with the more common chronic diseases | Indian chronic kidney disease dataset | CNN, KNN, NB, DT, and logistic regression | CNN and KNN achieved the highest accuracy (95%). The proposed technique should be applied to more than one dataset |
| Liver disease | [115] | A framework focused on using clinical data for liver disease prediction | Data from northeast Andhra Pradesh, India | Logistic regression, KNN, DT, SVM, NB, and RF | Based on the F1 measure: logistic regression 75%, naive Bayes 53%. Other models need to be adopted to achieve higher accuracy |
| Multiple disease detection | [101] | A healthcare management system that patients use to schedule appointments with doctors and verify prescriptions | Diabetes, heart disease, chronic kidney disease, and liver disease datasets | DT, RF, logistic regression, and NB | Logistic regression achieved the highest accuracy (98.5%) on the heart dataset. Image datasets should be included to allow image processing of reports and the deployment of DL to detect diseases |
| | [106] | A prediction model that analyzes the user's symptoms and predicts the disease at an early stage | 41 disorders included as the dependent variable | DT, NB | All algorithms achieved the same accuracy (95.12%). The authors observed overfitting when all 132 symptoms in the original dataset were assessed instead of 95: the tree memorized the training data and failed to classify new data. Therefore, only the 95 best symptoms were retained during data cleansing |
| | [111] | A reliable prediction model for lung cancer | Heart disease dataset and lung cancer dataset | SVM, genetic algorithms | Lung cancer accuracy of 81.8182%. Uses primitive tools; the size, type, and source of the data are not mentioned |
| | [123] | An LSTM approach to multi-disease prediction for intelligent clinical decision support, predicting future disease diagnoses | A large clinical record dataset (over 5 million records) collected from a hospital in Southeast China | LSTM | The F1 score rises from 78.9–86.4% with state-of-the-art conventional and DL models to 88.0% with the LSTM approach. The authors state that performance may be improved further by including new input variables and that, to reduce computational complexity, the method uses only one data source |
| | [124] | An approach to predict diabetes with a supervised ANN structure built from subnets instead of layers | Iris and diabetes datasets | Multilayer perceptrons (MLPs) and LSTM | The proposed DL model achieved 97% accuracy. The model's usefulness is limited because it has not been applied to large textual and image datasets |
| | [125] | A novel AI and Internet of Things (IoT) convergence-based disease detection model for a smart healthcare system | Diabetes and heart disease | CSO-LSTM | CSO-LSTM achieved an accuracy of 96.16%. The method offered higher prediction accuracy for heart disease and diabetes diagnosis, but it has no feature selection mechanism and therefore requires extensive computation |
| | [134] | A DNN model to predict stroke death from medical history and human behaviors using large-scale electronic health information | Korean National Hospital Discharge In-depth Injury Survey (KNHDS) data from 2013 to 2016 | DNN | Sensitivity, specificity, and AUC were 64.32%, 85.56%, and 83.48%, respectively |
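The studies in Table 5 report their results as accuracy, sensitivity (recall), specificity, F1 score, RMSE, or MARD. The following minimal Python sketch uses only the standard textbook definitions; it is not code from any surveyed study. It shows how these metrics are computed and why, for the anomaly-detection model in [128], a sensitivity of 96.00% and a specificity of 70.65% correspond to missing 4% of COVID-19 cases and a false positive rate of roughly 30%. The class and function names (`ConfusionMatrix`, `miss_rate`, `false_positive_rate`, `rmse`, `mard`) are hypothetical, and so are all counts and glucose readings.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary screening model; "positive" means "has the disease"."""

    tp: int  # diseased cases correctly flagged
    fn: int  # diseased cases missed
    tn: int  # healthy cases correctly cleared
    fp: int  # healthy cases wrongly flagged

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self) -> float:
        """True positive rate, also called recall."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """True negative rate."""
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        """Harmonic mean of precision and sensitivity."""
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


def miss_rate(sensitivity: float) -> float:
    """False negative rate: share of diseased cases the model misses."""
    return 1.0 - sensitivity


def false_positive_rate(specificity: float) -> float:
    """Share of healthy cases the model wrongly flags."""
    return 1.0 - specificity


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error, in the units of the readings (e.g. mg/dL)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mard(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute relative difference, as a percentage of the reference value."""
    n = len(y_true)
    return 100.0 * sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # Operating point reported for [128]: sensitivity 96.00%, specificity 70.65%.
    print(round(miss_rate(0.96), 4))  # 0.04, i.e. 4% of COVID-19 cases missed
    print(round(false_positive_rate(0.7065), 4))  # 0.2935, i.e. roughly 30%

    # Hypothetical counts, for illustration only.
    cm = ConfusionMatrix(tp=8, fn=2, tn=85, fp=5)
    print(cm.accuracy(), cm.sensitivity(), round(cm.f1(), 3))

    # Hypothetical glucose readings (reference vs. forecast), for illustration only.
    print(round(rmse([100, 120, 140], [110, 115, 140]), 2))
    print(round(mard([100, 120, 140], [110, 115, 140]), 2))
```

Reporting sensitivity and specificity separately, as [116] and [128] do, exposes trade-offs that a single accuracy figure can hide. MARD is expressed relative to the reference reading, so the same absolute glucose error counts for more at low glucose levels than at high ones.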