From: Healthcare predictive analytics using machine learning and deep learning techniques: a survey
Disease | Study | Methodology | Dataset | Approaches | Findings and limitations |
---|---|---|---|---|---|
Diabetes | 97 | A framework to develop and evaluate ML classification models for diabetes patient prediction | PIDD | Logistic regression, KNN, SVM, and RF | Logistic regression achieved the highest accuracy (83%). Other factors should be considered for diabetes prediction, such as family history of diabetes, smoking habits, and physical inactivity |
98 | A diagnosis system to predict diabetes | Frankfurt Hospital in Germany and PIDD provided by the UCI ML repository | RF, SVM, NB, and DT | SVM achieved the highest accuracy (83.1%). Using a DL approach to predict diabetes may yield better results | |
100 | A new model to predict type 2 diabetes | The Australian CBHS health fund dataset: 18,700,000 hospital admission records from 1995 to 2018 for 124,000 de-identified patients | Logistic regression, SVM, NB, KNN, DT, RF, and ANN | RF achieved the highest accuracy (84.95%) among the models. The study relies only on a dataset of hospital admission and discharge summaries from one insurance company | |
102 | An ML model to predict type 2 diabetes (T2D) occurrence in the following year (Y + 1) using variables from the current year (Y) | Electronic health records collected at a private medical institute from 2013 to 2018 | Logistic regression, RF, SVM, and ensemble machine learning | RF achieved the highest accuracy (73%). Additional data sources should be used to verify the models developed in this study. Additional tests beyond fasting plasma glucose (FPG) should also be taken into consideration | |
103 | ML algorithms to predict diabetes | PIDD from the UCI Repository | SVM and NB | NB achieved the highest accuracy (91%). The authors acknowledge that they need to extend the work to a newer dataset containing additional attributes and rows | |
105 | A predictive model for the classification of diabetes | Participants' demographic information, medical history, and contact information, collected via a questionnaire-style data collection form | Logistic regression | Logistic regression achieved an accuracy of 92%. The authors did not compare the model with other diabetes prediction algorithms | |
117 | A chronic disease risk prediction framework to predict type 2 diabetes risk | Private healthcare funds based in Australia (749,000 patients), covering a span of 6 years between September 2009 and March 2015 | Regression, parameter optimization, and tree classification | Binary tree classification achieved the highest accuracy (86.22%). The dataset comes from hospital admission and discharge summaries, so it does not contain general physician (GP) visit information or subsequent diagnoses | |
120 | A cuckoo search-based deep LSTM classifier for diabetes prediction | PIMA dataset | Deep convLSTM | The model achieved a maximum accuracy of 97.59%. The authors noted that more datasets are needed, as well as new approaches to improve the classifier's effectiveness | |
127 | An RNN-based method for predicting blood glucose levels of diabetic patients over a one-hour horizon | Ohio T1DM dataset for blood glucose level prediction | RNN | The authors point out that they can only evaluate prediction targets that have enough glucose level history; thus, they cannot predict the initial levels after a gap in the data, which limits the prediction quality | |
130 | A DL approach for delivering 30-min predictions of future glucose levels | Electronic health records datasets: OhioT1DM from clinical trials and the in silico dataset from the UVA-Padova simulator | NNs, SVR, and ARX | The DRNN model achieved the best performance, with the smallest RMSE, MARD, and time lag. Clinical datasets are limited in number and often restricted. Because certain data fields are entered manually, they are occasionally incorrect | |
132 | GluNet, an approach to glucose forecasting | OhioT1DM datasets | CNN | The authors point out that the model does not incorporate physiological knowledge, and that they need to test GluNet with longer prediction horizons and use it to predict overnight hypoglycemia | |
133 | A short-term blood glucose prediction model (VMD-IPSO-LSTM) | Data from 56 participants, chosen as experimental data from among 451 diabetes mellitus patients | LSTM | The experiments showed improved prediction accuracy at 30 min, 45 min, and 60 min. The time it takes to estimate glucose levels in the short term will be reduced | |
136 | A new DL method to increase the reliability and precision of type 1 diabetes predictions | Dataset from 759 people with type 1 diabetes who visited Sheffield Teaching Hospitals between 2013 and 2015 | CNNs | The authors point out that in the presence of insufficient data and certain physiological specificities, prediction accuracy deteriorates | |
137 | A framework for predicting and diagnosing diabetes | PIMA Dataset | ANN, NB, DT, and DL | DL is regarded as the most effective method for analyzing diabetes, with a 98.07% accuracy rate. The technique uses a variety of classifiers to accurately predict the disease, but it failed to diagnose it at an early stage | |
COVID-19 | 99 | An ML method for predicting COVID-19 | OpenData Resources from Mexico and Brazil | Logistic regression, DT, and boosted RF | The Mexico model achieved 93% accuracy with an F1 score of 79%; the Brazil model achieved 69% accuracy with an F1 score of 75%. The authors should address authentication and privacy management for the created data |
119 | A DL approach that uses chest radiography images to differentiate between patients with mild, pneumonia, and COVID-19 infections | COV-PEN dataset | DNNs (ResNet-50) | The authors emphasized that tests on a large and challenging dataset encompassing many COVID-19 cases are necessary to establish the efficacy of the proposed system | |
121 | A wavelet-based CNN to handle data limitations during the rapid emergence of COVID-19 | Two open-source datasets from the National Institute of Health, North America | CNN | The authors hope to investigate the effects of wavelet functions other than the Haar wavelet | |
122 | A CNN framework for COVID-19 identification | Public CT dataset of 2482 CT images from patients of both classes | CNN | The CNN achieved an accuracy of 96.16% and a recall of 95.41%. The authors stated that the framework should be extended to multimodal medical images in the future | |
126 | Detecting disease in people whose X-rays had been selected as potential COVID-19 candidates | 657 chest X-ray images for the diagnosis of COVID-19 | CNN and RNN | The VGG19 model was the most successful, with an accuracy of 95%. According to the authors, the success rate can be improved by improving data collection. In addition to chest radiography, lung tomography can be used. Performance can also be enhanced by creating multiple DL models | |
128 | A new deep anomaly detection model for fast, reliable screening of COVID-19 | X-ray dataset containing 100 images from 70 COVID-19 patients and 1431 images from 1008 non-COVID-19 pneumonia patients | CNN | Sensitivity of 90.00% with a specificity of 87.84%, or sensitivity of 96.00% with a specificity of 70.65%. The authors noted that the model still has certain flaws, such as missing 4% of COVID-19 cases and having a 30% false positive rate. In addition, more clinical data are required to confirm and improve the model's usefulness | |
129 | COVIDX-Net framework to diagnose COVID-19 in X-ray images automatically | Small dataset of 50 X-ray images | MobileNetV2, ResNetV2, VGG19, DenseNet201, InceptionV3, Inception, and Xception | The F1 scores for the VGG19 and DenseNet models were 89% and 91%, respectively. With an F1 score of 67%, the InceptionV3 model had the weakest classification performance | |
134 | A new paradigm for primary COVID-19 detection based on a radiology review of chest radiography or chest X-ray | X-rays from verified COVID-19 patients (408 images), confirmed pneumonia patients (4273 images), and healthy people (1590 images), used for three-class image classification. There are 6271 images in total in the dataset | CNN | Accuracy ranged from 93.90% to 93.90%. The authors note a limitation when it comes to adopting such a model on a large scale for practical use | |
135 | DL models for predicting the number of COVID-19-positive cases in Indian states | The Ministry of Health and Family Welfare dataset, containing 32 individual time series of confirmed COVID-19 cases, one for each of the states (28) and union territories (4), since March 14, 2020 | RNN-based LSTMs | Bidirectional LSTM produced the best performance in terms of prediction errors, while convolutional LSTM produced the worst. Daily and weekly forecasts were calculated, and bi-LSTM produced accurate results (error less than 3%) for short-term prediction (1–3 days) | |
Heart disease | 104 | A framework for detecting heart disease in its earliest stages | UCI heart disease dataset | K-means clustering | K-means clustering achieved an accuracy of 94.06%. The authors should apply the proposed technique using more than one algorithm and more than one dataset |
107 | A decision-making system that assists with automated predictions about the condition of the patient’s heart | Cleveland Heart Disease dataset | KNN, RF, DT, and NB | KNN achieved the highest accuracy (94%). The authors should extend the presented technique to leverage more than one dataset and forecast different diseases | |
108 | A model for predicting heart disease in the earliest stage | Cleveland dataset | NB, SVM, KNN, and DT | KNN achieved the highest accuracy (90.79%) | |
109 | An ML model to predict heart disease | Cardiovascular dataset | DT, NB, logistic regression, RF, SVM, and KNN | DT achieved the highest accuracy (73%). The authors highlighted that ensemble ML techniques applied to the CVD dataset could generate a better disease prediction model | |
110 | A framework to improve prediction accuracy for heart disease | Cardiovascular study on residents of the town of Framingham, Massachusetts. Contains variables such as age, gender, sex, chest pain, slope, and target | Logistic regression (scikit-learn) | Accuracy of 87%. The time complexity of the models used needs to be optimized | |
112 | Predicting the risk factors that cause heart disease | Cleveland heart disease dataset | K-means clustering algorithm | Age, maximum heart rate, and chest pain type play a vital role in predicting heart disease. The dataset is too small | |
113 | A prediction model for heart disease survivability using various ML techniques | Cleveland heart disease dataset | DT, RF, logistic regression, SVM, and NB | RF achieved the highest accuracy (87%). RF gives better accuracy on low-dimensional datasets. The model could be extended to a distributed environment such as MapReduce, Apache Mahout, and HBase | |
114 | A single hybrid model that combines several algorithms to predict heart disease | Cleveland heart disease dataset | NB, SVM, KNN, NN, J4.8, RF, and GA. NB and SVM | The proposed model achieved an accuracy of 89.2%. The authors noted that the dataset is small; hence, the system could not be trained adequately, which hurt the method's accuracy | |
116 | Predicting coronary heart disease using ML techniques | Sample of males in a heart disease high-risk region of the Western Cape in South Africa (462 instances) | NB, SVM, and DT | SVM and DT J48 outperformed NB with a specificity rate of 82% but proved to have an unacceptable sensitivity rate of less than 50% | |
118 | A system for predicting patients with the more common chronic diseases | Indian chronic kidney disease dataset | CNN, KNN, NB, DT, and logistic regression | CNN and KNN achieved the highest accuracy (95%). The proposed technique should be applied using more than one dataset | |
Liver disease | 115 | A framework focused on using clinical data for liver disease prediction | Northeast of Andhra Pradesh, India | Logistic regression, KNN, DT, SVM, NB, and RF | Based on the F1 measure: logistic regression 75%, naive Bayes 53%. Other models need to be adopted to achieve higher accuracy |
Multiple Disease Detection | 101 | A healthcare management system used by patients to schedule appointments with doctors and verify prescriptions | Datasets of diabetes, heart disease, chronic kidney disease, and liver disease | DT, RF, logistic regression, and NB | Logistic regression achieved the highest accuracy (98.5%) on the heart dataset. Image datasets should be included to allow image processing of reports and the deployment of DL to detect diseases |
106 | A prediction model that analyzes the user's symptoms and predicts the disease at an early stage | A total of 41 disorders were included as a dependent variable | DT, NB | All algorithms achieved the same accuracy score of 95.12%. The authors noticed that overfitting occurred when all 132 symptoms from the original dataset were assessed instead of 95 symptoms. That is, the tree appears to memorize the dataset provided and thus fails to classify new data. As a result, only the 95 best symptoms, chosen during the data-cleansing process, were assessed | |
111 | A reliable model for predicting lung cancer | Heart disease dataset and lung cancer dataset | SVM, genetic algorithms | Lung cancer accuracy of 81.8182%. Primitive tools were used. The size, type, and source of the data used are not mentioned | |
123 | An LSTM approach to multi-disease prediction of future disease diagnoses for intelligent clinical decision support | A large clinical record dataset (over 5 million records) collected from a hospital in Southeast China | LSTM | The F1 score rises from 78.9% and 86.4% with the state-of-the-art conventional and DL models, respectively, to 88.0% with the LSTM approach. The authors stated that prediction performance may be enhanced further by including new input variables, and that the method uses only one data source to reduce computational complexity | |
124 | An approach to predicting diabetes using a supervised ANN structure built from subnets instead of layers | Iris and diabetes dataset | Multilayer perceptrons (MLPs) as well as LSTM | The proposed deep learning model achieved 97% accuracy. The model has not been evaluated on large textual and image datasets, which limits its usefulness | |
125 | A novel AI and Internet of Things (IoT) convergence-based disease detection model for a smart healthcare system | Diabetes and heart disease | CSO-LSTM | CSO-LSTM achieved an accuracy of 96.16%. This method offered greater prediction accuracy for heart disease and diabetes diagnosis, but it has no feature selection mechanism; hence, it requires extensive computation | |
 | 134 | A DNN model to predict stroke death based on medical history and human behaviors utilizing large-scale electronic health information | Korean National Hospital Discharge In-depth Injury Survey (KNHDS) data from 2013 to 2016 | DNN | The sensitivity, specificity, and AUC values were 64.32%, 85.56%, and 83.48%, respectively |
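
The studies in this table report a mix of accuracy, sensitivity (recall), specificity, and F1 score, which makes rows hard to compare directly. As a minimal sketch of how these quantities relate, the Python snippet below derives all of them from the four cells of a binary confusion matrix. The names `ConfusionCounts` and `binary_metrics` are hypothetical helpers, and the counts in the usage example are illustrative assumptions chosen only to roughly match a 96% sensitivity and 70.65% specificity operating point; they are not taken from any of the surveyed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # diseased cases correctly flagged
    fn: int  # diseased cases missed
    tn: int  # healthy cases correctly cleared
    fp: int  # healthy cases wrongly flagged


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the metrics commonly reported in the table from raw counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fn + c.tn + c.fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "miss_rate": 1.0 - sensitivity,            # share of diseased cases missed
        "false_positive_rate": 1.0 - specificity,  # share of healthy cases flagged
    }


if __name__ == "__main__":
    # Illustrative counts only (an assumption, not data from any study).
    example = ConfusionCounts(tp=96, fn=4, tn=1011, fp=420)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

The miss rate is one minus the sensitivity and the false positive rate is one minus the specificity, so a screening model with 96% sensitivity misses 4% of positive cases. On imbalanced data, accuracy, F1, and specificity can diverge sharply, so a single reported number says little without the others.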