
Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms

  • G. R. Ashisha
  • X. Anitha Mary*
  • E. Grace Mary Kanaga
  • J. Andrew*
  • R. Jennifer Eunice

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Diabetes mellitus is considered one of the leading causes of death worldwide. If diabetes is not diagnosed and treated early, it can lead to several other health problems, such as kidney disease, nerve disease, vision problems, and brain disorders. Early detection of diabetes reduces healthcare costs and lowers the risk of serious complications. In this work, we propose an e-diagnostic model for diabetes classification using machine learning (ML) algorithms that can run on the Internet of Medical Things (IoMT). The study uses and analyzes two benchmark datasets, the PIMA Indian Diabetes Dataset (PIDD) and the Behavioral Risk Factor Surveillance System (BRFSS) diabetes dataset, to classify diabetes. The proposed model combines random oversampling to balance the class distribution, interquartile range (IQR)-based outlier detection to eliminate outlier data, and the Boruta algorithm to select the optimal features from each dataset. The approach considers four ML algorithms that are widely used for diabetes prediction: random forest, gradient boosting, light gradient boosting (LightGBM), and decision trees. All four algorithms were evaluated using accuracy, F1 score, recall, precision, and AUC-ROC. The comparative analysis shows that random forest outperforms the other classifiers, achieving the highest accuracy of 92% on the BRFSS diabetes dataset and 94% on the PIDD dataset, an improvement of about 3% over the accuracy reported in existing research. This research can help diabetologists develop accurate treatment regimens for diabetic patients.
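The two data-preparation steps named in the abstract, random oversampling and IQR-based outlier removal, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no code, no IQR multiplier (the conventional 1.5 is assumed here), and no exact ordering of the steps, and the function names `random_oversample` and `iqr_filter` are hypothetical. Boruta feature selection and the four classifiers are left out.

```python
import random
from collections import Counter
from statistics import quantiles


def iqr_filter(rows, labels, k=1.5):
    """Drop every row with a feature outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; k=1.5 is an assumed (conventional) multiplier,
    not a value taken from the paper.
    """
    bounds = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        q1, _, q3 = quantiles(column, n=4)  # quartiles of feature j
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    keep = [
        all(lo <= row[j] <= hi for j, (lo, hi) in enumerate(bounds))
        for row in rows
    ]
    return (
        [row for row, ok in zip(rows, keep) if ok],
        [y for y, ok in zip(labels, keep) if ok],
    )


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [row for row, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (invented for illustration): two features, binary label.
rows = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0],
        [4.0, 13.0], [5.0, 14.0], [100.0, 12.0]]
labels = [0, 0, 0, 0, 1, 1]

clean_rows, clean_labels = iqr_filter(rows, labels)  # removes [100.0, 12.0]
bal_rows, bal_labels = random_oversample(clean_rows, clean_labels)
print(Counter(bal_labels))
```

In practice, library implementations such as `RandomOverSampler` from imbalanced-learn and `RandomForestClassifier` from scikit-learn are common choices for these steps. Oversampling is normally applied only to the training split, because duplicating rows before a train/test split lets copies of the same record land in both sets and inflates test scores.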

Original language: English
Article number: 270
Journal: International Journal of Computational Intelligence Systems
Volume: 17
Issue number: 1
Publication status: Published - December 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 3 - Good Health and Well-being

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Computational Mathematics
