A Comprehensive Machine Learning based Pipeline for an Accurate Early Prediction of Sepsis in ICU

Research output: Contribution to journal › Article › peer-review


Sepsis is a life-threatening, infection-related condition with an extremely high fatality rate, especially among intensive care unit (ICU) patients. Early and precise recognition of sepsis is critical because delayed treatment increases the mortality rate dramatically. The traditional clinical scoring systems used in practice to detect sepsis are the systemic inflammatory response syndrome (SIRS) criteria, the quick sequential organ failure assessment (qSOFA) and the modified early warning score (MEWS). However, these scoring systems fail at early prediction of sepsis. That is the stage at which immediate treatment would reduce the mortality rate significantly. The proposed classifier uses patients' electronic medical records, demographics and vital signs to predict sepsis accurately up to six hours before the disease is clinically diagnosed. The study applies data-set-adaptive preprocessing strategies. It adds to the existing literature by introducing a novel outlier-based mean-median data imputation technique that improves the overall prediction accuracy. The primary factors that influence the classifier's predictions are outlined, which makes the model easier for medical professionals to interpret. Four algorithms were investigated for classifying patients as sepsis-positive or sepsis-negative: Random Forest, Logistic Regression, Gradient Boosting and Decision Tree. Of these, Random Forest gives the best results, with an accuracy of 99.01%, an F1-score of 99% and an area under the receiver operating characteristic curve (AUROC) of 99.99%. Even for a 24-hour early prediction of sepsis, Random Forest provides the highest prediction accuracy and Logistic Regression provides the lowest. We attribute this to a difference in assumptions. Logistic regression assumes a linear relationship between the independent variables and the log-odds of the outcome. Random forests require no linear relationship between the dependent and independent variables. The resulting evaluation measures suggest that the pipeline can be highly valuable for predicting sepsis in a timely and accurate manner.
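
The abstract does not spell out the imputation rule. The following is only a minimal sketch of one plausible reading, not the authors' method. For each feature, missing values are filled with the median when the observed values contain outliers, because the median is robust to extremes. Otherwise they are filled with the mean. The outlier test is an assumption: Tukey's fences at 1.5 times the interquartile range. The function names `has_outliers` and `impute_column` are hypothetical.

```python
import statistics
from typing import List, Optional


def has_outliers(values: List[float], k: float = 1.5) -> bool:
    """Return True if any value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the paper's actual outlier criterion is not given; Tukey's rule is a stand-in.
    """
    if len(values) < 4:
        return False  # too few points to estimate quartiles meaningfully
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return any(v < low or v > high for v in values)


def impute_column(column: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Fill None entries with the median if outliers are present, else with the mean (hypothetical rule)."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if has_outliers(observed, k):
        fill = statistics.median(observed)  # robust choice when extremes are present
    else:
        fill = statistics.mean(observed)  # mean is fine for well-behaved data
    return [fill if v is None else v for v in column]


if __name__ == "__main__":
    print(impute_column([80, 82, None, 81, 79, 300]))  # outlier present, so the median is used
    print(impute_column([80, 82, None, 84, 79, 83]))  # no outliers, so the mean is used
```

Take a hypothetical heart-rate column `[80, 82, None, 81, 79, 300]`. It contains an outlier (300), so the gap is filled with the median, 81, rather than the mean, 124.4. That mean would be pulled upward by the single extreme reading.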
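
Predicting sepsis "up to six hours before" clinical diagnosis can be framed as ordinary binary classification by relabeling each hourly record. This minimal sketch assumes hourly records per patient and a known onset hour. It marks every hour from `horizon` hours before onset onward as positive. That convention is similar to the one used in the 2019 PhysioNet/Computing in Cardiology Challenge. The abstract does not describe the paper's own labeling scheme, and the function `early_labels` is hypothetical.

```python
from typing import List, Optional


def early_labels(n_hours: int, onset_hour: Optional[int], horizon: int = 6) -> List[int]:
    """Label each hourly record 1 if it is within `horizon` hours before onset or later, else 0.

    Assumptions: one record per hour, indexed from 0; onset_hour is the hour of clinical
    diagnosis, or None for patients who never develop sepsis.
    """
    if onset_hour is None:
        return [0] * n_hours  # sepsis-negative patient: every hour is negative
    start = max(0, onset_hour - horizon)  # first hour treated as "early positive"
    return [1 if t >= start else 0 for t in range(n_hours)]


if __name__ == "__main__":
    print(early_labels(10, 8))  # hours 2..9 become positive with a 6-hour horizon
    print(early_labels(10, 8, horizon=24))  # every hour is positive with a 24-hour horizon
```

Under the same assumptions, setting `horizon=24` corresponds to the 24-hour early-prediction setting mentioned in the abstract.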

Original language: English
Pages (from-to): 105120-105132
Number of pages: 13
Journal: IEEE Access
Publication status: Published - 2022

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
  • Electrical and Electronic Engineering

