TY - JOUR
T1 - Predicting clinical outcomes of radiotherapy for head and neck squamous cell carcinoma patients using machine learning algorithms
AU - Gangil, Tarun
AU - Shahabuddin, Amina Beevi
AU - Dinesh Rao, B.
AU - Palanisamy, Krishnamoorthy
AU - Chakrabarti, Biswaroop
AU - Sharan, Krishna
N1 - Funding Information:
The authors would like to acknowledge Mr. Manjunatha Maiya, Project Manager, Philips Research India, Bangalore, Karnataka; Mr. Prasad R V, Program Manager, Philips Research India, Bangalore, Karnataka; and Mr. Shrinidhi G. C., Chief Physicist, Dept. of Radiotherapy, KMC, Manipal, for their help and support in the conduct of this research.
Funding Information:
This study was funded by the Manipal Academy of Higher Education, Karnataka, India, and Philips Research India.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: Radiotherapy is frequently used to treat head and neck squamous cell carcinomas (HNSCC). Because treatment outcomes are highly uncertain, there is a significant need for robust predictive tools to improve treatment decision-making and better understand HNSCC by recognizing hidden patterns in data. We conducted this study to determine whether machine learning (ML) could accurately predict outcomes and identify new prognostic variables in HNSCC. Method: Retrospective data of 311 HNSCC patients treated with radiotherapy between 2013 and 2018 at our center and having a follow-up of at least three months' duration were collected. Binary-classification prediction models were developed for: Choice of Initial Treatment, Residual Disease, Locoregional Recurrence, Distant Recurrence, and Development of New Primary. Clinical data were pre-processed using imputation, feature selection, minority oversampling, and feature scaling algorithms. A method to retain the original characteristics of the dataset in testing samples while performing minority oversampling is illustrated. The classification comparison was performed using Random Forest (RF), Kernel Support Vector Machine (KSVM), and XGBoost classification algorithms for each model. Results: For the choice of initial treatment model, the testing accuracy was 84.58% using RF. The distant recurrence, locoregional recurrence, new-primary, and residual models had a testing accuracy (using KSVM) of 95.12%, 77.55%, 98.61%, and 92.25%, respectively. The important clinical determinants were identified using Shapley values for each classification model, and the mean area under the curve (AUC) for the receiver operating characteristic curve was plotted. Conclusion: ML was able to predict several clinically relevant outcomes and, with additional clinical validation, could facilitate recognition of novel prognostic factors in HNSCC.
AB - Background: Radiotherapy is frequently used to treat head and neck squamous cell carcinomas (HNSCC). Because treatment outcomes are highly uncertain, there is a significant need for robust predictive tools to improve treatment decision-making and better understand HNSCC by recognizing hidden patterns in data. We conducted this study to determine whether machine learning (ML) could accurately predict outcomes and identify new prognostic variables in HNSCC. Method: Retrospective data of 311 HNSCC patients treated with radiotherapy between 2013 and 2018 at our center and having a follow-up of at least three months' duration were collected. Binary-classification prediction models were developed for: Choice of Initial Treatment, Residual Disease, Locoregional Recurrence, Distant Recurrence, and Development of New Primary. Clinical data were pre-processed using imputation, feature selection, minority oversampling, and feature scaling algorithms. A method to retain the original characteristics of the dataset in testing samples while performing minority oversampling is illustrated. The classification comparison was performed using Random Forest (RF), Kernel Support Vector Machine (KSVM), and XGBoost classification algorithms for each model. Results: For the choice of initial treatment model, the testing accuracy was 84.58% using RF. The distant recurrence, locoregional recurrence, new-primary, and residual models had a testing accuracy (using KSVM) of 95.12%, 77.55%, 98.61%, and 92.25%, respectively. The important clinical determinants were identified using Shapley values for each classification model, and the mean area under the curve (AUC) for the receiver operating characteristic curve was plotted. Conclusion: ML was able to predict several clinically relevant outcomes and, with additional clinical validation, could facilitate recognition of novel prognostic factors in HNSCC.
UR - http://www.scopus.com/inward/record.url?scp=85125527773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125527773&partnerID=8YFLogxK
U2 - 10.1186/s40537-022-00578-3
DO - 10.1186/s40537-022-00578-3
M3 - Article
AN - SCOPUS:85125527773
SN - 2196-1115
VL - 9
JO - Journal of Big Data
JF - Journal of Big Data
IS - 1
M1 - 25
ER -