TY - GEN
T1 - Survival Analysis for Cancers of the Brain, CNS and Bone using Retrieval Augmented Generation on the SEER Database
AU - Vaidyanathan, Jyothi
AU - Gupta, Shourya
AU - Lee, Justin
AU - Prabhu, Srikanth
AU - Sengupta, Saptarshi
N1 - Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/5/28
Y1 - 2025/5/28
AB - Mortality estimation remains a key issue in cancers affecting the Brain, Central Nervous System (CNS), and Bone, among others. The recent integration of LLM-based reasoning into tools that aid cancer prognosis has been particularly encouraging. This prompts us to examine further their stated efficacy and devise workarounds to reduce hallucinations using retrieval augmented generation. We study the clinical, pathological and demographic logs of patients recorded in the National Institutes of Health (NIH) Surveillance, Epidemiology, and End Results (SEER) database and develop an integrated methodology that is user-friendly and responds to n-shot queries with or without context. We first build a set of custom SEER embeddings using DistilBERT, which we use to test tree-based models in answering 'yes/no' type 5-year survivability questions given patient profiles. We extend the limited binary response capability of the prior models by using TabLLM, HyDE-RAG, and Step-Back RAG on the BCNS cancer data and extend them to Bone Cancer data from SEER using GraphRAG, as the attributes are similar. The conversation-friendly models are able to take different context lengths and types into account and provide reasoning about their responses. We successfully show that the extensive patient records in the SEER database can be utilized to develop a powerful conversational agent that is not only able to classify mortality outcomes but also reason about the response by leveraging latent inter-relationships among the unique clinical variables.
UR - https://www.scopus.com/pages/publications/105016694824
U2 - 10.1609/aaaiss.v5i1.35549
DO - 10.1609/aaaiss.v5i1.35549
M3 - Conference contribution
AN - SCOPUS:105016694824
T3 - AAAI Spring Symposium - Technical Report
SP - 31
EP - 36
BT - AAAI Spring Symposium - Technical Report
A2 - Petrick, Ron
A2 - Geib, Christopher
PB - Association for the Advancement of Artificial Intelligence
T2 - 2025 AAAI Spring Symposium Series, SSS 2025
Y2 - 31 March 2025 through 2 April 2025
ER -