TY - JOUR
T1 - Evaluation of artificial intelligence-based patient education models for irritable bowel syndrome
AU - Raghavendran, Anand Kumar
AU - Musunuri, Balaji
AU - Rajpurohit, Siddheesh
AU - Ganesh Pai, C.
AU - Shetty, Shiran
AU - Kumari, Pretty
AU - Shetty, Rakshand
AU - Shetty, Athish
AU - Bhat, Ganesh
N1 - Publisher Copyright:
© Indian Society of Gastroenterology 2025.
PY - 2025
Y1 - 2025
N2 - Background: Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder with a significant psychosocial burden. Despite medical advancements, patient education on IBS remains inadequate. This study compared two large language models (LLMs)—ChatGPT-4 and Gemini-1—for their performance in addressing IBS-related patient queries. Methods: Thirty-nine IBS-related frequently asked questions (FAQs) from IBS organizations and hospital websites were categorized into six domains: general understanding, symptoms and diagnosis, causes, dietary considerations, treatment, and lifestyle factors. Responses from ChatGPT-4 and Gemini-1 were evaluated by two independent gastroenterologists for comprehensiveness and accuracy, with a third reviewer resolving disagreements. Readability was measured using five standardized indices (Flesch Reading Ease [FRE], Simple Measure of Gobbledygook [SMOG], Gunning Fog Index [GFI], Automated Readability Index [ARI], Reading Level Consensus [ARC]), and empathy was rated on a 4-point Likert scale by three reviewers. Results: Gemini produced comprehensive and accurate answers for 94.9% (37/39) of questions, with two rated as mixed (vague/outdated). ChatGPT achieved 89.7% (35/39) comprehensive responses, with four rated mixed. Domain-wise, both models performed best in “symptoms and diagnosis” and “treatment”, while mixed responses were most frequent in “general understanding” and “lifestyle”. There was no significant difference in comprehensiveness (p = 0.67). Readability analysis showed that both LLMs generated difficult-to-read content: Gemini’s FRE score was 35.83 ± 3.31 vs. ChatGPT’s 32.33 ± 5.57 (p = 0.21), corresponding to college-level proficiency. ChatGPT’s responses were more empathetic, with all responses rated moderately empathetic; Gemini’s were mostly rated minimally empathetic (66.7%). Conclusion: While ChatGPT and Gemini provided extensive information, their limitations—such as complex language and occasional inaccuracies—must be addressed. Future improvements should focus on enhancing readability, contextual relevance, and accuracy to better meet the diverse needs of patients and clinicians.
AB - Background: Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder with a significant psychosocial burden. Despite medical advancements, patient education on IBS remains inadequate. This study compared two large language models (LLMs)—ChatGPT-4 and Gemini-1—for their performance in addressing IBS-related patient queries. Methods: Thirty-nine IBS-related frequently asked questions (FAQs) from IBS organizations and hospital websites were categorized into six domains: general understanding, symptoms and diagnosis, causes, dietary considerations, treatment, and lifestyle factors. Responses from ChatGPT-4 and Gemini-1 were evaluated by two independent gastroenterologists for comprehensiveness and accuracy, with a third reviewer resolving disagreements. Readability was measured using five standardized indices (Flesch Reading Ease [FRE], Simple Measure of Gobbledygook [SMOG], Gunning Fog Index [GFI], Automated Readability Index [ARI], Reading Level Consensus [ARC]), and empathy was rated on a 4-point Likert scale by three reviewers. Results: Gemini produced comprehensive and accurate answers for 94.9% (37/39) of questions, with two rated as mixed (vague/outdated). ChatGPT achieved 89.7% (35/39) comprehensive responses, with four rated mixed. Domain-wise, both models performed best in “symptoms and diagnosis” and “treatment”, while mixed responses were most frequent in “general understanding” and “lifestyle”. There was no significant difference in comprehensiveness (p = 0.67). Readability analysis showed that both LLMs generated difficult-to-read content: Gemini’s FRE score was 35.83 ± 3.31 vs. ChatGPT’s 32.33 ± 5.57 (p = 0.21), corresponding to college-level proficiency. ChatGPT’s responses were more empathetic, with all responses rated moderately empathetic; Gemini’s were mostly rated minimally empathetic (66.7%). Conclusion: While ChatGPT and Gemini provided extensive information, their limitations—such as complex language and occasional inaccuracies—must be addressed. Future improvements should focus on enhancing readability, contextual relevance, and accuracy to better meet the diverse needs of patients and clinicians.
UR - https://www.scopus.com/pages/publications/105016746610
U2 - 10.1007/s12664-025-01872-7
DO - 10.1007/s12664-025-01872-7
M3 - Article
C2 - 40974448
AN - SCOPUS:105016746610
SN - 0254-8860
JO - Indian Journal of Gastroenterology
JF - Indian Journal of Gastroenterology
ER -