Chain-of-Thought Reasoning Evaluation Framework for Question Answering System

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Question-answering (QA) systems have become crucial in healthcare, decision-making, finance, and research. Transformer-based language models have greatly improved the generation of contextually relevant answers, but they often fail to answer reliably, producing hallucinations and incorrect predictions. Traditional evaluation metrics, such as exact match, precision, recall, and F1 score, are limited in how comprehensively they can assess the performance and capabilities of QA systems. This research introduces a commonsense reasoning-based evaluation framework for language models that leverages the Chain-of-Thought (CoT) reasoning approach. The proposed method integrates CoT reasoning with a GPT o1-mini model to evaluate the predictions of language models. Experiments on the SQuAD 2.0 dataset demonstrate significant improvements in handling the answers generated by the language models and in the reliability of the QA system. The framework offers robust and interpretable solutions, addressing critical gaps in current QA systems and ensuring reliability and transparency for practical, real-world applications.
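The limitation of the traditional metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the normalization follows the widely used SQuAD-style convention (lowercasing, stripping punctuation and English articles), and `build_cot_judge_prompt` is a hypothetical helper showing only the general shape of a step-by-step grading prompt, not the prompt used in the paper.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(reference))


def token_prf(prediction: str, reference: str) -> tuple[float, float, float]:
    """Token-level precision, recall, and F1 between prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Empty answers (e.g. unanswerable questions) only match each other.
        score = float(pred_tokens == ref_tokens)
        return score, score, score
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def build_cot_judge_prompt(question: str, context: str,
                           prediction: str, reference: str) -> str:
    """Hypothetical helper: assemble a step-by-step grading prompt for an LLM judge.

    The wording is an assumption for illustration only.
    """
    return (
        "You are grading the output of a question-answering system.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Predicted answer: {prediction}\n"
        "Reason step by step about whether the predicted answer is supported "
        "by the context and agrees with the reference answer. "
        "Finish with a single line: VERDICT: CORRECT or VERDICT: INCORRECT."
    )
```

With the reference "Paris" and the prediction "The capital is Paris, France", `exact_match` returns 0 and `token_prf` gives a precision of 0.25, a recall of 1.0, and an F1 of 0.4, even though a human reader would accept the answer. A reasoning-based judge is one way to close that gap; how the judge model is actually called and how its verdict is parsed are left out here because the source does not specify them.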

Original language: English
Title of host publication: 2025 International Conference on Artificial Intelligence and Data Engineering, AIDE 2025 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 725-730
Number of pages: 6
ISBN (Electronic): 9798331527518
DOIs
Publication status: Published - 2025
Event: 2025 International Conference on Artificial Intelligence and Data Engineering, AIDE 2025 - Nitte, India
Duration: 06-02-2025 to 07-02-2025

Publication series

Name: 2025 International Conference on Artificial Intelligence and Data Engineering, AIDE 2025 - Proceedings

Conference

Conference: 2025 International Conference on Artificial Intelligence and Data Engineering, AIDE 2025
Country/Territory: India
City: Nitte
Period: 06-02-25 to 07-02-25

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Statistics, Probability and Uncertainty
  • Instrumentation
