Enhancing Visual Question Answering for Multiple Choice Questions

Research output: Contribution to journal, Article, peer-review

1 Citation (Scopus)

Abstract

This paper examines enhancements to Visual Question Answering (VQA) obtained by systematically tuning hyperparameters and using advanced image and text encoders. In particular, it explores adapting these models to the Multiple-Choice Question (MCQ) format to improve their accuracy and applicability. An MCQ consists of a question stem and a set of options, from which the correct answer (the key) must be identified among the distractors. Because the options are supplied with the question, the MCQ format gives the model some context about the correct answer, which improves its performance over a plain multiclass classification task. Through a comparative analysis of different hyperparameter sets, the research shows that precise hyperparameter adjustments improve the performance of VQA systems, and it highlights the models' improved reasoning capabilities across several datasets, including real-world images and academic questions. This demonstrates the potential of VQA models for robust application in both educational and practical scenarios.
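The point that the option set gives the model context about the answer can be illustrated with a minimal sketch. It assumes a hypothetical model that has already produced a score for every answer in a fixed vocabulary; the function names, the vocabulary, and the scores below are illustrative and are not taken from the paper. In the multiclass setting the prediction is the arg-max over the whole vocabulary, while in the MCQ setting the arg-max is restricted to the key and its distractors.

```python
from typing import Dict, List


def multiclass_predict(scores: Dict[str, float]) -> str:
    """Plain multiclass setting: choose the highest-scoring answer
    from the entire (hypothetical) answer vocabulary."""
    return max(scores, key=scores.get)


def mcq_predict(scores: Dict[str, float], options: List[str]) -> str:
    """MCQ setting: only the listed options (key plus distractors)
    are eligible answers."""
    # Options missing from the vocabulary score -inf, so any option
    # that the model actually scored will be preferred over them.
    return max(options, key=lambda opt: scores.get(opt, float("-inf")))


if __name__ == "__main__":
    # Hypothetical model scores for one image-question pair.
    scores = {"cat": 0.40, "dog": 0.35, "bird": 0.15, "fish": 0.10}

    # Without options, the top-scoring vocabulary entry wins.
    print(multiclass_predict(scores))  # cat

    # With the options restricted to dog/bird/fish, "cat" is no longer
    # a candidate, so the best-scoring listed option is chosen.
    print(mcq_predict(scores, ["dog", "bird", "fish"]))  # dog
```

In this toy case, if the key is "dog", the multiclass prediction is wrong while the MCQ prediction is right, because restricting the candidate set removes a high-scoring answer that was never offered.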

Original language: English
Pages (from-to): 93453-93467
Number of pages: 15
Journal: IEEE Access
Volume: 13
Publication status: Published, 2025

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering

