TY - JOUR
T1 - Enhancing Visual Question Answering for Multiple Choice Questions
AU - Goel, Rashi
AU - Nandwani, Harsh
AU - Shah, Eshaan
AU - Nayak, Ashalatha
AU - Kumar, Archana Praveen
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - This paper examines enhancements to Visual Question Answering (VQA) through systematic hyperparameter tuning and the use of advanced image and text encoders. The study explores in particular the adaptation of these models to Multiple-Choice Question (MCQ) formats, aiming to improve their accuracy and applicability. An MCQ consists of a question stem and a set of options, from which the correct answer (the key) must be identified among the distractors. Presenting the options gives the model some context about the correct answer, improving its performance over a simple multiclass classification task. Through comparative analysis of varied hyperparameter sets, the research shows the effectiveness of precise hyperparameter adjustments in improving the performance of VQA systems, highlighting their improved reasoning capabilities across various datasets, including samples of real-world images and academic questions. This demonstrates the potential of VQA models for robust application in both educational and practical scenarios.
AB - This paper examines enhancements to Visual Question Answering (VQA) through systematic hyperparameter tuning and the use of advanced image and text encoders. The study explores in particular the adaptation of these models to Multiple-Choice Question (MCQ) formats, aiming to improve their accuracy and applicability. An MCQ consists of a question stem and a set of options, from which the correct answer (the key) must be identified among the distractors. Presenting the options gives the model some context about the correct answer, improving its performance over a simple multiclass classification task. Through comparative analysis of varied hyperparameter sets, the research shows the effectiveness of precise hyperparameter adjustments in improving the performance of VQA systems, highlighting their improved reasoning capabilities across various datasets, including samples of real-world images and academic questions. This demonstrates the potential of VQA models for robust application in both educational and practical scenarios.
UR - https://www.scopus.com/pages/publications/105006667273
U2 - 10.1109/ACCESS.2025.3572529
DO - 10.1109/ACCESS.2025.3572529
M3 - Article
AN - SCOPUS:105006667273
SN - 2169-3536
VL - 13
SP - 93453
EP - 93467
JO - IEEE Access
JF - IEEE Access
ER -