Abstract
This paper describes the participation of MIT, Manipal in the ImageCLEF 2019 VQA-Med task. The goal of the task was to build a system that takes a medical image and a clinically relevant question as input and generates a clinically relevant answer to the question from the image. Unlike most VQA systems, our approach focused on the answer generation part. We used a deep-learning encoder-decoder architecture. A CNN pre-trained on ImageNet extracted visual features from the input image, and word embeddings pre-trained on PubMed articles, combined with a 2-layer LSTM, extracted textual features from the question. The visual and textual features were integrated by simple element-wise multiplication. The integrated features were then passed to an LSTM decoder, which generated a natural language answer. We submitted a total of 8 runs for this task, and the best model achieved a BLEU score of 0.462.
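The fusion step named in the abstract, element-wise multiplication of visual and textual features, can be illustrated with a minimal sketch. It assumes both feature vectors have already been projected to the same dimensionality, which the abstract does not specify. The function name `fuse_elementwise` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List


def fuse_elementwise(visual: List[float], textual: List[float]) -> List[float]:
    """Fuse two equal-length feature vectors by element-wise multiplication.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(visual) != len(textual):
        # Assumption: features are projected to a shared size before fusion.
        raise ValueError("feature vectors must have the same dimensionality")
    return [v * t for v, t in zip(visual, textual)]


# Toy 4-dimensional features. In a real system these would come from the
# CNN image encoder and the LSTM question encoder respectively.
image_features = [0.5, 1.0, 0.0, 2.0]
question_features = [2.0, 0.5, 3.0, 1.0]

fused = fuse_elementwise(image_features, question_features)
print(fused)  # [1.0, 0.5, 0.0, 2.0]
```

In a system like the one described, the fused vector would then be passed to the decoder. Element-wise multiplication adds no trainable parameters, which may be part of why the abstract calls it a simple technique.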
| Original language | English |
|---|---|
| Journal | CEUR Workshop Proceedings |
| Volume | 2380 |
| Publication status | Published - 01-01-2019 |
| Event | 20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 - Lugano, Switzerland; Duration: 09-09-2019 → 12-09-2019 |
All Science Journal Classification (ASJC) codes
- General Computer Science