Image Captioning and Comparison of Different Encoders

Ankit Pal, Subasish Kar, Anuveksh Taneja, Vinay Kumar Jadoun

Research output: Contribution to journal › Conference article › peer-review

2 Citations (Scopus)


Generating a sentence that describes a given image, known as image captioning, has been one of the most intriguing topics in computer vision. It combines knowledge from both image processing and natural language processing. Most current approaches are built on neural networks: a predefined convolutional neural network (CNN) extracts features from the image, while a uni-directional or bi-directional recurrent neural network (RNN) performs language modelling. This paper discusses the models commonly used as image encoders, namely Inception-V3, VGG19, VGG16 and InceptionResNetV2, while using uni-directional LSTMs for text generation. A comparative analysis of the results is then carried out using the Bilingual Evaluation Understudy (BLEU) score on the Flickr8k dataset.
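The encoder-decoder pipeline described in the abstract can be illustrated with a minimal NumPy sketch: a CNN feature vector (here a random stand-in for pooled Inception-V3 features) is projected into the decoder's hidden size and fed through one uni-directional LSTM step. This is not the paper's implementation; the dimensions, initial weights, and projection are illustrative assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One uni-directional LSTM step: gates from input x and previous hidden h."""
    z = W @ x + U @ h + b                 # stacked gate pre-activations (4*hid,)
    n = h.size
    i = 1 / (1 + np.exp(-z[:n]))          # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))       # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))     # output gate
    g = np.tanh(z[3*n:])                  # candidate cell state
    c_new = f * c + i * g                 # update memory cell
    h_new = o * np.tanh(c_new)            # new hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
feat_dim, hid = 2048, 8                   # 2048 matches Inception-V3 pooled output
features = rng.standard_normal(feat_dim)  # stand-in for CNN image features

# Project image features into the decoder's hidden size to initialise the LSTM
proj = rng.standard_normal((hid, feat_dim)) * 0.01
h = np.tanh(proj @ features)
c = np.zeros(hid)

# Toy embedding for a start token, and randomly initialised LSTM weights
x = rng.standard_normal(hid) * 0.1
W = rng.standard_normal((4 * hid, hid)) * 0.1
U = rng.standard_normal((4 * hid, hid)) * 0.1
b = np.zeros(4 * hid)

h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (8,) — a softmax layer would map this to word logits
```

In a full captioning model this step repeats once per generated word, with the previous word's embedding as `x`, until an end token is produced.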
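For the evaluation side, a minimal BLEU-1 sketch (modified unigram precision with a brevity penalty, against a single reference) conveys the idea; the paper's comparison presumably also uses higher-order n-gram BLEU variants, and library implementations handle multiple references.

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """BLEU-1 for one candidate/reference pair of token lists."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate word's count by its count in the reference
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / max(len(candidate), 1)
    # Brevity penalty discourages overly short captions
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * precision

candidate = "a dog runs on the grass".split()
reference = "a dog is running on the grass".split()
print(round(bleu1(candidate, reference), 3))
```

A perfect match scores 1.0; missing or extra words lower the precision, and short candidates are further penalised by the brevity penalty.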

Original language: English
Article number: 012004
Journal: Journal of Physics: Conference Series
Issue number: 1
Publication status: Published - 13-05-2020
Event: 2019 International Conference on Future of Engineering Systems and Technologies, FEST 2019 - Greater Noida, India
Duration: 21-12-2019 to 22-12-2019

All Science Journal Classification (ASJC) codes

  • Physics and Astronomy (all)


