
A Comprehensive Review of Recent Advances in Multimodal Multimedia Indexing and Retrieval

Research output: Contribution to journal, Review article, peer-reviewed

Abstract

With the exponential growth of digital data across many domains, multimedia retrieval has emerged as a critical research area. Although feature extraction, indexing methods, and deep learning-based retrieval models have advanced considerably, many challenges still hamper the creation of efficient and scalable multimedia retrieval systems. This review systematically analyzes recent research in multimedia retrieval, highlighting key methodologies, findings, and limitations across the surveyed studies. The major gaps identified include the semantic gap between low-level features and high-level semantics, the difficulty of scaling to large datasets, privacy and security concerns, and the limited explainability of deep learning models. In addition, noisy and imbalanced data, heterogeneous multimodal data sources, and the lack of standardized benchmarking frameworks further limit the performance of existing systems. This paper presents these gaps in detail and proposes future research directions to bridge them, guiding the development of more robust, scalable, and user-centered multimedia retrieval systems. To address persistent challenges in versatility, semantic alignment, and trustworthiness, we introduce a unified framework that integrates multimodal semantic fusion, explainable AI, privacy-preserving learning, and continual adaptation for robust multimedia retrieval.

Original language: English
Pages (from-to): 143688-143712
Number of pages: 25
Journal: IEEE Access
Volume: 13
Publication status: Published, 2025

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering
