
HACTNet: a hierarchical attention-based convolutional transformer network for enhanced multimodal medical imaging prediction

  • H. Anwar Basha*
  • R. Deepak
  • K. Thanuja
  • Soumyalatha Naveen
  • Vijayakumar Varadarajan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The increasing demand for precise and efficient disease diagnosis has put the spotlight on the limitations of traditional models in dealing with complex multimodal data. Current frameworks, such as CNNs and Transformers, fall short of optimal accuracy and often yield high false-positive rates, which diminishes their reliability in critical clinical applications. To address these challenges, this study introduces a novel architecture, the Hierarchical Attention-Based Convolutional Transformer (HACTNet), which integrates multimodal imaging and clinical metadata for superior diagnostic performance. The model utilises a hierarchical attention mechanism that focuses on the most relevant features, thereby improving both interpretability and diagnostic accuracy. On a dataset of 10,000 annotated medical images, the model shows marked improvement over CNN, ResNet, and Transformer baselines, achieving 96.3% accuracy, an F1-score of 0.94, and an AUC-ROC of 0.98. It also reduces false positives by 12%, a critical metric for clinical reliability. These improvements position HACTNet as a robust tool for early disease detection and personalised treatment planning, poised to transform precision healthcare. Future work will focus on clinical validation and adaptation of the model to diverse populations.
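The abstract describes a two-level ("hierarchical") attention mechanism that first weights features within each modality (imaging, clinical metadata) and then weights the modality summaries against each other before prediction. The paper does not publish code, so the following is only a minimal NumPy sketch of that general idea under stated assumptions: the token dimensions, the shared scoring vectors, and the sigmoid head are all illustrative choices, not HACTNet's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(tokens, w_score):
    """Score each token, normalise, and return the weighted summary vector."""
    scores = tokens @ w_score            # shape: (n_tokens,)
    weights = softmax(scores)            # attention weights sum to 1
    return weights @ tokens              # shape: (d,)

rng = np.random.default_rng(0)
d = 8                                    # toy embedding dimension

# Hypothetical inputs: patch embeddings from a convolutional backbone
# and embedded clinical-metadata tokens (both illustrative).
image_patches = rng.normal(size=(16, d))
metadata_tokens = rng.normal(size=(4, d))

# Level 1: intra-modality attention summarises each modality.
img_summary = attend(image_patches, rng.normal(size=d))
meta_summary = attend(metadata_tokens, rng.normal(size=d))

# Level 2: inter-modality attention weighs the two summaries.
modalities = np.stack([img_summary, meta_summary])
fused = attend(modalities, rng.normal(size=d))

# Linear head with a sigmoid gives a toy disease probability.
prob = 1.0 / (1.0 + np.exp(-(fused @ rng.normal(size=d))))
print(f"fused shape: {fused.shape}, probability in (0, 1): {0.0 < prob < 1.0}")
```

In a trained model the scoring vectors would be learned parameters and the level-1 attention weights over image patches are what give the interpretability the abstract claims, since they indicate which regions drove the prediction.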

Original language: English
Article number: 20240067
Journal: HKIE Transactions Hong Kong Institution of Engineers
Volume: 32
Issue number: 1
DOIs
Publication status: Published - 2025

All Science Journal Classification (ASJC) codes

  • General Engineering
