Abstract
The increasing demand for precise and efficient disease diagnosis has highlighted the limitations of traditional models in handling complex multimodal data. Current frameworks such as CNNs and Transformers often fall short of optimal accuracy and yield high false-positive rates, which diminishes their reliability in critical clinical applications. To address these challenges, this study introduces a novel Hierarchical Attention-Based Convolutional Transformer network, HACTNet, which integrates multimodal imaging and clinical metadata for superior diagnostic performance. The model utilises a hierarchical attention mechanism that focuses on the most relevant features, improving both interpretability and diagnostic accuracy. On a dataset of 10,000 annotated medical images, the model achieves 96.3% accuracy, an F1-score of 0.94, and an AUC-ROC of 0.98, outperforming CNN, ResNet, and Transformer baselines. It also reduces false positives by 12%, a critical metric for clinical reliability. These improvements position HACTNet as a robust tool for early disease detection and personalised treatment planning, poised to advance precision healthcare. Future work will focus on clinical validation and adaptation of the model across diverse populations.
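The abstract does not publish implementation details, but the two-level attention fusion it describes (attend over image features, then fuse the image summary with clinical metadata) can be sketched as follows. This is a minimal illustrative sketch in plain Python; all function names, the dot-product scoring, and the vector shapes are assumptions, not the authors' code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(query, features):
    """Score each feature vector against the query (dot product),
    then return the attention-weighted sum of the features."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, features))
             for i in range(dim)]
    return fused, weights

def hierarchical_fuse(image_patches, metadata, query):
    """Two-level (hierarchical) attention: first attend over image
    patch embeddings to get an image summary, then attend over the
    [image summary, metadata] pair to produce one fused vector."""
    image_summary, _ = attention_fuse(query, image_patches)
    fused, weights = attention_fuse(query, [image_summary, metadata])
    return fused, weights
```

In a real model the dot-product scoring would be replaced by learned projections (as in standard Transformer attention), and the second-level weights give a direct interpretability signal: how much the prediction relied on imaging versus clinical metadata.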
| Original language | English |
|---|---|
| Article number | 20240067 |
| Journal | HKIE Transactions, Hong Kong Institution of Engineers |
| Volume | 32 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 2025 |
All Science Journal Classification (ASJC) codes
- General Engineering
Article title: HACTNet: a hierarchical attention-based convolutional transformer network for enhanced multimodal medical imaging prediction