
Abstract

Quick detection of bacteria is essential for controlling the growth of infection and limiting the severity of infectious diseases. Automating the microscopic analysis of direct smears of clinical samples would help detect bacteria quickly. Deep learning-based classification models have gained popularity in Computer Aided Diagnostic systems. However, the limited availability of bacteria image datasets is an important bottleneck for developing deep learning models for bacteria detection. Pre-trained self-attention-based vision transformers have gained popularity in image classification because of their strong transfer learning capabilities, and they are reported to perform well even with a limited number of training images. We propose a vision transformer-based classifier for classifying bacteria and neutrophils in Gram-stained direct smear images. The proposed classification model is evaluated on both private and public datasets. Transfer learning from a pre-trained transformer improves the performance metrics compared with the pre-trained CNNs VGG16 and EfficientNetB0 and with the benchmark software Microsoft Lobe. In addition, different variants of the transformer are compared. The proposed classification model using ViT-B_16 achieved the highest accuracy, 96%.
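For readers unfamiliar with the ViT-B_16 naming: "B" denotes the Base configuration and "16" the 16 x 16 pixel patch size. A vision transformer splits each image into non-overlapping patches and treats each flattened patch as one input token. The sketch below illustrates only this patching step, assuming a 224 x 224 RGB input (a common pre-training resolution); the paper's actual input size and preprocessing are not stated in this abstract, and the full model additionally applies a learned linear projection, a class token and position embeddings, none of which are shown here. The function name `patchify` and the random `tile` are hypothetical.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    i.e. the token sequence a ViT embeds before its transformer encoder.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C): separate grid position from within-patch offset
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # -> (gh, gw, p, p, C): group the pixels of each patch together
    patches = patches.transpose(0, 2, 1, 3, 4)
    # -> (gh * gw, p * p * C): one flattened row per patch, in row-major grid order
    return patches.reshape(gh * gw, patch_size * patch_size * c)

# Hypothetical 224 x 224 RGB smear tile; random values stand in for pixels.
tile = np.random.rand(224, 224, 3).astype(np.float32)
tokens = patchify(tile)
print(tokens.shape)  # (196, 768)
```

With 16 x 16 patches, a 224 x 224 image yields a 14 x 14 grid, i.e. 196 tokens, each holding 16 * 16 * 3 = 768 raw pixel values.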

Original language: English
Article number: e0184554
Pages (from-to): 20289-20309
Number of pages: 21
Journal: Multimedia Tools and Applications
Volume: 84
Issue number: 19
Publication status: Published - 06-2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 3 - Good Health and Well-being

All Science Journal Classification (ASJC) codes

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Vision transformer based bacteria classification model for Gram-stained direct smear images'. Together they form a unique fingerprint.
