TY - GEN
T1 - Transformer-based Bird vs Drone Classification for Anti-drone System
T2 - 2025 IEEE International Conference on Emerging Technologies in Autonomous Aerial Vehicles, ETAAV 2025
AU - Bangde, Sanskruti
AU - Verma, Sourabh
AU - Gupta, Himanshu
AU - Verma, Om Prakash
AU - Khosla, Arun K.
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Anti-drone systems have long used different architectures to detect, identify, and track drones. Although these systems have been effective, they have also seen an increased rate of false positives. Transformer-based architectures such as the Vision Transformer (ViT) and the Swin Transformer (Shifted Window Transformer) offer promising solutions through their self-attention mechanisms. This study evaluates the performance, efficiency, and accuracy of these two state-of-the-art deep learning models using a comprehensive dataset of diverse bird and drone images. Through rigorous experimentation, the paper analyzes metrics such as accuracy, precision, recall, and F1-score, providing insights into the models' real-world applicability. Preliminary findings indicate that the ViT-based framework achieves a training accuracy of 99.16%. In comparison, the Swin Transformer, with its hierarchical feature extraction and shifted window mechanism, excels at handling complex backgrounds and achieves a training accuracy of 99.41%. Both ViT and the Swin Transformer demonstrate high precision and recall across all classes, including drones and birds.
AB - Anti-drone systems have long used different architectures to detect, identify, and track drones. Although these systems have been effective, they have also seen an increased rate of false positives. Transformer-based architectures such as the Vision Transformer (ViT) and the Swin Transformer (Shifted Window Transformer) offer promising solutions through their self-attention mechanisms. This study evaluates the performance, efficiency, and accuracy of these two state-of-the-art deep learning models using a comprehensive dataset of diverse bird and drone images. Through rigorous experimentation, the paper analyzes metrics such as accuracy, precision, recall, and F1-score, providing insights into the models' real-world applicability. Preliminary findings indicate that the ViT-based framework achieves a training accuracy of 99.16%. In comparison, the Swin Transformer, with its hierarchical feature extraction and shifted window mechanism, excels at handling complex backgrounds and achieves a training accuracy of 99.41%. Both ViT and the Swin Transformer demonstrate high precision and recall across all classes, including drones and birds.
UR - https://www.scopus.com/pages/publications/105024918085
U2 - 10.1109/ETAAV66793.2025.11213014
DO - 10.1109/ETAAV66793.2025.11213014
M3 - Conference contribution
AN - SCOPUS:105024918085
T3 - 2025 IEEE International Conference on Emerging Technologies in Autonomous Aerial Vehicles, ETAAV 2025 - Proceedings
BT - 2025 IEEE International Conference on Emerging Technologies in Autonomous Aerial Vehicles, ETAAV 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 August 2025 through 20 August 2025
ER -