TY - GEN
T1 - Analysis of Frequencies and Pitches for Vocal Cord Paralysis Classification Through Transfer Learning
AU - Jayashree Hegde, K.
AU - Manjula Shenoy, K.
AU - Devaraja, K.
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Audio-based examination for Vocal Cord Paralysis (VCP) is a useful, non-invasive, cost-effective procedure, and spectrogram analysis is a promising tool for classifying VCP versus healthy subjects. However, the spectrogram representation has not been fully exploited for classification purposes. This research investigates spectrograms at various frequencies over different pitches to aid in accurately classifying VCP and healthy subjects. Audio recordings of VCP and healthy subjects are collected from the Saarbruecken Voice Database (SVD), a voice disorders database. Using Transfer Learning (TL) and fine-tuning, Deep Convolutional Neural Network (DCNN) and Vision Transformer (ViT) architectures are trained on the vowel /a/, and the derived features are used for classification, yielding 100% accuracy. With minimal trainable parameters, the Deep Learning (DL) architectures performed exceptionally well, indicating the computational capability and robustness of spectrograms for future pathological practice.
UR - https://www.scopus.com/pages/publications/85205781910
U2 - 10.1109/CONECCT62155.2024.10677163
DO - 10.1109/CONECCT62155.2024.10677163
M3 - Conference contribution
AN - SCOPUS:85205781910
T3 - Proceedings of CONECCT 2024 - 10th IEEE International Conference on Electronics, Computing and Communication Technologies
BT - Proceedings of CONECCT 2024 - 10th IEEE International Conference on Electronics, Computing and Communication Technologies
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2024
Y2 - 12 July 2024 through 14 July 2024
ER -