TY - JOUR
T1 - A Novel Stacked Model for Classification of Vocal Cord Paralysis over Imbalanced Vocal Data
AU - Jayashree Hegde, K.
AU - Manjula Shenoy, K.
AU - Devaraja, K.
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Many classification systems for voice-related disorders have been developed over time, mostly using machine learning methods, with only limited use of deep learning techniques. These systems were evaluated in terms of accuracy, F1-score, precision, and recall, using features such as Mel-Frequency Cepstral Coefficients (MFCC) and time-domain features. They gave adequate results on datasets of vowel recordings that were either balanced across all classes or balanced by pooling multiple voice pathologies to match the number of healthy subjects. In real-world scenarios, however, voice disorders are often associated with imbalanced and small datasets. Vocal Cord Paralysis (VCP) is one such voice pathology with limited data. In this paper, a stacked ensemble model, InceptionV3-EfficientNetB0-ViT-B/16, is proposed to classify VCP and healthy subjects over an imbalanced dataset using spectrograms as features. Voice samples of healthy and VCP subjects are selected from the Saarbruecken Voice Database (SVD), covering the vowels /a/, /i/, and /u/ at neutral, high, low, and low-high-low pitch conditions, as well as the phrase. The voice samples are preprocessed using the Short-time Fourier Transform (STFT), and each sample is augmented at various frequencies. The experimental results show that the proposed stacked model achieved an accuracy of 94.11% for the vowel /a/ at normal and low-high-low pitch conditions using an imbalanced dataset. In addition, the model's robustness and trustworthiness are supported by a False Discovery Rate of 0.07142, a Cohen's kappa of 0.82105, a Matthews correlation coefficient (MCC) of 0.83452, and an F1-score of 0.91005. The vowels /i/ and /u/ were also evaluated with the proposed model, which achieved 88.23% accuracy over most pitch conditions for the vowels and 90% for the phrase. Overall, the proposed method showed strong diagnostic capability on an unbalanced dataset, without overtly favoring the majority class of healthy individuals, while maintaining an adequate balance in correctly recognizing the minority VCP class.
UR - http://www.scopus.com/inward/record.url?scp=85215868357&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85215868357&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3525721
DO - 10.1109/ACCESS.2025.3525721
M3 - Article
AN - SCOPUS:85215868357
SN - 2169-3536
VL - 13
SP - 10559
EP - 10581
JO - IEEE Access
JF - IEEE Access
ER -