A Novel Stacked Model for Classification of Vocal Cord Paralysis over Imbalanced Vocal Data

K. Jayashree Hegde*, K. Manjula Shenoy*, K. Devaraja

*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

Abstract

Over time, many classification systems have been developed for voice-related disorders, mostly using machine learning methods and, to a limited extent, deep learning techniques. These systems were evaluated on accuracy, F1-score, precision, and recall, using features such as Mel-Frequency Cepstral Coefficients (MFCCs) and time-domain features. They gave adequate results on vowel-recording datasets that were either balanced across all classes or made balanced by combining multiple voice pathologies so that the pathological class matched the healthy class in size. In real-world scenarios, however, voice-disorder data are often imbalanced and scarce. Vocal Cord Paralysis (VCP) is one such voice pathology with limited data. In this paper, a stacked ensemble model, InceptionV3-EfficientNetB0-ViT-B/16, is proposed to classify VCP and healthy subjects on the imbalanced dataset at hand, using spectrograms as features. Voice samples of healthy and VCP subjects are selected from the Saarbruecken Voice Database (SVD), covering the vowels /a/, /i/, and /u/ at normal, high, low, and low-high-low pitch, as well as the phrase. The voice samples are then preprocessed using the Short-Time Fourier Transform (STFT), and each sample is augmented at various frequencies. The experimental results show that the proposed stacked model achieved an accuracy of 94.11% for the vowel /a/ at normal and low-high-low pitch on the imbalanced dataset. In addition, the model's robustness and trustworthiness are further supported by a False Discovery Rate of 0.07142, a Cohen's kappa of 0.82105, a Matthews Correlation Coefficient (MCC) of 0.83452, and an F1-score of 0.91005. The vowels /i/ and /u/ were also evaluated with the proposed model, reaching 88.23% accuracy under most pitch conditions, and the phrase reached 90%.
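The abstract states that the voice samples are converted to spectrograms with the STFT before classification, but it does not give the STFT settings. The following is a minimal, standard-library sketch of an STFT magnitude spectrogram under assumed settings: a periodic Hann window, a frame length of 256 samples, and a hop of 128 samples. The function names `hann_window` and `stft_magnitude` are hypothetical, and a direct DFT is used only to keep the example dependency-free; a practical pipeline would use an FFT.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one row per frame, one column per frequency bin.

    Bins 0 .. frame_len // 2 cover 0 Hz up to the Nyquist frequency.
    A direct DFT (O(frame_len**2) per frame) keeps the sketch dependency-free;
    a real pipeline would use an FFT.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1
    spectrogram = []
    # Slide a window of frame_len samples forward by hop samples each step
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + k] * window[k] for k in range(frame_len)]
        row = []
        for f in range(n_bins):
            coeff = sum(
                frame[k] * cmath.exp(-2j * math.pi * f * k / frame_len)
                for k in range(frame_len)
            )
            row.append(abs(coeff))
        spectrogram.append(row)
    return spectrogram


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate for this demo only
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(2048)]
    spec = stft_magnitude(tone)
    peak_bin = max(range(len(spec[0])), key=lambda f: spec[0][f])
    # Bin spacing is fs / frame_len = 31.25 Hz, so a 1000 Hz tone falls in bin 32
    print(len(spec), "frames; peak bin", peak_bin, "=", peak_bin * fs / 256, "Hz")
```

In a setup like the one described, the resulting magnitudes would usually be log-scaled and rendered as images for the CNN and ViT branches; the exact scaling, image size, and augmentation used in the paper are not stated in this abstract.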
Overall, the proposed method showed a strong diagnostic capability on an imbalanced dataset without overtly favoring the majority class of healthy individuals, while maintaining an adequate balance by correctly recognizing the minority VCP class.
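Cohen's kappa and MCC correct for chance agreement and class imbalance, which is why they are more informative than accuracy when healthy samples dominate. The sketch below computes the metrics named in the abstract from a 2x2 confusion matrix; the helper names are hypothetical. As an arithmetic consistency check, not a matrix stated in the abstract, a hypothetical 17-sample test split with 13 healthy samples correctly classified, 1 VCP sample predicted as healthy, and 3 VCP samples correctly classified reproduces the reported /a/ figures: accuracy 16/17, FDR 1/14 (with healthy as the positive class), kappa 78/95 ≈ 0.82105, MCC ≈ 0.83452, and macro-averaged F1 ≈ 0.91005. Which class the paper treats as positive, and whether its F1 is macro-averaged, are assumptions here; kappa and MCC do not depend on that choice.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, FDR, Cohen's kappa and MCC from a binary confusion matrix.

    tp/fp/fn/tn are counted with respect to whichever class the caller
    treats as positive; kappa and MCC are unchanged if the classes are swapped.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # False Discovery Rate: share of positive predictions that are wrong
    fdr = fp / (tp + fp) if tp + fp else 0.0
    # Expected chance agreement from the row and column marginals
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 by convention
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "fdr": fdr, "kappa": kappa, "mcc": mcc}


def f1(tp, fp, fn):
    """Per-class F1 = 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(tp, fp, fn, tn):
    """Mean of the two per-class F1 scores."""
    # Treating the other class as positive swaps TP<->TN and FP<->FN
    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2


if __name__ == "__main__":
    # Hypothetical 17-sample split, healthy = positive class:
    # 13 healthy correct, 1 VCP predicted healthy, 3 VCP correct
    m = confusion_metrics(tp=13, fp=1, fn=0, tn=3)
    print({k: round(v, 5) for k, v in m.items()}, round(macro_f1(13, 1, 0, 3), 5))
```

The abstract's 94.11% and 0.07142 appear to be truncated rather than rounded values of 16/17 and 1/14. On the same hypothetical split, a classifier that labeled every sample healthy would still reach 13/17 ≈ 76% accuracy, yet its kappa is 0 and its MCC is undefined (reported as 0.0 by the sketch), which illustrates how these metrics expose a bias toward the majority class.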

Original language: English
Pages (from-to): 10559-10581
Number of pages: 23
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering
