
ADDRESSING DATA SCARCITY IN VOICE DISORDER DETECTION WITH SELF-SUPERVISED MODELS

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Machine learning (ML) has shown promising results in the field of voice disorder detection over the past decade. However, the diversity of recording conditions, audio content, and languages, together with the scarcity of examples for each of these combinations, poses a challenge in building ML models that can reliably detect voice disorders. Recent advancements in Self-Supervised Learning (SSL) offer hope by leveraging large datasets to pretrain models and extract audio features with high resilience for downstream tasks. In this paper, we fairly exhaustively explore commonly used SSL model representations to assess their suitability for addressing the downstream task of voice disorder detection. Using a combination of Support Vector Machines (SVM) and feedforward Deep Neural Networks (DNN), we show: i) that the combination of the vowels /a/, /i/, and /u/ performs better than individual vowels; ii) that SSL-based features generalize well to out-of-domain databases; and iii) that while spectral features like MFCC perform as well as SSL-based features when trained and tested on the same database, their performance seems to deteriorate when training and testing across different databases.

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 11866-11870
Number of pages: 5
ISBN (Electronic): 9798350344851
DOIs
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14-04-2024 to 19-04-2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14-04-24 to 19-04-24

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
