
Efficient Romanian Dialect Identification in Low-resource conditions using Transfer-Learning and Metric-Learning

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic spoken dialect identification (DID) is a challenging task because dialects of a given language are highly similar. The lack of sufficient data to train deep learning (DL) based DID systems, which is common for low-resource languages like Romanian, makes it even more challenging. While transfer learning can improve performance in such low-resource conditions, selecting a pre-trained model and a suitable layer for DID-specific feature extraction is not trivial, as the ability to encode DID-specific content varies widely across models and their layers. Furthermore, due to high inter-class similarities, the features obtained from such pre-trained models may have only limited dialect discriminability, necessitating explicit methods to improve the discriminability of the learned features. Motivated by these challenges, we propose different methods to build efficient DID systems for identifying Romanian dialects in low-resource settings. Specifically, we first explore pre-trained models such as wav2vec2, HuBERT and XEUS for Romanian DID and determine their optimum layer for this task. Furthermore, to address the performance degradation due to high inter-class similarities and intra-class variations, we employ metric learning techniques such as center loss (CL) and centroid similarity loss (CSL). The results indicate that a DID system trained using the 4th layer of the XEUS model, when combined with CSL, gives the best performance on Romanian DID.
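
To make the metric-learning idea concrete, the following is a minimal sketch of the standard center loss, which penalises the squared distance between each embedding and the center of its class. It is not the authors' implementation: the function names `class_centers` and `center_loss` are hypothetical, the embeddings stand in for utterance-level features pooled from one layer of a pre-trained model, and the centers are computed here as plain class means, whereas in training they are usually learnable parameters updated per mini-batch. The centroid similarity loss (CSL) is not sketched because the abstract does not define it.

```python
from typing import Dict, List, Sequence


def class_centers(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Return the mean embedding of each class (assumption: centers = class means)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def center_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Dict[int, List[float]],
) -> float:
    """Average of 0.5 * ||x_i - c_{y_i}||^2 over the batch."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += 0.5 * sum((v - cv) ** 2 for v, cv in zip(x, c))
    return total / len(features)


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two dialect classes (illustrative values only).
    feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
    labs = [0, 0, 1, 1]
    cents = class_centers(feats, labs)
    print(center_loss(feats, labs, cents))
```

In a typical setup this term is added, with a small weight, to the cross-entropy classification loss, so that embeddings of the same dialect are pulled together while the classifier keeps different dialects apart.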

Original language: English
Pages (from-to): 204467-204479
Number of pages: 13
Journal: IEEE Access
Volume: 13
Publication status: Accepted/In press - 2025

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering
