Hierarchical Loss for Bi-Level Classification of Speech into Language and Dialects

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Spoken dialect identification (DID) is a challenging task because of the high similarity between classes. It becomes even more difficult when DID must be performed in a multilingual environment, especially when the languages belong to the same language family, as the potential for confusion is then considerably higher. In this paper, we propose multiple techniques to address this issue. These techniques encourage the DID system to respect the parent-child relationship between languages and their dialects by performing bi-level classification of speech. In our first approach, we train the language identification (LID) and DID models independently and then combine them so that the output of the LID model selects the DID model, which in turn decides the dialect. However, such independent training does not allow the model to learn the relations and similarities between dialects of different languages. To overcome this limitation, we propose an end-to-end DID model trained on the dialects of all languages in the dataset. While end-to-end training allows the model to learn inter-dialect similarities better, it does not explicitly prevent a dialect from being misclassified as a dialect of a different language. To address this limitation, we propose a novel hierarchical loss that encourages the model to maintain the parent-child relationship between languages and dialects. Specifically, using an auxiliary language classifier and a primary dialect classifier, the hierarchical loss heavily penalizes the model whenever the parent-child relationship is not maintained, i.e., when the predicted dialect does not belong to the predicted language. Experiments conducted on a set of closely related Indian languages show that training with the hierarchical loss improves performance.
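
The abstract does not state the exact form of the hierarchical loss, so the following is only a minimal Python sketch under stated assumptions. It combines a primary dialect cross-entropy, an auxiliary language cross-entropy, and a hypothetical soft consistency penalty: the negative log of the probability that the dialect classifier and the language classifier agree on the parent language. The weight `lam`, the mapping `parent_of`, and the function `hierarchical_loss` are illustrative names, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])


def hierarchical_loss(
    dialect_logits: Sequence[float],
    language_logits: Sequence[float],
    dialect_target: int,
    parent_of: Sequence[int],
    lam: float = 1.0,
    eps: float = 1e-12,
) -> Dict[str, float]:
    """Hypothetical bi-level loss for a single utterance.

    parent_of[d] is the index of the language that dialect d belongs to
    (an assumed label layout, not taken from the paper).
    """
    p_dial = softmax(dialect_logits)
    p_lang = softmax(language_logits)
    language_target = parent_of[dialect_target]

    # Primary dialect loss and auxiliary language loss.
    l_dialect = cross_entropy(p_dial, dialect_target)
    l_language = cross_entropy(p_lang, language_target)

    # Dialect probability mass aggregated per parent language.
    mass_per_language = [0.0] * len(p_lang)
    for d, p in enumerate(p_dial):
        mass_per_language[parent_of[d]] += p

    # Probability that both heads agree on the parent language:
    # sum over languages l of P_lang(l) * P_dial(dialects of l).
    agreement = sum(pl * pm for pl, pm in zip(p_lang, mass_per_language))

    # Assumed consistency penalty: grows when the predicted dialect
    # does not belong to the predicted language.
    l_hier = -math.log(agreement + eps)

    total = l_dialect + l_language + lam * l_hier
    return {"dialect": l_dialect, "language": l_language,
            "hierarchy": l_hier, "total": total}


# Toy example: dialects 0 and 1 belong to language 0; dialects 2 and 3 to language 1.
parent_of = [0, 0, 1, 1]
consistent = hierarchical_loss([4.0, 1.0, 0.0, 0.0], [3.0, 0.0], 0, parent_of)
inconsistent = hierarchical_loss([0.0, 0.0, 4.0, 1.0], [3.0, 0.0], 0, parent_of)
print(round(consistent["hierarchy"], 3), round(inconsistent["hierarchy"], 3))
```

In the toy example, the second call predicts a dialect of language 1 while the language head predicts language 0, so its hierarchy term is much larger than in the first call. A soft agreement term is used here because a penalty based on hard argmax predictions would give no gradient; the paper's actual penalty may differ.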

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
Publication status: Published - 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 06-04-2025 to 11-04-2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 06-04-25 to 11-04-25

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
