TY - GEN
T1 - Hierarchical Loss for Bi-Level Classification of Speech into Language and Dialects
AU - Angra, Ananya
AU - Muralikrishna, H.
AU - Dileep, A. D.
AU - Thenkanidiyoor, Veena
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Spoken dialect identification (DID) is a challenging task due to the high similarity between the classes. The task becomes further complicated when DID needs to be performed in a multilingual environment, especially when the languages are from the same language family, as the scope for confusion is significantly higher. In this paper, we propose multiple techniques to address this issue. These techniques motivate the DID system to respect the parent-child relationship between the languages and their dialects by performing bi-level classification of speech. Specifically, in our first approach, we train the language identification (LID) and DID models independently and then combine them such that the output of the LID model selects the DID model, which in turn decides the dialect. However, such independent training does not allow the model to learn the relations/similarities between dialects of different languages. To overcome this limitation, we train a single end-to-end DID model using the dialects of all languages in the dataset. While such end-to-end training allows the model to better learn inter-dialect similarities, it does not explicitly prevent a dialect from being misclassified as a dialect of a different language. To address this limitation, we propose a novel hierarchical loss, which motivates the model to maintain the parent-child relationship between languages and dialects. Specifically, with the help of an auxiliary language classifier and a primary dialect classifier, the hierarchical loss penalizes the model heavily whenever the parent-child relationship is not maintained, i.e., when the predicted dialect does not belong to the predicted language. Experiments conducted on a set of closely related Indian languages show that hierarchical-loss-based training improves performance.
AB - Spoken dialect identification (DID) is a challenging task due to the high similarity between the classes. The task becomes further complicated when DID needs to be performed in a multilingual environment, especially when the languages are from the same language family, as the scope for confusion is significantly higher. In this paper, we propose multiple techniques to address this issue. These techniques motivate the DID system to respect the parent-child relationship between the languages and their dialects by performing bi-level classification of speech. Specifically, in our first approach, we train the language identification (LID) and DID models independently and then combine them such that the output of the LID model selects the DID model, which in turn decides the dialect. However, such independent training does not allow the model to learn the relations/similarities between dialects of different languages. To overcome this limitation, we train a single end-to-end DID model using the dialects of all languages in the dataset. While such end-to-end training allows the model to better learn inter-dialect similarities, it does not explicitly prevent a dialect from being misclassified as a dialect of a different language. To address this limitation, we propose a novel hierarchical loss, which motivates the model to maintain the parent-child relationship between languages and dialects. Specifically, with the help of an auxiliary language classifier and a primary dialect classifier, the hierarchical loss penalizes the model heavily whenever the parent-child relationship is not maintained, i.e., when the predicted dialect does not belong to the predicted language. Experiments conducted on a set of closely related Indian languages show that hierarchical-loss-based training improves performance.
UR - https://www.scopus.com/pages/publications/105003892340
UR - https://www.scopus.com/pages/publications/105003892340#tab=citedBy
U2 - 10.1109/ICASSP49660.2025.10887667
DO - 10.1109/ICASSP49660.2025.10887667
M3 - Conference contribution
AN - SCOPUS:105003892340
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
A2 - Rao, Bhaskar D
A2 - Trancoso, Isabel
A2 - Sharma, Gaurav
A2 - Mehta, Neelesh B.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Y2 - 6 April 2025 through 11 April 2025
ER -