TY - GEN
T1 - Hierarchical Attention Mechanism for Domain-invariant Spoken Language Identification via Inter-domain Alignment and Intra-domain Language Discrimination
AU - Goswami, Urvashi
AU - Muralikrishna, H.
AU - Kumar, Sujeet
AU - Dileep, A. D.
AU - Thenkanidiyoor, Veena
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Modern methods for spoken language identification (LID) have demonstrated promising results when trained on large datasets. However, their effectiveness is considerably impacted by discrepancies between the training and testing distributions. Unsupervised domain adaptation (UDA) addresses the domain discrepancies by aligning feature distributions across source and target domains without using labeled target data, thereby ensuring the model performs consistently across both domains. This alignment, however, may come at the cost of reduced model discriminability. To enhance performance under domain-mismatched conditions, an LID system should learn domain-invariant representations of speech while maintaining language discriminability at every level, from initial segment-level representations to final utterance-level representations. To address this issue, in this paper, we propose a hierarchical attention network that simultaneously enhances the model's domain invariance through inter-domain alignment and improves the model's language discriminability using intra-domain contrastive discrimination. The performance on the target domain shows that the proposed method outperforms the top-performing existing approaches.
AB - Modern methods for spoken language identification (LID) have demonstrated promising results when trained on large datasets. However, their effectiveness is considerably impacted by discrepancies between the training and testing distributions. Unsupervised domain adaptation (UDA) addresses the domain discrepancies by aligning feature distributions across source and target domains without using labeled target data, thereby ensuring the model performs consistently across both domains. This alignment, however, may come at the cost of reduced model discriminability. To enhance performance under domain-mismatched conditions, an LID system should learn domain-invariant representations of speech while maintaining language discriminability at every level, from initial segment-level representations to final utterance-level representations. To address this issue, in this paper, we propose a hierarchical attention network that simultaneously enhances the model's domain invariance through inter-domain alignment and improves the model's language discriminability using intra-domain contrastive discrimination. The performance on the target domain shows that the proposed method outperforms the top-performing existing approaches.
UR - https://www.scopus.com/pages/publications/105001491066
UR - https://www.scopus.com/pages/publications/105001491066#tab=citedBy
U2 - 10.1109/ICEI64305.2024.10912305
DO - 10.1109/ICEI64305.2024.10912305
M3 - Conference contribution
AN - SCOPUS:105001491066
T3 - 2024 IEEE Conference on Engineering Informatics, ICEI 2024
BT - 2024 IEEE Conference on Engineering Informatics, ICEI 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE Conference on Engineering Informatics, ICEI 2024
Y2 - 20 November 2024 through 28 November 2024
ER -