Adversarially Trained Hierarchical Attention Network for Domain-Invariant Spoken Language Identification

Urvashi Goswami, H. Muralikrishna, A. D. Dileep*, Veena Thenkanidiyoor

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

State-of-the-art spoken language identification (LID) systems are sensitive to domain mismatch between training and test samples, and as a result they often perform poorly in unseen target-domain conditions. To improve performance under domain mismatch, the LID system should be encouraged to learn a domain-invariant representation of speech. In this paper, we propose an adversarially trained hierarchical attention network to achieve this. Specifically, the proposed method first uses a transformer encoder that applies an attention mechanism at three levels to learn better representations at the segment, suprasegmental, and utterance levels. This hierarchical attention mechanism allows the network to better encode the LID-specific content of the speech. The network is then encouraged to learn a domain-invariant representation of the speech using adversarial multi-task learning (AMTL). Results obtained in unseen target-domain conditions show that the proposed approach outperforms state-of-the-art baselines.
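
The two ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the paper uses a transformer encoder, whereas this sketch substitutes plain dot-product attention pooling at each of the three levels, and it writes the adversarial objective in the commonly used form in which the encoder minimises the LID loss while maximising the domain-classifier loss. The function names, the chunk sizes `seg_len` and `supra_len`, the query vectors, and the weight `lam` are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Collapse a list of vectors into one vector.

    Weights come from scaled dot products against a query vector
    (hypothetical stand-in for a learned parameter).
    """
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(v, query)) / scale for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def hierarchical_embedding(frames, queries, seg_len=4, supra_len=3):
    """Pool frames -> segments -> suprasegmental units -> utterance."""
    q_seg, q_supra, q_utt = queries
    segments = [attention_pool(c, q_seg) for c in chunk(frames, seg_len)]
    supra = [attention_pool(c, q_supra) for c in chunk(segments, supra_len)]
    return attention_pool(supra, q_utt)


def encoder_objective(lid_loss, domain_loss, lam=0.1):
    """Assumed adversarial multi-task objective seen by the encoder.

    The encoder lowers the LID loss and raises the domain loss, which
    pushes the embedding towards domain invariance. `lam` is a
    hypothetical trade-off weight.
    """
    return lid_loss - lam * domain_loss


# Toy usage: 24 random 8-dimensional "frames" and three random queries.
random.seed(0)
frames = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(24)]
queries = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
utterance_embedding = hierarchical_embedding(frames, queries)
print(len(utterance_embedding))
```

In a trainable system the query vectors would be learned parameters and the sign flip on the domain loss is often realised with a gradient reversal layer; whether the paper uses that particular mechanism is not stated in the abstract.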

Original language: English
Title of host publication: Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings
Editors: Alexey Karpov, K. Samudravijaya, K. T. Deepak, Rajesh M. Hegde, S. R. Mahadeva Prasanna, Shyam S. Agrawal
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 475-489
Number of pages: 15
ISBN (Print): 9783031483110
Publication status: Published - 2023
Event: 25th International Conference on Speech and Computer, SPECOM 2023 - Dharwad, India
Duration: 29-11-2023 to 02-12-2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14339 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Speech and Computer, SPECOM 2023
Country/Territory: India
City: Dharwad
Period: 29-11-23 to 02-12-23

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
