TY - GEN
T1 - Adversarially Trained Hierarchical Attention Network for Domain-Invariant Spoken Language Identification
AU - Goswami, Urvashi
AU - Muralikrishna, H.
AU - Dileep, A. D.
AU - Thenkanidiyoor, Veena
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - State-of-the-art spoken language identification (LID) systems are sensitive to domain mismatch between training and testing samples and therefore often perform unsatisfactorily in unseen target domain conditions. To improve performance in domain-mismatched conditions, the LID system should be encouraged to learn a domain-invariant representation of the speech. In this paper, we propose an adversarially trained hierarchical attention network to achieve this. Specifically, the proposed method first uses a transformer encoder that applies attention at three different levels to learn better representations at the segment, suprasegmental, and utterance levels. This hierarchical attention mechanism allows the network to better encode the LID-specific content of the speech. The network is then encouraged to learn a domain-invariant representation of the speech using adversarial multi-task learning (AMTL). Results obtained on unseen target domain conditions demonstrate the superiority of the proposed approach over state-of-the-art baselines.
AB - State-of-the-art spoken language identification (LID) systems are sensitive to domain mismatch between training and testing samples and therefore often perform unsatisfactorily in unseen target domain conditions. To improve performance in domain-mismatched conditions, the LID system should be encouraged to learn a domain-invariant representation of the speech. In this paper, we propose an adversarially trained hierarchical attention network to achieve this. Specifically, the proposed method first uses a transformer encoder that applies attention at three different levels to learn better representations at the segment, suprasegmental, and utterance levels. This hierarchical attention mechanism allows the network to better encode the LID-specific content of the speech. The network is then encouraged to learn a domain-invariant representation of the speech using adversarial multi-task learning (AMTL). Results obtained on unseen target domain conditions demonstrate the superiority of the proposed approach over state-of-the-art baselines.
UR - https://www.scopus.com/pages/publications/85178601701
UR - https://www.scopus.com/inward/citedby.url?scp=85178601701&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-48312-7_38
DO - 10.1007/978-3-031-48312-7_38
M3 - Conference contribution
AN - SCOPUS:85178601701
SN - 9783031483110
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 475
EP - 489
BT - Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings
A2 - Karpov, Alexey
A2 - Samudravijaya, K.
A2 - Deepak, K. T.
A2 - Hegde, Rajesh M.
A2 - Prasanna, S. R. Mahadeva
A2 - Agrawal, Shyam S.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 25th International Conference on Speech and Computer, SPECOM 2023
Y2 - 29 November 2023 through 2 December 2023
ER -