Abstract
State-of-the-art spoken language identification (LID) systems use sophisticated training strategies to improve robustness to unseen channel conditions in real-world test samples. However, all these approaches require training samples from multiple channels with corresponding channel labels, which are not available in many cases. Recent research has shown that a channel-invariant representation of speech can be learned using an auxiliary loss function called within-sample similarity loss (WSSL), which does not require samples from multiple channels. Specifically, the WSSL encourages the LID network to ignore channel-specific contents in the speech by minimizing the similarities between two utterance-level embeddings of the same sample. However, because the WSSL approach operates at the sample level, it ignores the channel variations that may be present across different training samples within the same dataset. In this work, we propose a modification to the WSSL approach to address this limitation. Specifically, along with the WSSL, the proposed modified WSSL (mWSSL) approach additionally considers the similarities with two global-level embeddings, which represent the average channel-specific contents in a given mini-batch of training samples. The proposed modification gives the network a better view of the channel-specific contents in the training dataset, leading to improved performance in unseen channel conditions.
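The idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, how the global-level embeddings are computed, or which pairs are compared. The sketch assumes cosine similarity, takes each global-level embedding to be the mini-batch mean of one embedding set, and compares each sample's embedding with the global embedding of the other set. The names `cosine_rows`, `wssl_sketch`, `mwssl_sketch`, and `weight` are hypothetical.

```python
import numpy as np


def cosine_rows(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def wssl_sketch(emb_a, emb_b):
    """Sample-level term: mean similarity between the two
    utterance-level embeddings of each sample (assumed cosine)."""
    return float(np.mean(cosine_rows(emb_a, emb_b)))


def mwssl_sketch(emb_a, emb_b, weight=1.0):
    """Sample-level term plus an assumed global term computed against
    mini-batch mean embeddings (a stand-in for the global-level
    embeddings mentioned in the abstract)."""
    # Assumption: a global-level embedding is the batch average.
    global_a = emb_a.mean(axis=0, keepdims=True)
    global_b = emb_b.mean(axis=0, keepdims=True)

    within = wssl_sketch(emb_a, emb_b)
    # Assumption: each sample embedding is compared with the global
    # embedding of the other set.
    to_global_b = np.mean(cosine_rows(emb_a, np.broadcast_to(global_b, emb_a.shape)))
    to_global_a = np.mean(cosine_rows(emb_b, np.broadcast_to(global_a, emb_b.shape)))
    return within + weight * float(to_global_a + to_global_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=(4, 8))  # first embedding of each sample
    emb_b = rng.normal(size=(4, 8))  # second embedding of each sample
    print(wssl_sketch(emb_a, emb_b), mwssl_sketch(emb_a, emb_b))
```

In training, a term like this would be added to the usual LID classification loss and minimized jointly. The global term depends on the whole mini-batch, which is how it can reflect channel variation across samples rather than within a single sample.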
| Original language | English |
|---|---|
| Pages (from-to) | 16-23 |
| Number of pages | 8 |
| Journal | Pattern Recognition Letters |
| Volume | 158 |
| Publication status | Published - 06-2022 |
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence
Title: Spoken language identification in unseen channel conditions using modified within-sample similarity loss