Spoken language identification in unseen channel conditions using modified within-sample similarity loss

    Research output: Contribution to journal, Article, peer-reviewed

    Abstract

    State-of-the-art spoken language identification (LID) systems use sophisticated training strategies to improve robustness to unseen channel conditions in real-world test samples. However, all these approaches require training samples from multiple channels with corresponding channel labels, which are not available in many cases. Recent research has shown that a channel-invariant representation of speech can be learned using an auxiliary loss function called within-sample similarity loss (WSSL), which does not require samples from multiple channels. Specifically, the WSSL encourages the LID network to ignore channel-specific content in the speech by minimizing the similarity between two utterance-level embeddings of the same sample. However, because the WSSL approach operates at the sample level, it ignores the channel variations that may be present across different training samples within the same dataset. In this work, we propose a modification to the WSSL approach to address this limitation. Specifically, along with the WSSL, the proposed modified WSSL (mWSSL) approach additionally considers the similarities with two global-level embeddings, which represent the average channel-specific content in a given mini-batch of training samples. The proposed modification gives the network a better view of the channel-specific content in the training dataset, leading to improved performance in unseen channel conditions.
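
    The abstract does not give the loss formulas, so the following is only an illustrative sketch of the idea, not the authors' implementation. It assumes that similarity is measured as squared cosine similarity, that the two utterance-level embeddings arrive as arrays of shape (batch, dim), and that each global-level embedding is the mini-batch mean of one of them. The names `wssl`, `mwssl`, and `weight` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity; b may have a single row and is broadcast."""
    a_norm = np.linalg.norm(a, axis=-1)
    b_norm = np.linalg.norm(b, axis=-1)
    return np.sum(a * b, axis=-1) / (a_norm * b_norm + eps)


def wssl(emb_a, emb_b):
    """Assumed within-sample term: mean squared cosine similarity between
    the two utterance-level embeddings of each sample."""
    return float(np.mean(cosine_similarity(emb_a, emb_b) ** 2))


def mwssl(emb_a, emb_b, weight=1.0):
    """Assumed modified term: the within-sample term plus the similarity of
    each embedding to the mini-batch mean of the other embedding."""
    # Global-level embeddings: averages over the mini-batch (an assumption).
    global_a = emb_a.mean(axis=0, keepdims=True)
    global_b = emb_b.mean(axis=0, keepdims=True)
    cross = (
        np.mean(cosine_similarity(emb_a, global_b) ** 2)
        + np.mean(cosine_similarity(emb_b, global_a) ** 2)
    )
    return wssl(emb_a, emb_b) + weight * float(cross)


# Toy mini-batch: the two embeddings of every sample are orthogonal,
# so both terms vanish under the assumptions above.
emb_a = np.array([[1.0, 0.0], [1.0, 0.0]])
emb_b = np.array([[0.0, 1.0], [0.0, 1.0]])
loss_orthogonal = mwssl(emb_a, emb_b)
```

    In a real training setup, a term like this would be added to the language-classification loss as an auxiliary objective and computed with a differentiable framework rather than NumPy.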

    Original language: English
    Pages (from-to): 16-23
    Number of pages: 8
    Journal: Pattern Recognition Letters
    Volume: 158
    Publication status: Published - 06-2022

    All Science Journal Classification (ASJC) codes

    • Software
    • Signal Processing
    • Computer Vision and Pattern Recognition
    • Artificial Intelligence
