Spoken language identification using bidirectional LSTM based LID sequential senones

H. Muralikrishna, Pulkit Sapra, Anuksha Jain, Dileep Aroor Dinesh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

The effectiveness of the features used to represent speech utterances influences the performance of spoken language identification (LID) systems. Recent LID systems use bottleneck features (BNFs) obtained from deep neural networks (DNNs) to represent the utterances. These BNFs do not encode language-specific features. Recent advances in DNNs have led to the use of effective language-sensitive features such as LID-senones, obtained using a convolutional neural network (CNN) based architecture. In this work, we propose a novel approach to obtain LID-senones. The proposed approach combines BNFs with bidirectional long short-term memory (BLSTM) networks to generate LID-senones. Since each of these LID-senones preserves sequence information, we term them LID-sequential-senones (LID-seq-senones). The proposed LID-seq-senones are then used for LID in two ways. In the first approach, we propose to build an end-to-end structure with a BLSTM as the front-end LID-seq-senone extractor, followed by a fully connected classification layer. In the second approach, we consider each utterance as a sequence of LID-seq-senones and propose to use a support vector machine (SVM) with a sequence kernel (the GMM-based segment-level pyramid match kernel) to classify the utterance. The effectiveness of the proposed representation is evaluated on the Oregon Graduate Institute Multi-Language Telephone Speech corpus (OGI-TS) and the IIT Madras Indian Language corpus (IITM-IL).

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 320-326
Number of pages: 7
ISBN (Electronic): 9781728103068
DOIs
Publication status: Published - 12-2019
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15-12-2019 to 18-12-2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory: Singapore
City: Singapore
Period: 15-12-19 to 18-12-19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Signal Processing
  • Linguistics and Language
  • Communication
