Exploring the Impact of Different Approaches for Spoken Dialect Identification of Konkani Language

Sean Monteiro, Ananya Angra, Muralikrishna H, Veena Thenkanidiyoor, A. D. Dileep*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This work addresses spoken dialect identification (DID) for the Konkani language by exploring several state-of-the-art methods from language identification. The baseline model is a fully connected neural network trained on frame-level Mel-frequency cepstral coefficient (MFCC) features. This baseline is compared with state-of-the-art language identification models adapted to DID that use utterance-level embeddings, namely the x-vector and the u-vector. The x-vector and u-vector based models are trained on segment-level features. Two kinds of segment-level features are explored: phone-state bottleneck features (BNFs) and wav2vec features, both extracted from pretrained feature extractors. The x-vector based model uses a time delay neural network (TDNN) to extract an utterance-level embedding from a sequence of speech segments, while the u-vector based model uses a bidirectional LSTM (BLSTM) for the same purpose. This work also proposes a novel transformer-based model to extract an utterance-level embedding from a sequence of speech segments. Results show the effectiveness of the proposed methods for DID of Konkani. The proposed transformer-based model outperforms the other explored models, and wav2vec features prove superior to phone-state BNFs for the DID task.
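
To make the idea of an utterance-level embedding concrete, here is a minimal sketch of how a variable-length sequence of segment-level feature vectors can be reduced to one fixed-size vector. It uses mean and standard-deviation pooling, the aggregation step commonly found in x-vector style systems. It is not the TDNN, BLSTM, or transformer model described in the abstract. The function name `stats_pool` and the toy feature values are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def stats_pool(segments: Sequence[Sequence[float]]) -> List[float]:
    """Pool segment-level vectors into one utterance-level vector.

    Hypothetical helper: concatenates the per-dimension mean and the
    per-dimension (population) standard deviation over all segments.
    """
    if not segments:
        raise ValueError("need at least one segment-level vector")
    dim = len(segments[0])
    if any(len(vec) != dim for vec in segments):
        raise ValueError("all segment-level vectors must have the same size")

    n = len(segments)
    # Per-dimension mean over the sequence of segments.
    means = [sum(vec[d] for vec in segments) / n for d in range(dim)]
    # Per-dimension population standard deviation over the sequence.
    stds = [
        math.sqrt(sum((vec[d] - means[d]) ** 2 for vec in segments) / n)
        for d in range(dim)
    ]
    return means + stds


# Toy data (assumed values): two utterances of different lengths,
# each segment described by a 2-dimensional feature vector.
short_utt = [[1.0, 2.0], [3.0, 6.0]]
long_utt = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

short_emb = stats_pool(short_utt)
long_emb = stats_pool(long_utt)
print(len(short_emb), len(long_emb))
```

Both utterances map to vectors of the same size regardless of how many segments they contain, which is the property that lets a downstream classifier assign a dialect label to a whole utterance.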

Original language: English
Title of host publication: Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings
Editors: Alexey Karpov, K. Samudravijaya, K. T. Deepak, Rajesh M. Hegde, S. R. Mahadeva Prasanna, Shyam S. Agrawal
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 461-474
Number of pages: 14
ISBN (Print): 9783031483110
Publication status: Published - 2023
Event: 25th International Conference on Speech and Computer, SPECOM 2023 - Dharwad, India
Duration: 29-11-2023 to 02-12-2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14339 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Speech and Computer, SPECOM 2023
Country/Territory: India
City: Dharwad
Period: 29-11-23 to 02-12-23

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
