TY - GEN
T1 - Exploring the Impact of Different Approaches for Spoken Dialect Identification of Konkani Language
AU - Monteiro, Sean
AU - Angra, Ananya
AU - H, Muralikrishna
AU - Thenkanidiyoor, Veena
AU - Dileep, A. D.
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - This work aims to identify dialects of the Konkani language. Various state-of-the-art methods from language identification are explored for Konkani dialect identification (DID). An initial base model is constructed using a fully connected neural network trained on frame-level Mel-frequency cepstral coefficient (MFCC) features. This base model is then compared with state-of-the-art language identification models adapted for DID that use utterance-level embeddings, namely the x-vector and u-vector. The x-vector and u-vector based models are trained on segment-level features; two such features are explored, phone-state bottleneck features (BNFs) and wav2vec features, both extracted from pretrained feature extractors. The x-vector based model uses a time delay neural network (TDNN) to extract an utterance-level embedding from a sequence of speech segments, while the u-vector based model uses a bidirectional LSTM (BLSTM) for the same purpose. This work also proposes a novel transformer-based model for extracting utterance-level embeddings from sequences of speech segments. Results show the effectiveness of the proposed methods for DID of Konkani, with the proposed transformer-based model outperforming the other explored models. The results also show the superiority of wav2vec features over phone-state BNFs for the DID task.
AB - This work aims to identify dialects of the Konkani language. Various state-of-the-art methods from language identification are explored for Konkani dialect identification (DID). An initial base model is constructed using a fully connected neural network trained on frame-level Mel-frequency cepstral coefficient (MFCC) features. This base model is then compared with state-of-the-art language identification models adapted for DID that use utterance-level embeddings, namely the x-vector and u-vector. The x-vector and u-vector based models are trained on segment-level features; two such features are explored, phone-state bottleneck features (BNFs) and wav2vec features, both extracted from pretrained feature extractors. The x-vector based model uses a time delay neural network (TDNN) to extract an utterance-level embedding from a sequence of speech segments, while the u-vector based model uses a bidirectional LSTM (BLSTM) for the same purpose. This work also proposes a novel transformer-based model for extracting utterance-level embeddings from sequences of speech segments. Results show the effectiveness of the proposed methods for DID of Konkani, with the proposed transformer-based model outperforming the other explored models. The results also show the superiority of wav2vec features over phone-state BNFs for the DID task.
UR - https://www.scopus.com/pages/publications/85178562692
UR - https://www.scopus.com/inward/citedby.url?scp=85178562692&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-48312-7_37
DO - 10.1007/978-3-031-48312-7_37
M3 - Conference contribution
AN - SCOPUS:85178562692
SN - 9783031483110
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 461
EP - 474
BT - Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings
A2 - Karpov, Alexey
A2 - Samudravijaya, K.
A2 - Deepak, K. T.
A2 - Hegde, Rajesh M.
A2 - Prasanna, S. R. Mahadeva
A2 - Agrawal, Shyam S.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 25th International Conference on Speech and Computer, SPECOM 2023
Y2 - 29 November 2023 through 2 December 2023
ER -