Abstract
This paper presents approaches for developing a spoken Language Identification (LID) system that identifies three low-resource Indian languages, Kannada, Konkani, and Tulu, which are commonly spoken in the coastal region of the Indian state of Karnataka. To address the challenges of low-resource conditions, the proposed work combines data augmentation, transfer learning, and multi-view learning. Specifically, noise perturbation and speed perturbation are used for data augmentation, and pre-trained Wav2Vec 2.0 and Whisper models are used for feature extraction; different Deep Learning (DL) based end-to-end models are then trained for LID on the extracted features. Following this, a multi-view learning strategy is incorporated in which the LID model processes the feature representations obtained from the Wav2Vec 2.0 and Whisper models simultaneously, using two separate input arms to capture their complementary content, which leads to improved performance. Additionally, a combination of traditional Machine Learning (ML) and DL models is explored, in which utterance-level embeddings obtained from pre-trained LID models are classified using separate back-end classifiers such as K-Nearest Neighbor (KNN) and Support Vector Machine (SVM). The results highlight the advantage of using transfer learning, multi-view learning, and the combination of DL-based models with ML-based classifiers to improve the overall performance of the LID system in low-resource settings. Specifically, an SVM back-end applied to the x-vector model with multi-view learning gave the best result among the models compared.
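The two augmentation steps named in the abstract, noise perturbation and speed perturbation, can be illustrated with a minimal sketch. This is not the authors' implementation: the white Gaussian noise model, the 10 dB SNR, the speed factors 0.9 and 1.1, the linear-interpolation resampler, and the function names `add_noise` and `speed_perturb` are all assumptions chosen for illustration, and a synthetic tone stands in for a real utterance.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to an approximate target SNR in dB.

    Assumption: the abstract does not say which noise type or SNR was used.
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def speed_perturb(signal, factor):
    """Resample by linear interpolation.

    factor > 1 speeds the audio up (fewer samples);
    factor < 1 slows it down (more samples).
    """
    out_len = max(1, int(round(len(signal) / factor)))
    last = len(signal) - 1
    out = []
    for i in range(out_len):
        pos = min(i * factor, last)   # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


rng = random.Random(0)                # fixed seed for repeatability
sample_rate = 16000                   # hypothetical sampling rate
# A one-second 220 Hz tone stands in for a recorded utterance.
clean = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]

noisy = add_noise(clean, snr_db=10, rng=rng)   # noise-perturbed copy
fast = speed_perturb(clean, 1.1)               # sped-up copy
slow = speed_perturb(clean, 0.9)               # slowed-down copy
```

Each augmented copy would keep the language label of its source utterance, so the training set grows without new recordings. In practice an audio toolkit would normally handle the resampling, and recorded background noise might replace the synthetic noise used here.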
| Original language | English |
|---|---|
| Article number | 015203 |
| Journal | Engineering Research Express |
| Volume | 8 |
| Issue number | 1 |
| Publication status | Published - 01-01-2026 |
All Science Journal Classification (ASJC) codes
- General Engineering
Article title: 'Spoken language identification system for detecting coastal Karnataka languages using transfer learning and multi-view learning'