TY - GEN
T1 - KNN-based Speech-to-Text Conversion for Bangla to Enhance Regional Language Processing
AU - Nayak, Aditi
AU - Mukherjee, Jasmita
AU - Parashar, Deepak
AU - Bahadure, Nilesh
AU - Dikshit, Kshem
AU - Joshi, Rahul
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - In order to close the digital divide and promote greater linguistic diversity in natural language processing applications, speech-to-text conversion for regional languages is a crucial first step. This study introduces a novel method that employs the K-Nearest Neighbors (KNN) algorithm to transcribe spoken Bangla, a regional language spoken mainly in Bangladesh and West Bengal, India, into text. By mapping audio data to linguistic representations, the proposed method uses KNN's pattern-recognition capability to convert Bangla speech into text. To extract the discriminative aspects of spoken language, we use Mel-frequency cepstral coefficients (MFCCs), key audio features. To improve the model's robustness, our dataset includes a wide range of Bangla speech samples, covering regional dialects and accents. This work contributes to the broader goal of democratizing speech technology for regional languages, empowering non-English-speaking communities, and making information accessible to a wider audience. The proposed KNN-based system opens the door to diverse applications of regional-language interfaces.
AB - In order to close the digital divide and promote greater linguistic diversity in natural language processing applications, speech-to-text conversion for regional languages is a crucial first step. This study introduces a novel method that employs the K-Nearest Neighbors (KNN) algorithm to transcribe spoken Bangla, a regional language spoken mainly in Bangladesh and West Bengal, India, into text. By mapping audio data to linguistic representations, the proposed method uses KNN's pattern-recognition capability to convert Bangla speech into text. To extract the discriminative aspects of spoken language, we use Mel-frequency cepstral coefficients (MFCCs), key audio features. To improve the model's robustness, our dataset includes a wide range of Bangla speech samples, covering regional dialects and accents. This work contributes to the broader goal of democratizing speech technology for regional languages, empowering non-English-speaking communities, and making information accessible to a wider audience. The proposed KNN-based system opens the door to diverse applications of regional-language interfaces.
UR - https://www.scopus.com/pages/publications/105012824662
UR - https://www.scopus.com/pages/publications/105012824662#tab=citedBy
U2 - 10.1109/ICSSAS66150.2025.11080720
DO - 10.1109/ICSSAS66150.2025.11080720
M3 - Conference contribution
AN - SCOPUS:105012824662
T3 - Proceedings - 3rd International Conference on Self Sustainable Artificial Intelligence Systems, ICSSAS 2025
SP - 1716
EP - 1720
BT - Proceedings - 3rd International Conference on Self Sustainable Artificial Intelligence Systems, ICSSAS 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Self Sustainable Artificial Intelligence Systems, ICSSAS 2025
Y2 - 11 June 2025 through 13 June 2025
ER -