TY - GEN
T1 - Identification of Palatal Fricative Fronting Using Shannon Entropy of Spectrogram
AU - Ramteke, Pravin Bhaskar
AU - Supanekar, Sujata
AU - Aithal, Venkataraja
AU - Koolagudi, Shashidhar G.
N1 - Funding Information:
The authors would like to thank the Cognitive Science Research Initiative (CSRI), Department of Science & Technology, Government of India, Grant no. SR/CSRI/49/2015, for its financial support of this work.
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020
Y1 - 2020
N2 - In this paper, an attempt is made to identify palatal fricative fronting in children’s speech, where postalveolar /sh/ is mispronounced as dental /s/. In children’s speech, the concentration of energy (the darkest part) of the spectrogram ranges from 4000 Hz to 8000 Hz for /s/, whereas it ranges from 3000 Hz to 8000 Hz for /sh/. The Gammatonegram follows the frequency subbands of the ear (wider for higher frequencies). Various spectral properties extracted from the Gammatonegram, such as spectral centroid, spectral crest factor, spectral decrease, spectral flatness, spectral flux, spectral kurtosis, spectral spread, spectral skewness, spectral slope and Shannon entropy of the spectrogram (interval of 2000 Hz), are proposed for the characterization of /sh/ and /s/. The dataset, recorded from 60 native Kannada-speaking children aged between 3 1/2 and 6 1/2 years, is taken from the NITK Kids’ Speech Corpus. A support vector machine (SVM) is used for classification. Various combinations of the proposed features are evaluated, along with MFCCs(39) and LPCCs(39). The combination of MFCCs(39), LPCCs(39) and Entropy(4) is observed to achieve the highest mispronunciation identification performance of 83.2983%.
AB - In this paper, an attempt is made to identify palatal fricative fronting in children’s speech, where postalveolar /sh/ is mispronounced as dental /s/. In children’s speech, the concentration of energy (the darkest part) of the spectrogram ranges from 4000 Hz to 8000 Hz for /s/, whereas it ranges from 3000 Hz to 8000 Hz for /sh/. The Gammatonegram follows the frequency subbands of the ear (wider for higher frequencies). Various spectral properties extracted from the Gammatonegram, such as spectral centroid, spectral crest factor, spectral decrease, spectral flatness, spectral flux, spectral kurtosis, spectral spread, spectral skewness, spectral slope and Shannon entropy of the spectrogram (interval of 2000 Hz), are proposed for the characterization of /sh/ and /s/. The dataset, recorded from 60 native Kannada-speaking children aged between 3 1/2 and 6 1/2 years, is taken from the NITK Kids’ Speech Corpus. A support vector machine (SVM) is used for classification. Various combinations of the proposed features are evaluated, along with MFCCs(39) and LPCCs(39). The combination of MFCCs(39), LPCCs(39) and Entropy(4) is observed to achieve the highest mispronunciation identification performance of 83.2983%.
UR - http://www.scopus.com/inward/record.url?scp=85098285324&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098285324&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-66187-8_22
DO - 10.1007/978-3-030-66187-8_22
M3 - Conference contribution
AN - SCOPUS:85098285324
SN - 9783030661861
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 234
EP - 243
BT - Mining Intelligence and Knowledge Exploration - 7th International Conference, MIKE 2019, Proceedings
A2 - B.R., P.
A2 - Thenkanidiyoor, Veena
A2 - Prasath, Rajendra
A2 - Vanga, Odelu
PB - Springer Science and Business Media Deutschland GmbH
T2 - 7th International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2019
Y2 - 19 December 2019 through 22 December 2019
ER -