TY - GEN
T1 - Vocal and Non-vocal Segmentation based on the Analysis of Formant Structure
AU - Murthy, Y. V. Srinivasa
AU - Koolagudi, Shashidhar G.
AU - Swaroop, Vishnu G.
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2018/12/27
Y1 - 2018/12/27
N2 - The process of classifying vocal and non-vocal regions in an audio clip is the basis for many Music Information Retrieval (MIR) tasks. In this work, novel features based on formant structure are computed for segmenting the vocal and non-vocal regions of a given music clip. Features such as obtuse angles at formant peaks, valley locations, convexity, and concavity are proposed for this task after thorough analysis. The obtuse angles are computed for the second, third, and fourth formants, as the first formant shows little discrimination. The computed formant-related features are appended to the baseline Mel-frequency cepstral coefficients (MFCCs) to improve performance. Moreover, the singer's formant (F5) is also computed, forming a 19-dimensional feature vector. As artificial neural networks (ANNs) are well suited to handling nonlinear data, an ANN is used as the classifier. Further, an 11-point moving window is applied to avoid intermittent misclassifications. The proposed approach achieves an accuracy of 88% with the 19-dimensional feature vector.
AB - The process of classifying vocal and non-vocal regions in an audio clip is the basis for many Music Information Retrieval (MIR) tasks. In this work, novel features based on formant structure are computed for segmenting the vocal and non-vocal regions of a given music clip. Features such as obtuse angles at formant peaks, valley locations, convexity, and concavity are proposed for this task after thorough analysis. The obtuse angles are computed for the second, third, and fourth formants, as the first formant shows little discrimination. The computed formant-related features are appended to the baseline Mel-frequency cepstral coefficients (MFCCs) to improve performance. Moreover, the singer's formant (F5) is also computed, forming a 19-dimensional feature vector. As artificial neural networks (ANNs) are well suited to handling nonlinear data, an ANN is used as the classifier. Further, an 11-point moving window is applied to avoid intermittent misclassifications. The proposed approach achieves an accuracy of 88% with the 19-dimensional feature vector.
UR - https://www.scopus.com/pages/publications/85061501817
U2 - 10.1109/ICAPR.2017.8593164
DO - 10.1109/ICAPR.2017.8593164
M3 - Conference contribution
AN - SCOPUS:85061501817
T3 - 2017 9th International Conference on Advances in Pattern Recognition, ICAPR 2017
SP - 304
EP - 309
BT - 2017 9th International Conference on Advances in Pattern Recognition, ICAPR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Advances in Pattern Recognition, ICAPR 2017
Y2 - 27 December 2017 through 30 December 2017
ER -