TY - GEN
T1 - Estimation of the invariant and variant characteristics in speech articulation and its application to speaker identification
AU - Prasad, Abhay
AU - Periyasamy, Vijitha
AU - Ghosh, Prasanta Kumar
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Speech articulation varies across speakers producing the same speech sound because of differences in their vocal tract morphologies, even though the speech motor actions are executed in terms of relatively invariant gestures [1]. While the invariant articulatory gestures are driven by the linguistic content of the spoken utterance, the component of speech articulation that varies across speakers reflects speaker-specific and other paralinguistic information. In this work, we present a formulation to decompose the speech articulation of multiple speakers into variant and invariant aspects when they speak the same sentence. The variant component is found to be a better representation for discriminating speakers than the full speech articulation, which includes the invariant part. Experiments with real-time magnetic resonance imaging (rtMRI) videos of speech production from multiple speakers reveal that the variant component of speech articulation yields better frame-level speaker identification accuracy than the full speech articulation and acoustic features, by 29.9% and 9.4% (absolute), respectively.
AB - Speech articulation varies across speakers producing the same speech sound because of differences in their vocal tract morphologies, even though the speech motor actions are executed in terms of relatively invariant gestures [1]. While the invariant articulatory gestures are driven by the linguistic content of the spoken utterance, the component of speech articulation that varies across speakers reflects speaker-specific and other paralinguistic information. In this work, we present a formulation to decompose the speech articulation of multiple speakers into variant and invariant aspects when they speak the same sentence. The variant component is found to be a better representation for discriminating speakers than the full speech articulation, which includes the invariant part. Experiments with real-time magnetic resonance imaging (rtMRI) videos of speech production from multiple speakers reveal that the variant component of speech articulation yields better frame-level speaker identification accuracy than the full speech articulation and acoustic features, by 29.9% and 9.4% (absolute), respectively.
UR - https://www.scopus.com/pages/publications/84946010858
U2 - 10.1109/ICASSP.2015.7178775
DO - 10.1109/ICASSP.2015.7178775
M3 - Conference contribution
AN - SCOPUS:84946010858
VL - 2015-August
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4265
EP - 4269
BT - 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Y2 - 19 April 2015 through 24 April 2015
ER -