TY - JOUR
T1 - Automatic glottis detection and segmentation in stroboscopic videos using convolutional networks
AU - Degala, Divya
AU - Achuth Rao, M. V.
AU - Krishnamurthy, Rahul
AU - Gopikishore, Pebbili
AU - Priyadharshini, Veeramani
AU - Prakash, T. K.
AU - Ghosh, Prasanta Kumar
N1 - Funding Information:
Authors thank the Department of Science and Technology (DST), Government of India for their support in this work.
Publisher Copyright:
Copyright © 2020 ISCA
PY - 2020
Y1 - 2020
N2 - Laryngeal videostroboscopy is widely used for the analysis of glottal vibration patterns. This analysis plays a crucial role in the diagnosis of voice disorders. It is essential to study these patterns using automatic glottis segmentation methods to avoid subjectiveness in diagnosis. Glottis detection is an essential step before glottis segmentation. This paper considers the problem of automatic glottis segmentation using U-Net based deep convolutional networks. For accurate glottis detection, we train a fully convolutional network with a large amount of glottal and non-glottal images. In glottis segmentation, we consider U-Net with three different weight initialization schemes: 1) Random weight Initialization (RI), 2) Detection Network weight Initialization (DNI) and 3) Detection Network encoder frozen weight Initialization (DNIFr), using two different architectures: 1) U-Net without skip connection (UWSC) 2) U-Net with skip connection (USC). Experiments with 22 subjects' data reveal that the performance of glottis segmentation network can be increased by initializing its weights using those of the glottis detection network. Among all schemes, when DNI is used, the USC yields an average localization accuracy of 81.3% and a Dice score of 0.73, which are better than those from the baseline approach by 15.87% and 0.07 (absolute), respectively.
AB - Laryngeal videostroboscopy is widely used for the analysis of glottal vibration patterns. This analysis plays a crucial role in the diagnosis of voice disorders. It is essential to study these patterns using automatic glottis segmentation methods to avoid subjectiveness in diagnosis. Glottis detection is an essential step before glottis segmentation. This paper considers the problem of automatic glottis segmentation using U-Net based deep convolutional networks. For accurate glottis detection, we train a fully convolutional network with a large amount of glottal and non-glottal images. In glottis segmentation, we consider U-Net with three different weight initialization schemes: 1) Random weight Initialization (RI), 2) Detection Network weight Initialization (DNI) and 3) Detection Network encoder frozen weight Initialization (DNIFr), using two different architectures: 1) U-Net without skip connection (UWSC) 2) U-Net with skip connection (USC). Experiments with 22 subjects' data reveal that the performance of glottis segmentation network can be increased by initializing its weights using those of the glottis detection network. Among all schemes, when DNI is used, the USC yields an average localization accuracy of 81.3% and a Dice score of 0.73, which are better than those from the baseline approach by 15.87% and 0.07 (absolute), respectively.
UR - http://www.scopus.com/inward/record.url?scp=85098209655&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098209655&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-2599
DO - 10.21437/Interspeech.2020-2599
M3 - Conference article
AN - SCOPUS:85098209655
SN - 2308-457X
VL - 2020-October
SP - 4801
EP - 4805
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -