TY - JOUR
T1 - Erratic Navigation in Lecture Videos using Hybrid Text based Index Point Generation
AU - Hukkeri, Geeta S.
AU - Goudar, R. H.
N1 - Publisher Copyright:
© 2022, International Journal of Advanced Computer Science and Applications. All rights reserved.
PY - 2022
Y1 - 2022
N2 - A key difficulty with lecture videos is navigating within them to watch only the needed portion of the content. Machine learning technologies such as Optical Character Recognition and Automatic Speech Recognition make it easy to fetch hybrid text from lecture slides and audio, respectively. This paper presents three main analyses for hybrid text retrieval, which is in turn useful for indexing the video. The experimental results indicate that the key frame extraction accuracy is 94 percent. In this study's evaluation of the text extraction capability of Tesseract, Abbyy Finereader, Transym, and Google Cloud Vision Optical Character Recognition, the Slide-to-Text conversion accuracy is 92.0%, 90.5%, 80.8%, and 96.7%, respectively. Similarly, the accuracy of title identification is about 96 percent. To extract the speech text, three different APIs are used, namely the Microsoft, IBM, and Google Speech-to-Text APIs. The performance of the transcript generator is measured using the Word Error Rate, Word Recognition Rate, and Sentence Error Rate metrics. This paper found that Google Cloud Vision Optical Character Recognition and the Google Speech-to-Text API achieved the best results compared to the other methods. The results obtained are good and agreeable, and the proposed methods can therefore be used to automate lecture video indexing.
AB - A key difficulty with lecture videos is navigating within them to watch only the needed portion of the content. Machine learning technologies such as Optical Character Recognition and Automatic Speech Recognition make it easy to fetch hybrid text from lecture slides and audio, respectively. This paper presents three main analyses for hybrid text retrieval, which is in turn useful for indexing the video. The experimental results indicate that the key frame extraction accuracy is 94 percent. In this study's evaluation of the text extraction capability of Tesseract, Abbyy Finereader, Transym, and Google Cloud Vision Optical Character Recognition, the Slide-to-Text conversion accuracy is 92.0%, 90.5%, 80.8%, and 96.7%, respectively. Similarly, the accuracy of title identification is about 96 percent. To extract the speech text, three different APIs are used, namely the Microsoft, IBM, and Google Speech-to-Text APIs. The performance of the transcript generator is measured using the Word Error Rate, Word Recognition Rate, and Sentence Error Rate metrics. This paper found that Google Cloud Vision Optical Character Recognition and the Google Speech-to-Text API achieved the best results compared to the other methods. The results obtained are good and agreeable, and the proposed methods can therefore be used to automate lecture video indexing.
UR - https://www.scopus.com/pages/publications/85137157673
UR - https://www.scopus.com/pages/publications/85137157673#tab=citedBy
U2 - 10.14569/IJACSA.2022.0130813
DO - 10.14569/IJACSA.2022.0130813
M3 - Article
AN - SCOPUS:85137157673
SN - 2158-107X
VL - 13
SP - 99
EP - 107
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 8
ER -