TY - GEN
T1 - Advancements in Human Activity Recognition
AU - Jayanth, Prathipati
AU - Arya, Varun
AU - Naik, Dinesh
AU - Rashmi, M.
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - Human Action Recognition (HAR) has advanced significantly with the integration of deep learning and machine learning techniques, enabling more accurate and efficient recognition systems. This paper explores three approaches using different data modalities for action recognition on two widely used datasets, UCF50 and UTD-MHAD. Two distinct hybrid CNN-LSTM models were implemented for UCF50. In these models, Convolutional Neural Networks (CNNs) extract spatial features from RGB video frames, while Long Short-Term Memory (LSTM) networks model the temporal dependencies of human actions. For the UTD-MHAD dataset, machine learning models are proposed together with a set of novel spatio-temporal features extracted from skeleton data. These models include algorithms such as Support Vector Machines (SVM), Random Forest, and Artificial Neural Networks (ANN). Additionally, this work proposes a unique image representation of skeleton data, and CNNs are employed to classify the skeleton spatio-temporal images generated from UTD-MHAD, providing enhanced recognition. The models achieve high accuracy: 94% on a subset of actions from UCF50 and 80% on UTD-MHAD. This demonstrates the robustness of our approach to variability across individuals.
UR - https://www.scopus.com/pages/publications/105029616778
UR - https://www.scopus.com/pages/publications/105029616778#tab=citedBy
U2 - 10.1007/978-3-032-14534-5_31
DO - 10.1007/978-3-032-14534-5_31
M3 - Conference contribution
AN - SCOPUS:105029616778
SN - 9783032145338
T3 - Communications in Computer and Information Science
SP - 377
EP - 388
BT - Machine Learning, Image Processing, Network Security and Data Sciences - 6th International Conference, MIND 2024, Revised Selected Papers
A2 - Modi, Chirag
A2 - Thenkanidiyoor, Veena
A2 - Verma, Gyanendra Kumar
A2 - Brankovic, Ljiljana
PB - Springer Science and Business Media Deutschland GmbH
T2 - 6th International Conference on Machine Learning, Image Processing, Network Security, and Data Sciences, MIND 2024
Y2 - 20 December 2024 through 21 December 2024
ER -