TY - GEN
T1 - An Approach Toward Design and Implementation of Distributed Framework for Astronomical Big Data Processing
AU - Monisha, R.
AU - Sen, Snigdha
AU - Davangeri, Rajat U.
AU - Sri Lakshmi, K. S.
AU - Dey, Sourav
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2022
Y1 - 2022
N2 - Advances in modern technology have sharply increased the volume of data generated in every sector. Observational astronomy has embraced modern instruments and now generates large volumes of data. Analyzing these data and extracting useful patterns from them is a pressing need. In this paper, we implement several machine learning algorithms on Apache Spark to process such data. Our case study, drawn from cosmology, is photometric redshift estimation, a prominent research area in astronomy. High-end telescope cameras produce large amounts of astronomical data that must be analyzed efficiently and quickly. We implement Artificial Neural Network (ANN), Random Forest, Linear Regression, and Decision Tree algorithms on Apache Spark to predict the redshift of galaxies and quasars. The study focuses on comparing the execution time of these four algorithms and on a detailed analysis of their performance in both a distributed environment and a standalone system. The dataset is drawn from the Sloan Digital Sky Survey (SDSS), a wide-ranging, in-depth sky survey. Our results show that Random Forest outperforms the other algorithms in predictive performance in both environments. Although we experimented on only a subset of the data, scalability issues can also be addressed using a big data framework.
UR - https://www.scopus.com/pages/publications/85130328284
U2 - 10.1007/978-981-19-0901-6_26
DO - 10.1007/978-981-19-0901-6_26
M3 - Conference contribution
AN - SCOPUS:85130328284
SN - 9789811909009
T3 - Lecture Notes in Networks and Systems
SP - 267
EP - 275
BT - Intelligent Systems - Proceedings of ICMIB 2021
A2 - Udgata, Siba K.
A2 - Sethi, Srinivas
A2 - Gao, Xiao-Zhi
PB - Springer Science and Business Media Deutschland GmbH
T2 - 2nd International Conference on Machine Learning, Internet of Things and Big Data, ICMIB 2021
Y2 - 18 December 2021 through 20 December 2021
ER -