TY - GEN
T1 - A Novel Classification-Based Approach for Quicker Prediction of Redshift Using Apache Spark
AU - Pankaj,
AU - Sen, Snigdha
AU - Chakraborty, Pavan
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Over the years, observational astronomy has generated a huge amount of data owing to the use of high-resolution telescopic cameras. Extracting useful information from this data is now an essential task, since that information helps us understand the universe better. In this paper, we implement four classification models (Naive Bayes, Support Vector Machine, Decision Tree, and Random Forest) using Apache Spark and present a detailed comparative analysis of these models on the redshift prediction task. Redshift is a metric widely used for distance measurement. PySpark is used to process the large dataset easily and quickly. Although redshift estimation is a regression task, we propose a new classification-based approach in which redshift values are divided into multiple bins and each bin is treated as a class. This approach achieves reasonably good accuracy when predicting redshift in the PySpark environment. Of the four models, Random Forest outperforms the others, with the best accuracy of 95.05%.
UR - https://www.scopus.com/pages/publications/85148487786
UR - https://www.scopus.com/pages/publications/85148487786#tab=citedBy
U2 - 10.1109/ICDSAAI55433.2022.10028971
DO - 10.1109/ICDSAAI55433.2022.10028971
M3 - Conference contribution
AN - SCOPUS:85148487786
T3 - 2022 International Conference on Data Science, Agents and Artificial Intelligence, ICDSAAI 2022
BT - 2022 International Conference on Data Science, Agents and Artificial Intelligence, ICDSAAI 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Conference on Data Science, Agents and Artificial Intelligence, ICDSAAI 2022
Y2 - 8 December 2022 through 10 December 2022
ER -