TY - GEN
T1 - Implementation of Neural Network Regression Model for Faster Redshift Analysis on Cloud-Based Spark Platform
AU - Sen, Snigdha
AU - Saha, Snehanshu
AU - Chakraborty, Pavan
AU - Singh, Krishna Pratap
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - As observational astronomy has recently become data-driven, effectively analyzing the resulting large volumes of data to extract useful information is an increasingly important task. In this paper, we develop a neural network model to analyze redshift data for millions of extragalactic objects. To that end, we propose two approaches for faster training of neural networks. The first trains the model with a Lipschitz-based adaptive learning rate on a single node/machine, whereas the second processes astronomy data in a multinode clustered environment. The second approach can scale up to additional nodes when necessary to handle bulk data using Apache Spark and Elephas. This paper also addresses scalability and storage issues by implementing the model on the cloud. We use the distributed processing capability of Spark, which reads data directly from HDFS (Hadoop Distributed File System) across multiple machines, and our experimental results show that these approaches substantially reduce training time and CPU time, a crucial requirement when dealing with extensive datasets. Although we tested on a subset of the full data, the approach can be scaled to process data of any size without much difficulty.
UR - https://www.scopus.com/pages/publications/85112675786
UR - https://www.scopus.com/pages/publications/85112675786#tab=citedBy
U2 - 10.1007/978-3-030-79463-7_50
DO - 10.1007/978-3-030-79463-7_50
M3 - Conference contribution
AN - SCOPUS:85112675786
SN - 9783030794620
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 591
EP - 602
BT - Advances and Trends in Artificial Intelligence. From Theory to Practice - 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, Proceedings
A2 - Fujita, Hamido
A2 - Selamat, Ali
A2 - Lin, Jerry Chun-Wei
A2 - Ali, Moonis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021
Y2 - 26 July 2021 through 29 July 2021
ER -