An Approach Toward Design and Implementation of Distributed Framework for Astronomical Big Data Processing

  • R. Monisha
  • Snigdha Sen*
  • Rajat U. Davangeri
  • K. S. Sri Lakshmi
  • Sourav Dey

  *Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    9 Citations (Scopus)

    Abstract

    With the advancement of modern technology, every sector now generates very large volumes of data. Observational astronomy has embraced modern tools and, as a result, produces large datasets. Analyzing these data and extracting useful patterns from them is a pressing need. In this paper, we implement several machine learning algorithms using Apache Spark to process this massive amount of data. The case study we consider, drawn from cosmology, is photometric redshift estimation, a prominent research area in astronomy. High-end telescope cameras generate large amounts of astronomical data that need to be analyzed efficiently and quickly. In this work, we implement Artificial Neural Network (ANN), Random Forest, Linear Regression, and Decision Tree algorithms on Apache Spark to predict the redshift of galaxies and quasars. The focus of our study is to explore and compare the execution times of these four machine learning algorithms and to provide a detailed study of their performance in a distributed environment as well as on a standalone system. The dataset used here was collected from the Sloan Digital Sky Survey (SDSS), a wide-ranging, in-depth sky survey. Our work shows that Random Forest outperforms the other algorithms in terms of predictive performance in both environments. Although we experimented on a subset of the data, scalability issues can also be addressed using a big data framework.

    Original language: English
    Title of host publication: Intelligent Systems - Proceedings of ICMIB 2021
    Editors: Siba K. Udgata, Srinivas Sethi, Xiao-Zhi Gao
    Publisher: Springer Science and Business Media Deutschland GmbH
    Pages: 267-275
    Number of pages: 9
    ISBN (Print): 9789811909009
    Publication status: Published - 2022
    Event: 2nd International Conference on Machine Learning, Internet of Things and Big Data, ICMIB 2021 - Sarang, India
    Duration: 18-12-2021 to 20-12-2021

    Publication series

    Name: Lecture Notes in Networks and Systems
    Volume: 431
    ISSN (Print): 2367-3370
    ISSN (Electronic): 2367-3389

    Conference

    Conference: 2nd International Conference on Machine Learning, Internet of Things and Big Data, ICMIB 2021
    Country/Territory: India
    City: Sarang
    Period: 18-12-21 to 20-12-21

    All Science Journal Classification (ASJC) codes

    • Control and Systems Engineering
    • Signal Processing
    • Computer Networks and Communications
