A Novel Classification-Based Approach for Quicker Prediction of Redshift Using Apache Spark

  • Pankaj*
  • Snigdha Sen
  • Pavan Chakraborty
  • *Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    2 Citations (Scopus)

    Abstract

    Over the years, observational astronomy has generated a huge amount of data owing to the use of high-resolution telescopic cameras. Extracting useful information from these data is an essential task today, since it helps us understand the universe better. In this paper, we implement four classification-based models (Naive Bayes, Support Vector Machine, Decision Trees, and Random Forests) using Apache Spark and present a detailed comparative analysis of these models on the redshift prediction task. Redshift is a metric widely used for distance measurement. PySpark is used for this task to process big data easily and quickly. Although redshift estimation is a regression task, we propose a new classification-based approach in which redshift values are divided into multiple bins and each bin is treated as a class. This approach helped us achieve reasonably good accuracy when predicting redshift in the PySpark environment. Of the four models, random forest outperforms the others, predicting redshift with the best accuracy of 95.05%.
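The binning idea described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the equal-width scheme, the redshift range of 0.0 to 1.0, the bin count of 4, and the helper names `make_bin_edges`, `redshift_to_class`, and `class_to_redshift` are all hypothetical, since the abstract does not specify how the bins were chosen.

```python
from bisect import bisect_right


def make_bin_edges(z_min, z_max, n_bins):
    """Return n_bins + 1 equally spaced edges covering [z_min, z_max]."""
    width = (z_max - z_min) / n_bins
    return [z_min + i * width for i in range(n_bins + 1)]


def redshift_to_class(z, edges):
    """Map a redshift value to the index of the bin that contains it.

    Values outside the covered range are clamped to the first or last bin.
    """
    n_bins = len(edges) - 1
    idx = bisect_right(edges, z) - 1
    return min(max(idx, 0), n_bins - 1)


def class_to_redshift(label, edges):
    """Turn a predicted class back into a point estimate (the bin midpoint)."""
    return 0.5 * (edges[label] + edges[label + 1])


# Assumed range and bin count, chosen only for illustration.
edges = make_bin_edges(0.0, 1.0, 4)
labels = [redshift_to_class(z, edges) for z in (0.05, 0.30, 0.74, 1.0)]
```

The resulting integer labels are what a classifier such as a random forest would be trained on. In a PySpark pipeline, a comparable mapping could plausibly be done with `pyspark.ml.feature.Bucketizer`; whether the paper used it is not stated.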

    Original language: English
    Title of host publication: 2022 International Conference on Data Science, Agents and Artificial Intelligence, ICDSAAI 2022
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    ISBN (Electronic): 9798350333848
    Publication status: Published - 2022
    Event: 2022 International Conference on Data Science, Agents and Artificial Intelligence, ICDSAAI 2022 - Chennai, India
    Duration: 08-12-2022 → 10-12-2022

    Publication series

    Name: 2022 International Conference on Data Science, Agents and Artificial Intelligence, ICDSAAI 2022

    Conference

    Conference: 2022 International Conference on Data Science, Agents and Artificial Intelligence, ICDSAAI 2022
    Country/Territory: India
    City: Chennai
    Period: 08-12-22 → 10-12-22

    All Science Journal Classification (ASJC) codes

    • Artificial Intelligence
    • Computer Science Applications
    • Computer Vision and Pattern Recognition
    • Hardware and Architecture
    • Information Systems
