Implementation of Cascade Learning using Apache Spark

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    With the exponential growth of data, many technologies have been developed to process such big datasets and extract meaningful information from them. Several frameworks were built to deal with these problems; Apache Hadoop and Apache Spark are among the best of them and have proved very useful for handling large datasets. In this paper we present two approaches, both demonstrating cascade learning, which is one of the most effective ways to improve the accuracy of a machine learning model such as ours. The dataset used is cosmological redshift data. In the first approach we use Elephas, which lets us build an end-to-end deep learning pipeline in the Apache Spark environment; once the model produces its results, we apply cascading on top of it. In the second approach we transform the Redshift attribute into eight classes, labelled 0 to 7, then build a framework in Apache Spark, apply cascade learning to it, and run our model on the modified dataset. The goal of both approaches is to improve accuracy through cascading. Of the five classifiers we evaluated (Decision Tree, Random Forest, Naive Bayes, Logistic Regression and Multilayer Perceptron), the Decision Tree gave the best result: after cascading, its training accuracy improved by 0.98%, its test accuracy by 1.24% and its test precision by 0.31%.
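    The abstract does not say how the eight Redshift classes were defined or which cascading scheme was used. The sketch below is a hedged illustration under two assumptions: equal-width bins over a chosen redshift range (the default range 0 to 4 is hypothetical), and cascade generalization, where each stage's class-probability outputs are appended to the original features seen by the next stage. To stay runnable without a Spark cluster it uses a from-scratch Gaussian Naive Bayes (one of the five classifiers named above) in plain Python; `redshift_to_class`, `fit_cascade`, `predict_cascade` and the toy data are all hypothetical. In Spark ML the same ideas could be expressed with `pyspark.ml.feature.Bucketizer` for the binning and a `VectorAssembler` that combines the original features with a fitted classifier's `probability` column before the next classifier.

```python
import math
from collections import defaultdict


def redshift_to_class(z, z_min=0.0, z_max=4.0, n_classes=8):
    """Map a redshift to an integer class 0..n_classes-1 using equal-width bins.

    The binning scheme and the default range are assumptions for illustration.
    """
    width = (z_max - z_min) / n_classes
    idx = int((z - z_min) // width)
    return min(max(idx, 0), n_classes - 1)  # clamp so z_max falls in the last class


class GaussianNB:
    """Minimal Gaussian Naive Bayes; stands in for any probabilistic classifier."""

    def fit(self, X, y, var_smoothing=1e-6):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.classes = sorted(groups)
        self.priors, self.means, self.vars = {}, {}, {}
        for c in self.classes:
            rows = groups[c]
            cols = list(zip(*rows))
            self.priors[c] = len(rows) / len(y)
            self.means[c] = [sum(col) / len(col) for col in cols]
            # Per-feature variance, smoothed so constant features never give zero variance.
            self.vars[c] = [
                sum((v - m) ** 2 for v in col) / len(col) + var_smoothing
                for col, m in zip(cols, self.means[c])
            ]
        return self

    def predict_proba(self, X):
        result = []
        for row in X:
            log_post = []
            for c in self.classes:
                lp = math.log(self.priors[c])
                for v, m, s2 in zip(row, self.means[c], self.vars[c]):
                    lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
                log_post.append(lp)
            top = max(log_post)  # subtract the max before exp to avoid overflow
            weights = [math.exp(lp - top) for lp in log_post]
            total = sum(weights)
            result.append([w / total for w in weights])
        return result

    def predict(self, X):
        return [self.classes[max(range(len(p)), key=p.__getitem__)]
                for p in self.predict_proba(X)]


def fit_cascade(X, y, n_stages=2):
    """Cascade generalization: each stage sees the original features plus the
    class probabilities produced by the previous stage."""
    models, feats = [], [list(row) for row in X]
    for stage in range(n_stages):
        model = GaussianNB().fit(feats, y)
        models.append(model)
        if stage < n_stages - 1:
            feats = [row + p for row, p in zip(feats, model.predict_proba(feats))]
    return models


def predict_cascade(models, X):
    """Run inputs through every stage, extending features exactly as in training."""
    feats = [list(row) for row in X]
    for model in models[:-1]:
        feats = [row + p for row, p in zip(feats, model.predict_proba(feats))]
    return models[-1].predict(feats)


if __name__ == "__main__":
    # Hypothetical toy data: one feature that grows with redshift, plus a small offset.
    redshifts = [i * 0.05 for i in range(80)]
    X = [[2.0 * z + (0.1 if i % 2 else -0.1)] for i, z in enumerate(redshifts)]
    y = [redshift_to_class(z) for z in redshifts]
    models = fit_cascade(X, y, n_stages=2)
    preds = predict_cascade(models, X)
    acc = sum(p == t for p, t in zip(preds, y)) / len(y)
    print(f"training accuracy after cascading: {acc:.3f}")
```

    Naive Bayes is used here only because it is short to write without libraries; any classifier that outputs class probabilities, such as the Decision Tree the abstract reports as best, could take its place. Training a later stage on probabilities computed from its own training data can overfit, so held-out or cross-validated predictions are a common safeguard.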

    Original language: English
    Title of host publication: 2022 IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2022
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    ISBN (Electronic): 9781665497817
    Publication status: Published - 2022
    Event: 2022 IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2022 - Bangalore, India
    Duration: 08-07-2022 to 10-07-2022

    Publication series

    Name: 2022 IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2022

    Conference

    Conference: 2022 IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2022
    Country/Territory: India
    City: Bangalore
    Period: 08-07-22 to 10-07-22

    All Science Journal Classification (ASJC) codes

    • Control and Optimization
    • Artificial Intelligence
    • Computer Networks and Communications
    • Computer Science Applications
    • Signal Processing
    • Electrical and Electronic Engineering
