Preprocessing of Datasets Using Sequential and Parallel Approach: A Comparison

Shwetha Rai*, M. Geetha, Preetham Kumar

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data preprocessing is a data mining technique that makes data ready for further processing according to the requirements of the task. It is needed because the data may be incomplete or redundant, or may come from different sources that require aggregation. Data can be processed either sequentially or in parallel, and several parallel frameworks, such as Hadoop, MPI, and CUDA, are available for this purpose. This work surveys these parallel frameworks and compares the efficiency of the sequential and parallel approaches.

Original language: English
Title of host publication: Expert Clouds and Applications - Proceedings of ICOECA 2021
Editors: I. Jeena Jacob, Francisco M. Gonzalez-Longatt, Selvanayaki Kolandapalayam Shanmugam, Ivan Izonin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 311-320
Number of pages: 10
ISBN (Print): 9789811621253
Publication status: Published - 2022
Event: International Conference on Expert Clouds and Applications, ICOECA 2021 - Bangalore
Duration: 18-02-2021 to 19-02-2021

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 209
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: International Conference on Expert Clouds and Applications, ICOECA 2021
City: Bangalore
Period: 18-02-21 to 19-02-21

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
