A comparative Analysis of ensemble methods and their efficiency in the classification of 'HIT AND RUN' cases in an imbalanced dataset (Traffic Crashes) with and without using "Synthetic minority oversampling technique"

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    When developing a predictive model for a classification problem, the class distribution is sometimes skewed: most training samples belong to a single majority class and only a very small number belong to a minority class. Such datasets are often called imbalanced datasets. If machine learning techniques are applied directly to them, the resulting model may either ignore the minority class entirely or give too little weight to its samples. There are multiple ways to deal with this problem, including random oversampling, random undersampling, threshold tuning, and one-class classification to detect outliers. One such technique is the "Synthetic Minority Oversampling Technique" (SMOTE). In this paper we analyze the efficiency of ensemble methods for the classification of imbalanced datasets with and without SMOTE. The dataset used for this purpose is the open-source Traffic Crashes dataset available on the Chicago Data Portal. The target variable for our study is the "HIT-AND-RUN-I" column of the dataset. This is a dichotomous variable with two valid values: "Y" for positive cases ("HIT-AND-RUN-I" = Yes) and "N" for negative cases ("HIT-AND-RUN-I" = No).
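The oversampling idea named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the paper's implementation: the function name `smote`, its parameters, and the toy two-feature data are assumptions made for illustration, and the sketch assumes purely numeric features. It picks a random minority sample, chooses one of its `k` nearest minority neighbours, and places a synthetic point at a random position on the line segment between them.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature tuples (minority class only).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest to the base first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (made up): "Y" (hit and run) is the minority, "N" the majority.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
X += [(5.0 + i % 3, 5.0 + i // 3) for i in range(9)]
y = ["Y"] * 3 + ["N"] * 9

minority = [x for x, label in zip(X, y) if label == "Y"]
new_points = smote(minority, n_synthetic=y.count("N") - y.count("Y"), k=2)
X_balanced = X + new_points
y_balanced = y + ["Y"] * len(new_points)
```

In a with/without comparison such as the one the abstract describes, oversampling of this kind would normally be applied to the training split only, so that the test split keeps the original class distribution.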

    Original language: English
    Title of host publication: 33rd International Conference on Computer Theory and Applications, ICCTA 2023
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 199-204
    Number of pages: 6
    ISBN (Electronic): 9798350309577
    DOIs
    Publication status: Published - 2023
    Event: 33rd International Conference on Computer Theory and Applications, ICCTA 2023 - Alexandria, Egypt
    Duration: 16-12-2023 to 18-12-2023

    Publication series

    Name: 33rd International Conference on Computer Theory and Applications, ICCTA 2023

    Conference

    Conference: 33rd International Conference on Computer Theory and Applications, ICCTA 2023
    Country/Territory: Egypt
    City: Alexandria
    Period: 16-12-23 to 18-12-23

    All Science Journal Classification (ASJC) codes

    • Computer Science Applications
    • Computer Vision and Pattern Recognition
    • Software
    • Artificial Intelligence
    • Computational Theory and Mathematics
    • Information Systems and Management
