TY - GEN
T1 - A comparative Analysis of ensemble methods and their efficiency in the classification of 'HIT AND RUN' cases in an imbalanced dataset (Traffic Crashes) with and without using "Synthetic minority oversampling technique"
AU - Manjula Shenoy, K.
AU - Prabhu, R. Srinivas
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In some classification problems, the class distribution is skewed: a majority of the training samples belong to a single class, and only a very small number belong to the minority class. If machine learning techniques are applied directly to such datasets, the resulting model may either ignore the minority class altogether or give less importance to its samples. Such datasets are often called imbalanced datasets. There are multiple ways to deal with this problem, including random oversampling, random undersampling, threshold tuning, and one-class classification to find outliers. Another such technique is the synthetic minority oversampling technique (SMOTE). In this paper, we analyze the efficiency of ensemble methods for the classification of imbalanced datasets with and without SMOTE. The dataset used for this purpose is an open-source traffic crashes dataset available on the Chicago Data Portal. The target variable for our study is the "HIT-AND-RUN-I" column in the dataset. This is a dichotomous variable with two valid values: "Y" for positive cases ("HIT-AND-RUN-I" = Yes) and "N" for negative cases ("HIT-AND-RUN-I" = No).
AB - In some classification problems, the class distribution is skewed: a majority of the training samples belong to a single class, and only a very small number belong to the minority class. If machine learning techniques are applied directly to such datasets, the resulting model may either ignore the minority class altogether or give less importance to its samples. Such datasets are often called imbalanced datasets. There are multiple ways to deal with this problem, including random oversampling, random undersampling, threshold tuning, and one-class classification to find outliers. Another such technique is the synthetic minority oversampling technique (SMOTE). In this paper, we analyze the efficiency of ensemble methods for the classification of imbalanced datasets with and without SMOTE. The dataset used for this purpose is an open-source traffic crashes dataset available on the Chicago Data Portal. The target variable for our study is the "HIT-AND-RUN-I" column in the dataset. This is a dichotomous variable with two valid values: "Y" for positive cases ("HIT-AND-RUN-I" = Yes) and "N" for negative cases ("HIT-AND-RUN-I" = No).
UR - https://www.scopus.com/pages/publications/105004792613
UR - https://www.scopus.com/pages/publications/105004792613#tab=citedBy
U2 - 10.1109/ICCTA60978.2023.10969264
DO - 10.1109/ICCTA60978.2023.10969264
M3 - Conference contribution
AN - SCOPUS:105004792613
T3 - 33rd International Conference on Computer Theory and Applications, ICCTA 2023
SP - 199
EP - 204
BT - 33rd International Conference on Computer Theory and Applications, ICCTA 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 33rd International Conference on Computer Theory and Applications, ICCTA 2023
Y2 - 16 December 2023 through 18 December 2023
ER -