TY - JOUR
T1 - Wide-ranging approach-based feature selection for classification
AU - Bhuyan, Hemanta Kumar
AU - Saikiran, M.
AU - Tripathy, Murchhana
AU - Ravi, Vinayakumar
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022
Y1 - 2022
N2 - Feature selection methods have been developed for data classification to address redundant and irrelevant features, which slow overall system performance and make wrong decisions more likely on extensive data sets. Several methods have been proposed to solve the feature selection problem for classification, but most apply only to a particular data set. This paper therefore proposes wide-ranging approaches for solving feature selection problems across many data sets. The proposed algorithm analytically chooses optimal features for classification by utilizing mutual information (MI) and linear correlation coefficients (LCC), considering both linearly and nonlinearly dependent data features. The algorithm builds a substantial feature subset for classification, effectively reducing irrelevant features. Three different data sets are used to evaluate its performance with classifiers that require an adequate set of features to achieve better accuracy and lower computational cost. A probability threshold (p value < 0.05) was applied for feature selection in experiments on the different data sets, yielding 7, 5, and 6 selected features for the mobile, heart, and diabetes data sets, respectively. Accuracy was measured with different classifiers; for example, the Nearest_Neighbors classifier achieved accuracies of 0.92225, 0.88333, and 0.86250 on the mobile, heart, and diabetes data sets, respectively. Evaluation on several real-world data sets shows that the proposed model is effective.
AB - Feature selection methods have been developed for data classification to address redundant and irrelevant features, which slow overall system performance and make wrong decisions more likely on extensive data sets. Several methods have been proposed to solve the feature selection problem for classification, but most apply only to a particular data set. This paper therefore proposes wide-ranging approaches for solving feature selection problems across many data sets. The proposed algorithm analytically chooses optimal features for classification by utilizing mutual information (MI) and linear correlation coefficients (LCC), considering both linearly and nonlinearly dependent data features. The algorithm builds a substantial feature subset for classification, effectively reducing irrelevant features. Three different data sets are used to evaluate its performance with classifiers that require an adequate set of features to achieve better accuracy and lower computational cost. A probability threshold (p value < 0.05) was applied for feature selection in experiments on the different data sets, yielding 7, 5, and 6 selected features for the mobile, heart, and diabetes data sets, respectively. Accuracy was measured with different classifiers; for example, the Nearest_Neighbors classifier achieved accuracies of 0.92225, 0.88333, and 0.86250 on the mobile, heart, and diabetes data sets, respectively. Evaluation on several real-world data sets shows that the proposed model is effective.
UR - http://www.scopus.com/inward/record.url?scp=85142134841&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142134841&partnerID=8YFLogxK
U2 - 10.1007/s11042-022-14132-z
DO - 10.1007/s11042-022-14132-z
M3 - Article
AN - SCOPUS:85142134841
SN - 1380-7501
VL - 82
SP - 23277
EP - 23304
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 15
ER -