TY - GEN
T1 - A Nonparametric Feature Separability Measure and an Algorithm for Simulating Synthetic Feature Vectors
AU - Chetty, Chowtapalle Anuraag
AU - Simi, V. R.
AU - Joseph, Justin
AU - Venugopal, Vipin
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - Measures that quantitatively reflect the separability between the feature sets of two classes are required in binary classification paradigms to identify the determinant features and to select the hyper-parameters of feature extraction algorithms. State-of-the-art separability measures assess the equality of the distribution parameters of the feature sets and do not linearly quantify the level of overlap between them. Reliable algorithms for generating synthetic feature sets with known levels of overlap are required to test and compare the performance of separability measures. This paper proposes a measure of the separability of features between two classes, termed the Thresholding-based Classification Error Estimate (TCEE), and an algorithm for generating synthetic feature vectors for testing feature separability measures. On synthetic feature sets of two distinct classes, the Pearson's correlation coefficients (PCC) between the percentage of overlap and the Bhattacharyya distance (BD), Relative Entropy (RE), p-value of the rank-sum test, Jeffries-Matusita (JM) distance, and TCEE are −0.6429, −0.6428, 0.3780, −0.9881, and 1, respectively. The high Pearson's correlation of TCEE with the percentage of overlap indicates that TCEE can accurately measure the separability of the feature sets of two classes.
UR - https://www.scopus.com/pages/publications/85200474413
DO - 10.1007/978-3-031-64359-0_30
M3 - Conference contribution
AN - SCOPUS:85200474413
SN - 9783031643583
T3 - Communications in Computer and Information Science
SP - 388
EP - 397
BT - Information Management - 10th International Conference, ICIM 2024, Revised Selected Papers
A2 - Li, Shuliang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 10th International Conference on Information Management, ICIM 2024
Y2 - 8 March 2024 through 10 March 2024
ER -