
Feature selection with RAFS and STIG for knowledge discovery and machine learning model building in binary classification tasks

Speaker(s)
Radosław Piliszek
Affiliation
Uniwersytet w Białymstoku
Date
June 2, 2023, 4:15 p.m.
Information about the event
4060 & online: meet.google.com/jbj-tdsr-aop
Seminar
Seminar Intelligent Systems

This presentation reports the results of a PhD thesis ("Development of methods for feature selection based on information theory").
Dimensionality reduction is an important step in knowledge discovery and machine learning.
This study focuses on the feature selection branch of dimensionality reduction because it preserves the original, interpretable features, which is crucial for certain applications (e.g., biomedical ones).
Furthermore, the goal is to find the smallest (minimal-optimal) set of the most informative features that generalise to the population, in binary classification datasets with tens of thousands of features and high levels of correlation between them.
Additionally, the method is designed not to rely on feedback from a machine learning model, and thus remains model-neutral.
To this end, a novel feature selection method (Robust Aggregative Feature Selection - RAFS) and a supplementary feature dissimilarity measure (Symmetric Target Information Gain - STIG) considering the binary decision variable are proposed.
The proposed method utilises cross-validation, all-relevant feature selection filtering, feature clustering, feature ranking and top feature popularity counting.
The proposed dissimilarity measure is rooted in information theory.
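As background for the information-theoretic measure, the sketch below computes plain information gain (the mutual information between a discrete feature and a binary decision variable). This abstract does not define STIG, so the code should not be read as STIG itself; the function names `entropy` and `information_gain` are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """Information gain IG(target; feature) = H(target) - H(target | feature).

    Both arguments are equal-length sequences of discrete values.
    This is the standard quantity, not the STIG measure from the talk.
    """
    n = len(target)
    conditional = 0.0
    for value in set(feature):
        # Target labels of the samples where the feature takes this value.
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(target) - conditional
```

A feature that determines the decision variable exactly yields an information gain equal to the entropy of the decision variable, while a feature independent of it yields zero.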
The method is applied in external cross-validation on a real-world dataset, presenting diverse measures of dissimilarity between the features.
The feature selection results are validated using the AUC metric obtained from machine learning models built on the selected features as well as using feature selection stability measures (Jaccard index, Kuncheva index, Consistency Score).
The method is compared against state-of-the-art methods (RFE and mRMR) and is shown to achieve the highest AUC values with the smallest number of features selected and the highest stability scores.
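Two of the stability measures named above can be sketched as follows, assuming the feature subsets come from different cross-validation folds. The Jaccard index is |A ∩ B| / |A ∪ B|; the Kuncheva index for two subsets of equal size k drawn from n features is (r·n − k²) / (k·(n − k)), where r = |A ∩ B|. The function names and the pairwise-averaging helper are illustrative assumptions, not the code used in the thesis.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Jaccard index of two feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention: two empty subsets are identical
    return len(a & b) / len(union)


def kuncheva_index(a, b, n_features):
    """Kuncheva index for two subsets of equal size k out of n_features.

    Corrects the overlap for the agreement expected by chance.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("Kuncheva index requires subsets of equal size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_pairwise(index, subsets, **kwargs):
    """Average a pairwise stability index over all pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(index(a, b, **kwargs) for a, b in pairs) / len(pairs)
```

For example, `mean_pairwise(kuncheva_index, folds, n_features=10)` averages the index over all pairs of per-fold selections in a hypothetical list `folds`; values closer to 1 indicate more stable selection.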