
Bootstrap-based Aggregations and their Stability in Feature Selection

Salman, Reem Elfatih
Date
2022-06
Type
Thesis
Degree
Master of Science in Mathematics
Description
A Master of Science thesis in Mathematics by Reem Elfatih Salman entitled “Bootstrap-based Aggregations and their Stability in Feature Selection”, submitted in June 2022. The thesis advisor is Dr. Ayman Alzaatreh and the thesis co-advisor is Dr. Hana Sulieman. A soft copy is available (Thesis, Approval Signatures, Completion Certificate, and AUS Archives Consent Form).
Abstract
With the rapid development of technology and the Internet, datasets have grown increasingly large in both size and dimensionality. As a result, feature selection has become a critical preprocessing tool in machine learning applications, as well as the subject of extensive research in a variety of fields. However, a common concern in feature selection is that different approaches can give very different results when applied to similar datasets. Aggregating the results of feature selection methods can help resolve this concern and control the diversity of the selected feature subsets. In this work, we develop a general framework for ensembles of different feature selection methods. Based on diversified datasets generated from the original set of observations, we aggregate the importance scores both within and between different feature selection techniques. The thesis describes the framework in detail and validates it on prominent real-world datasets, using experimental analysis to show how aggregating multiple feature selection methods affects the learning algorithm’s performance while identifying the most appropriate feature subset for a given dataset. As a further contribution, the thesis examines the stability of the aggregation process, which in turn influences the stability of the feature selection algorithm. Accordingly, different aggregation approaches are evaluated and compared on datasets from a variety of application fields, in terms of both classification performance and stability. The results emphasize the variations among aggregation approaches and highlight the role of the aggregation procedure in feature selection robustness.
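The aggregation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the two scoring functions (`abs_correlation`, `mean_gap`), the min-max normalization, the use of the arithmetic mean for both aggregation steps, and the restriction to binary 0/1 labels are all assumptions made here for illustration only. The sketch draws bootstrap resamples of the observations, scores the features with each method on every resample, averages the scores within each method, and then averages between methods.

```python
import numpy as np

def abs_correlation(X, y):
    """Hypothetical scorer: |Pearson correlation| of each feature with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def mean_gap(X, y):
    """Hypothetical scorer: gap between the two class means, scaled by feature std."""
    gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return gap / (X.std(axis=0) + 1e-12)

def min_max(scores):
    """Rescale scores to [0, 1] so that different methods are comparable."""
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)

def bootstrap_indices(y, n_boot, rng):
    """Draw bootstrap resamples, redrawing any that miss one of the two classes."""
    n = y.shape[0]
    samples = []
    while len(samples) < n_boot:
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        if np.unique(y[idx]).size == 2:
            samples.append(idx)
    return samples

def aggregate_scores(X, y, scorers, n_boot=50, seed=0):
    """Average normalized scores within each method, then between methods."""
    rng = np.random.default_rng(seed)
    samples = bootstrap_indices(y, n_boot, rng)
    per_method = []
    for scorer in scorers:
        runs = [min_max(scorer(X[idx], y[idx])) for idx in samples]
        per_method.append(np.mean(runs, axis=0))  # within-method aggregation
    return np.mean(per_method, axis=0)            # between-method aggregation

# Synthetic demo: features 0 and 1 carry signal, the remaining four are noise.
demo_rng = np.random.default_rng(1)
y_demo = np.tile([0, 1], 100)
X_demo = demo_rng.normal(size=(200, 6))
X_demo[:, :2] += 2.0 * y_demo[:, None]
final = aggregate_scores(X_demo, y_demo, [abs_correlation, mean_gap], n_boot=20)
print("features ranked by aggregated score:", np.argsort(final)[::-1])
```

Other aggregation rules, such as the median or rank-based schemes, could replace `np.mean` in either step; comparing such choices is the kind of question the stability analysis in the thesis addresses.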