Evaluation of the importance of data pre-processing order when combining feature selection and data sampling

Ahmad Abu Shanab; Taghi M Khoshgoftaar; Randall Wald; Jason Van Hulse

doi:10.1504/IJBIDM.2012.048730

Journal article

Peer reviewed

International journal of business intelligence and data mining, Vol.7(1-2), pp.116-134

01/01/2012

Abstract

Two problems often encountered in machine learning are class imbalance and high dimensionality. In this paper, we compare three different approaches for addressing both problems simultaneously by applying both data sampling and feature selection. With the first two approaches, sampling is followed by feature selection. In the first approach, the features are selected based on the sampled data, and then the unsampled data is used with just the selected features. The second approach is similar, but the sampled data (rather than the unsampled data) is used with the selected features. Finally, with the third approach, feature selection is performed prior to sampling. To compare the approaches, we use seven datasets from different domains, employ nine feature rankers from three different families, apply three sampling techniques, and inject class noise to better simulate real-world datasets. The results show that the second and third approaches are both very good, with the third approach showing a slight (but not statistically significant) lead.
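The three orderings compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes binary 0/1 labels and uses random undersampling as a stand-in sampling technique; the abstract does not name the paper's three samplers. The ranker `mean_difference_rank` (absolute difference between class means) is a hypothetical placeholder for the nine rankers used in the study. Classifier training, evaluation, and class-noise injection are omitted; each function only returns the selected features and the data a model would be trained on.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def random_undersample(X: Matrix, y: List[int], seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Randomly drop majority-class rows until both classes are the same size.
    Illustrative stand-in sampler; assumes binary labels 0/1."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def mean_difference_rank(X: Matrix, y: List[int], k: int) -> List[int]:
    """Hypothetical filter ranker: score each feature by
    |mean(class 1) - mean(class 0)| and return the top-k feature indices.
    Assumes both classes are present in y."""
    scores = []
    for j in range(len(X[0])):
        pos_vals = [row[j] for row, label in zip(X, y) if label == 1]
        neg_vals = [row[j] for row, label in zip(X, y) if label == 0]
        score = abs(sum(pos_vals) / len(pos_vals) - sum(neg_vals) / len(neg_vals))
        scores.append((score, j))
    scores.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by index
    return sorted(j for _, j in scores[:k])


def project(X: Matrix, features: List[int]) -> Matrix:
    """Keep only the selected feature columns."""
    return [[row[j] for j in features] for row in X]


def approach_1(X: Matrix, y: List[int], k: int):
    # Sample, select features on the sampled data, then train on the
    # UNSAMPLED data restricted to the selected features.
    Xs, ys = random_undersample(X, y)
    feats = mean_difference_rank(Xs, ys, k)
    return feats, project(X, feats), list(y)


def approach_2(X: Matrix, y: List[int], k: int):
    # Sample, select features on the sampled data, then train on the
    # SAMPLED data restricted to the selected features.
    Xs, ys = random_undersample(X, y)
    feats = mean_difference_rank(Xs, ys, k)
    return feats, project(Xs, feats), ys


def approach_3(X: Matrix, y: List[int], k: int):
    # Select features on the original data first, then sample the reduced data.
    feats = mean_difference_rank(X, y, k)
    Xs, ys = random_undersample(project(X, feats), y)
    return feats, Xs, ys


if __name__ == "__main__":
    # Toy imbalanced data: 2 positives, 6 negatives, 3 features.
    X = [[10, 5, 3]] * 2 + [[0, 5, 1]] * 6
    y = [1, 1, 0, 0, 0, 0, 0, 0]
    for fn in (approach_1, approach_2, approach_3):
        feats, X_train, y_train = fn(X, y, k=2)
        print(fn.__name__, feats, len(X_train), y_train)
```

On this toy data all three orderings pick features 0 and 2, but only the first trains on all eight rows; the other two train on a balanced four-row sample. On real data the selected features can differ between orderings, which is the effect the study measures.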

Details

Creators: Ahmad Abu Shanab - Florida Atlantic University
Taghi M Khoshgoftaar - Florida Atlantic University
Randall Wald - Florida Atlantic University
Jason Van Hulse - Florida Atlantic University
Publisher: Inderscience Publishers
Identifiers: 991004106072606311
Academic Unit: University of La Verne
