Journal

Title On Bayesian Feature Selection Procedure Applied to Regression Problem with HDD
Posted by Bernadette Tubo
Authors Valeriano, Aries P.; Tubo, B. F.
Publication date 2022
Journal The Mindanawan Journal of Mathematics (TMJM)
Volume 4
Issue 2
Pages 23-33
Publisher Department of Mathematics and Statistics, MSU-IIT
Abstract High-dimensional data (HDD) means that the number of features, p, is exceedingly high while only a few samples, n, are available. A regression problem involves understanding how the response, y, depends simultaneously on some features x. Often, only a few x's explain y, while the rest have little or no influence on it. Moreover, most existing methodology for entering the x's into a regression model assumes p <= n. This study investigates a recently introduced methodology called Bayesian feature ranking (BFR), evaluating how well the data fit the regression model when the x's are high-dimensional and y is continuous. The proposed methodology applies a modified forward selection (MFS) procedure to the features ranked via the BFR, with different noise levels v infused on y. The MFS via BFR procedure enters the top-ranked features into the model first and adds further features sequentially, with increment value d = 5. For baseline comparison, the MFS procedure is also conducted on unranked features, and the derived models are evaluated on their values of R^2, a statistic for model fit. Results showed that in both simulated and real datasets, MFS via BFR consistently gave higher R^2 than the baseline MFS, implying that the model derived via BFR using ranked features of x describes y much better than the model using unranked features of x.
Index terms / Keywords Bayesian feature ranking; forward selection; high-dimensional data
DOI https://journals.msuiit.edu.ph/tmjm/article/view/44
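
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no details of the BFR, so the function `stand_in_ranking` below is a hypothetical substitute that ranks features by absolute correlation with y. The simulated data, the signal strength, the cap of 20 features, and all function names are likewise assumptions made for illustration. Only the block-wise addition of features with increment d = 5 and the use of R^2 come from the abstract.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def stepwise_r2(X, y, order, d=5, max_features=20):
    """Enter features in blocks of d, following `order`; return (k, R^2) pairs."""
    results = []
    for k in range(d, max_features + 1, d):
        results.append((k, r_squared(X[:, order[:k]], y)))
    return results

def stand_in_ranking(X, y):
    """Hypothetical stand-in for BFR (assumption): rank by |correlation with y|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-score)  # highest score first

# Simulated high-dimensional data (assumed setup): p = 500 features, n = 50 samples,
# with only five features actually influencing y.
rng = np.random.default_rng(0)
n, p = 50, 500
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[100, 200, 300, 400, 499]] = 3.0
y = X @ beta + rng.normal(scale=1.0, size=n)

# Keep max_features well below n so the least-squares fit does not saturate at R^2 = 1.
ranked = stepwise_r2(X, y, stand_in_ranking(X, y), d=5, max_features=20)
unranked = stepwise_r2(X, y, np.arange(p), d=5, max_features=20)  # original column order

for (k, r_rank), (_, r_base) in zip(ranked, unranked):
    print(f"k={k:2d}  ranked R^2={r_rank:.3f}  unranked R^2={r_base:.3f}")
```

Because each model nests the previous one, R^2 cannot decrease as features are added, so the useful comparison is between the two curves at the same k rather than along either curve. In-sample R^2 also rises mechanically as k approaches n, which is why the sketch stops well short of that.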