Feature Selection for Supervised Learning
It has recently been shown that feature selection in supervised learning can be embedded in the learning algorithm itself by using sparsity-promoting priors/penalties, which encourage the coefficient estimates to be either significantly large or exactly zero. In the first half of this talk, I will review this type of approach (which includes the well-known LASSO criterion for regression) and present some recent developments: (i) simple and efficient algorithms (with both parallel and sequential updates) for both binary and multi-class problems; (ii) generalization bounds; (iii) feature selection "inside" the kernel for kernel-based formulations. Experimental results (on standard benchmark data sets as well as on gene expression data) show that this class of methods achieves state-of-the-art performance.
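To make the underlying idea concrete, here is a minimal sketch (not the algorithms developed in the talk) of LASSO-style embedded feature selection, using scikit-learn on a synthetic regression problem; the L1 penalty drives most coefficients exactly to zero, so the remaining nonzero coefficients identify the selected features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic regression problem: 100 features, only 5 truly informative.
X, y = make_regression(n_samples=200, n_features=100, n_informative=5,
                       noise=1.0, random_state=0)

# alpha controls the strength of the sparsity-promoting L1 penalty;
# larger values push more coefficients to exactly zero.
model = Lasso(alpha=1.0).fit(X, y)

# Features with nonzero coefficients are the ones "selected" by the fit.
selected = np.flatnonzero(model.coef_)
print(f"{selected.size} of {X.shape[1]} features selected:", selected)
```

The same principle carries over to the classification settings discussed in the talk (e.g., an L1-penalized logistic regression for binary or multi-class problems), where feature selection is likewise a by-product of fitting the sparse model rather than a separate preprocessing step.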