Feature selection for Bayesian network classifiers using the MDL-FS score
Drugan, M. M. & Wiering, M. A., Jul-2010, In: International Journal of Approximate Reasoning. 51, 6, p. 695-717 23 p.
Research output: Contribution to journal › Article › Academic › peer-review
When constructing a Bayesian network classifier from data, the more or less redundant features included in a dataset may bias the classifier and as a consequence may result in a relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features for such classifiers. To this end, we propose a new definition of the concept of redundancy in noisy data. For comparing alternative classifiers, we use the Minimum Description Length for Feature Selection (MDL-FS) function that we introduced before. Our function differs from the well-known MDL function in that it captures a classifier's conditional log-likelihood. We show that the MDL-FS function serves to identify redundancy at different levels and is able to eliminate redundant features from different types of classifier. We support our theoretical findings by comparing the feature-selection behaviours of the various functions in a practical setting. Our results indicate that the MDL-FS function is more suited to the task of feature selection than MDL as it often yields classifiers of equal or better performance with significantly fewer attributes. (C) 2010 Elsevier Inc. All rights reserved.
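The abstract's distinction between a score built on the joint log-likelihood and one built on the conditional log-likelihood can be illustrated with a small sketch. This is not the paper's MDL-FS definition, which the abstract does not state; it only contrasts the two likelihood terms under a generic MDL-style penalty. The naive Bayes model, the Laplace smoothing, the toy data, and all function names below are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def fit_naive_bayes(rows, labels, features, alpha=1.0):
    """Fit a naive Bayes model on the chosen feature indices (Laplace-smoothed)."""
    classes = sorted(set(labels))
    n = len(labels)
    class_counts = Counter(labels)
    prior = {c: (class_counts[c] + alpha) / (n + alpha * len(classes))
             for c in classes}
    values = {f: sorted({r[f] for r in rows}) for f in features}
    cond = {}
    for f in features:
        counts = Counter((c, r[f]) for r, c in zip(rows, labels))
        for c in classes:
            denom = class_counts[c] + alpha * len(values[f])
            for v in values[f]:
                cond[(f, c, v)] = (counts[(c, v)] + alpha) / denom
    return {"classes": classes, "prior": prior, "cond": cond,
            "values": values, "features": list(features)}


def log_joint(model, row, c):
    """log P(c, x) for one instance, restricted to the model's features."""
    total = math.log(model["prior"][c])
    for f in model["features"]:
        total += math.log(model["cond"][(f, c, row[f])])
    return total


def joint_log_likelihood(model, rows, labels):
    """Sum of log P(c, x) over the data (the term a joint-likelihood score uses)."""
    return sum(log_joint(model, r, c) for r, c in zip(rows, labels))


def conditional_log_likelihood(model, rows, labels):
    """Sum of log P(c | x) over the data, via log-sum-exp over the classes."""
    total = 0.0
    for r, c in zip(rows, labels):
        logs = [log_joint(model, r, k) for k in model["classes"]]
        m = max(logs)
        log_px = m + math.log(sum(math.exp(v - m) for v in logs))
        total += log_joint(model, r, c) - log_px
    return total


def num_parameters(model):
    """Free parameters: class prior plus one table per feature and class."""
    k = len(model["classes"])
    return (k - 1) + sum(k * (len(model["values"][f]) - 1)
                         for f in model["features"])


def penalized_score(log_lik, n_params, n_rows):
    """Generic MDL-style score (lower is better): complexity penalty minus fit."""
    return 0.5 * math.log(n_rows) * n_params - log_lik


# Toy data (hypothetical): feature 0 predicts the class, feature 1 duplicates
# feature 0 (redundant), feature 2 is unrelated to the class.
rows = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0),
        (0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

for size in range(4):
    for subset in combinations(range(3), size):
        model = fit_naive_bayes(rows, labels, subset)
        k = num_parameters(model)
        joint = penalized_score(joint_log_likelihood(model, rows, labels), k, len(rows))
        cond = penalized_score(conditional_log_likelihood(model, rows, labels), k, len(rows))
        print(subset, f"joint-based={joint:.3f}", f"conditional-based={cond:.3f}")
```

The joint-based score rewards a feature for being well modelled in its own right, whereas the conditional-based score only rewards what the feature contributes to predicting the class, which is the property the abstract attributes to MDL-FS.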
Number of pages: 23
Journal: International Journal of Approximate Reasoning
Publication status: Published - Jul-2010
- Feature subset selection, Minimum Description Length, Selective Bayesian classifiers, Tree augmented networks, FEATURE SUBSET-SELECTION, PATTERN-RECOGNITION, MUTUAL INFORMATION, ALGORITHMS, CLASSIFICATION, OPTIMALITY, MODELS