Feature selection for Bayesian network classifiers using the MDL-FS score

Drugan, M. M. & Wiering, M. A., Jul-2010, In: International Journal of Approximate Reasoning. 51, 6, p. 695-717 (23 p.)

Research output: Contribution to journal › Article › Academic › peer-review

When constructing a Bayesian network classifier from data, the more or less redundant features included in a dataset may bias the classifier and as a consequence may result in a relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features for such classifiers. To this end, we propose a new definition of the concept of redundancy in noisy data. For comparing alternative classifiers, we use the Minimum Description Length for Feature Selection (MDL-FS) function that we introduced before. Our function differs from the well-known MDL function in that it captures a classifier's conditional log-likelihood. We show that the MDL-FS function serves to identify redundancy at different levels and is able to eliminate redundant features from different types of classifier. We support our theoretical findings by comparing the feature-selection behaviours of the various functions in a practical setting. Our results indicate that the MDL-FS function is more suited to the task of feature selection than MDL as it often yields classifiers of equal or better performance with significantly fewer attributes. (C) 2010 Elsevier Inc. All rights reserved.

Original language: English
Pages (from-to): 695-717
Number of pages: 23
Journal: International Journal of Approximate Reasoning
Issue number: 6
Publication status: Published - Jul-2010


  • Feature subset selection, Minimum Description Length, Selective Bayesian classifiers, Tree augmented networks, FEATURE SUBSET-SELECTION, PATTERN-RECOGNITION, MUTUAL INFORMATION, ALGORITHMS, CLASSIFICATION, OPTIMALITY, MODELS

ID: 5110592