Methods for Improving Decision Tree Performance

Several methods are available for improving decision tree performance in terms of accuracy and modelling time. Since experimenting with every available method is impossible, a subset of methods that have been shown to increase decision tree performance was selected. The selected improvement methods and their experimental setups are presented in this chapter.

4.1 Correlation-Based Feature Selection

Feature selection is a method for reducing the dimensionality of a dataset by removing irrelevant and redundant attributes. Given a set of attributes F and a target class C, the goal of feature selection is to find a minimal subset of F that yields the highest accuracy for C in the classification task. Although classification decision trees have built-in feature selection mechanisms, applying feature selection prior to modelling with decision trees is claimed to be a useful practice (Doraisamy et al., 2008). Studies attribute several advantages to feature selection (an illustrative sketch of the selection workflow follows this list):

- Reducing the dimensionality of the data shrinks the hypothesis space, resulting in faster execution times and lower computational requirements.
- It provides a better understanding of the domain.
- It can improve the performance of classification algorithms when the feature selection mechanism suits the particular problem.
- It may help with the overfitting problem (see Appendix 5 for details on overfitting).
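
To make the idea concrete, the following Python sketch pairs a CFS-style merit score (rewarding feature-class correlation, penalising feature-feature correlation) with a greedy forward search, and then fits a decision tree on the selected attributes. The dataset, the greedy search strategy, and the scikit-learn calls are illustrative assumptions, not the experimental setup used in this chapter.

```python
# Sketch: correlation-based feature selection before decision tree modelling.
# Assumes numpy, pandas, and scikit-learn; the merit formula and greedy
# forward search are an illustrative CFS-style approximation.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


def cfs_merit(data: pd.DataFrame, target: pd.Series, subset: list) -> float:
    """Merit of a feature subset: high feature-class correlation,
    low feature-feature correlation."""
    k = len(subset)
    # average absolute feature-class correlation
    rcf = np.mean([abs(data[f].corr(target)) for f in subset])
    # average absolute pairwise feature-feature correlation
    if k == 1:
        rff = 0.0
    else:
        rff = np.mean([abs(data[a].corr(data[b]))
                       for i, a in enumerate(subset) for b in subset[i + 1:]])
    return (k * rcf) / np.sqrt(k + k * (k - 1) * rff)


def greedy_cfs(data: pd.DataFrame, target: pd.Series) -> list:
    """Greedy forward search: keep adding the feature that most improves merit."""
    selected, best_merit = [], 0.0
    remaining = list(data.columns)
    while remaining:
        scores = {f: cfs_merit(data, target, selected + [f]) for f in remaining}
        f_best, m_best = max(scores.items(), key=lambda kv: kv[1])
        if m_best <= best_merit:
            break  # no feature improves the subset any further
        selected.append(f_best)
        remaining.remove(f_best)
        best_merit = m_best
    return selected


X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

features = greedy_cfs(X_train, y_train)
tree = DecisionTreeClassifier(random_state=0).fit(X_train[features], y_train)
print(len(features), "features selected, accuracy:",
      accuracy_score(y_test, tree.predict(X_test[features])))
```

In this sketch the tree is trained only on the selected columns, which is the practice the paragraph above refers to: the reduced attribute set is fixed before decision tree induction begins rather than relying solely on the tree's internal split selection.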

The performance improvement gained from the feature selection process depends heavily on the…