Improving Decision Tree Performance Methods

Several methods are available for improving decision tree performance in terms of accuracy and modelling time. Since experimenting with every available method is infeasible, a subset of methods that have been shown to improve decision tree performance was selected. The selected methods and their experimental setups are presented in this chapter.

4.1 Correlation-Based Feature Selection

Feature selection reduces the number of dimensions of a dataset by removing irrelevant and redundant attributes. Given a set of attributes F and a target class C, the goal of feature selection is to find a minimal subset of F that yields the highest accuracy (for C) in the classification task. Although…
Also, a method that performs well for the C4.5 algorithm is likely to perform well for the ID3 algorithm. Previous studies show that the CFS method increases accuracy for the CART algorithm, although not as much as it does for the C4.5 algorithm (Doraisamy et al., 2008).

CFS combines a search algorithm with a feature evaluation algorithm whose heuristic measures the "goodness" of attribute subsets. Hall and Smith (1998) define this goodness heuristic as "Good feature subsets contain features highly correlated with the class, yet uncorrelated with each other." Equation 1 shows the heuristic formula.

$$G_x = \frac{k\,\overline{r_{ci}}}{\sqrt{k + k(k-1)\,\overline{r_{ii'}}}} \qquad (1)$$

where $G_x$ is the goodness heuristic of an attribute subset x that contains k features, $\overline{r_{ci}}$ is the average attribute-class correlation, which indicates the predictive power of the attribute subset for a class, and $\overline{r_{ii'}}$ is the average attribute inter-correlation, which indicates the redundancy among attributes.

The version of correlation-based attribute selection included in the experimental setup is the Fast Correlation-Based Filter (FCBF), originally developed by Yu and Liu (2004). This algorithm is preferred over other available correlation-based attribute selection algorithms because, while other implementations of CFS use forward-sequential or greedy search methods (e.g. MRMR/CFS developed by Schoewe,
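
The goodness heuristic in Equation 1 can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments. The function name `cfs_merit` and its arguments are hypothetical, and the correlations are assumed to be precomputed absolute values (the text does not say which correlation measure is used).

```python
from itertools import combinations
from math import sqrt


def cfs_merit(class_corr, inter_corr):
    """Goodness G_x of a feature subset, following Equation 1.

    class_corr: list with the (absolute) correlation of each feature
                in the subset with the class.
    inter_corr: square matrix (list of lists) with the (absolute)
                correlation between every pair of features in the subset.
    Both inputs are assumed to be precomputed by the caller.
    """
    k = len(class_corr)
    if k == 0:
        raise ValueError("subset must contain at least one feature")

    # Average attribute-class correlation (predictive power).
    mean_rci = sum(class_corr) / k

    # Average attribute inter-correlation over distinct pairs (redundancy).
    pairs = list(combinations(range(k), 2))
    mean_rii = sum(inter_corr[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0

    return k * mean_rci / sqrt(k + k * (k - 1) * mean_rii)


# Two equally predictive features that are uncorrelated with each other.
independent = cfs_merit([0.6, 0.6], [[1.0, 0.0], [0.0, 1.0]])

# The same two features when they are perfectly redundant.
redundant = cfs_merit([0.6, 0.6], [[1.0, 1.0], [1.0, 1.0]])

# One of the features on its own.
single = cfs_merit([0.6], [[1.0]])
```

In this example the uncorrelated pair scores higher than the single feature, while the perfectly redundant pair scores no better than the single feature, which matches the Hall and Smith definition quoted above.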