Several methods are available to improve decision tree performance in terms of accuracy and modelling time. Since experimenting with every available method is impossible, a subset of methods that have been shown to improve decision tree performance was selected. The selected improvement methods and their experimental setups are presented in this chapter.
4.1 Correlation-Based Feature Selection
Feature selection is a method for reducing the number of dimensions of a dataset by removing irrelevant and redundant attributes. Given a set of attributes F and a target class C, the goal of feature selection is to find a minimal subset of F that yields the highest classification accuracy for C. Although
Also, a method that performs well for the C4.5 algorithm is likely to perform well for the ID3 algorithm. Previous studies show that the CFS method also increases accuracy for the CART algorithm, although not as much as for the C4.5 algorithm (Doraisamy et al., 2008).
CFS combines a search algorithm with a feature-subset evaluation function that uses a heuristic measuring the "goodness" of attribute subsets. Hall and Smith (1998) define this goodness heuristic as follows: "Good feature subsets contain features highly correlated with the class, yet uncorrelated with each other." Equation 1 below shows the heuristic formula:

G_x = \frac{k\,\overline{r_{ci}}}{\sqrt{k + k(k-1)\,\overline{r_{ii'}}}}
where G_x is the goodness heuristic of an attribute subset x containing k features, \overline{r_{ci}} is the average attribute-class correlation, which indicates the predictive power of the attribute subset with respect to the class, and \overline{r_{ii'}} is the average attribute inter-correlation, which indicates the redundancy among the attributes.
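As a minimal illustration (not part of the original experimental setup), the heuristic in Equation 1 can be computed directly from the two averaged correlations; the function name and the example values below are assumptions made purely for demonstration.

```python
import numpy as np

def cfs_merit(avg_feature_class_corr, avg_feature_feature_corr, k):
    """Compute the CFS goodness heuristic G_x for a subset of k features.

    avg_feature_class_corr   -- mean correlation between the features and the class
    avg_feature_feature_corr -- mean pairwise correlation among the features
    """
    numerator = k * avg_feature_class_corr
    denominator = np.sqrt(k + k * (k - 1) * avg_feature_feature_corr)
    return numerator / denominator

# Example: 5 features with strong class correlation and low redundancy ...
print(cfs_merit(0.6, 0.2, 5))  # merit = 1.0
# ... versus the same relevance but a highly redundant subset -> lower merit
print(cfs_merit(0.6, 0.8, 5))  # merit ~ 0.65
```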
The version of correlation-based attribute selection included in the experimental setup is called Fast Correlation-Based Feature Selection (FCBF), initially developed by Yu and Liu (2004). This algorithm is preferred over other available correlation-based attribute selection algorithms since, while other implementations of CFS use forward-sequential or greedy search methods (e.g. MRMR/CFS developed by Schoewe,
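For orientation, a sketch of the relevance-filtering stage of FCBF is shown below, using symmetrical uncertainty as the correlation measure described by Yu and Liu (2004); the function names, the threshold value, and the omission of the redundancy-removal stage are simplifications for illustration, not the exact implementation used in the experiments.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), the measure FCBF ranks features by."""
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))      # joint entropy of the pair
    info_gain = h_x + h_y - h_xy         # mutual information
    return 2.0 * info_gain / (h_x + h_y) if (h_x + h_y) > 0 else 0.0

def fcbf_relevance_filter(X, y, threshold=0.1):
    """First FCBF stage: keep features whose SU with the class exceeds a
    threshold, sorted by decreasing SU (redundancy removal would follow)."""
    scores = [(j, symmetrical_uncertainty(X[:, j], y)) for j in range(X.shape[1])]
    return sorted([(j, su) for j, su in scores if su > threshold],
                  key=lambda item: -item[1])
```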
Feature selection is a widely used dimensionality reduction technique; it allows the elimination of irrelevant and redundant features while retaining the underlying discriminatory information, and feature selection implies less data
AdaBoost and the C4.5 decision tree. The first classifier used is Naïve Bayes, which is a probabilistic classifier
Decision Trees are useful tools for helping you to choose between several courses of action.
Replacing this manual process with machine learning tools offers automated optimization that saves tremendous amounts of time and labor while providing a more accurate ranking.
For classification trees, the phi coefficient, the Gini index, and "twoing" are the most commonly used splitting criteria.
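As an illustration of one of these criteria, the sketch below computes the Gini index of a candidate binary split; the helper names and toy labels are assumptions for demonstration only.

```python
import numpy as np

def gini_index(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_of_split(left_labels, right_labels):
    """Weighted Gini impurity of a binary split; lower means a better split."""
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    return (n_left / n) * gini_index(left_labels) + (n_right / n) * gini_index(right_labels)

# A split that separates the classes perfectly has zero weighted impurity
print(gini_of_split(["a", "a", "a"], ["b", "b"]))   # 0.0
print(gini_of_split(["a", "b", "a"], ["b", "a"]))   # ~0.47, a worse split
```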
Data warehouses, in contrast, are targeted for decision support. Historical, summarized and consolidated data is more important than detailed, individual records. Since data warehouses contain consolidated data, perhaps from several operational databases, over potentially long periods of time, they tend to be orders of magnitude larger than operational databases; enterprise data warehouses are projected to be hundreds of gigabytes to terabytes in size. The workloads are query intensive with mostly ad hoc, complex queries that can access millions of records and perform a lot of scans, joins, and aggregates. Query throughput and response times are more important than transaction throughput.
Feature selection is a step that finds a subset of the original feature set according to some criterion of feature importance. In this paper the concept of group feature selection is reviewed, covering the different approaches in this area. This literature review examines recent work on feature selection where the features possess a certain group structure, and the methods found for group feature selection are discussed. The general approaches to group feature selection, such as the group lasso and the sparse group lasso, are described. The group lasso is an extension of the standard lasso that performs group selection: if a group of features is selected, then all features in the group are selected. The sparse group lasso builds on the group lasso and produces an efficient solution with simultaneous between-group and within-group sparsity. Group feature selection tries to minimize the redundant and irrelevant features within groups in order to decrease computational time. The above categorization of group feature selection algorithms gives a view of future challenges and research
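For concreteness, the penalties discussed above can be written down directly; the sketch below is a purely illustrative NumPy formulation of the group lasso and sparse group lasso penalty terms (not an optimizer, and not code from the reviewed papers).

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam):
    """Group lasso penalty: lam * sum_g sqrt(p_g) * ||beta_g||_2.

    beta   -- coefficient vector
    groups -- list of index arrays, one per feature group
    lam    -- regularization strength
    """
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Sparse group lasso: mixes an L1 term (within-group sparsity)
    with the group lasso term (between-group sparsity)."""
    l1 = alpha * lam * np.abs(beta).sum()
    gl = (1 - alpha) * group_lasso_penalty(beta, groups, lam)
    return l1 + gl

# Toy example: two feature groups, the first entirely zeroed out
beta = np.array([0.0, 0.0, 1.5, -0.3, 0.0])
groups = [np.array([0, 1]), np.array([2, 3, 4])]
print(group_lasso_penalty(beta, groups, lam=0.5))
print(sparse_group_lasso_penalty(beta, groups, lam=0.5, alpha=0.3))
```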
For the smallest set, containing only ten samples, 19 of the 23 possible feature selection algorithms completed processing (4 feature selection algorithms could not be completed due to the 10-fold cross-validation used). For those 19 feature selection algorithms, 585 classification models were generated (a few of the ARFF files were empty for the lower feature thresholds due to the small number of samples). The 50-sample dataset completed 20 of the 23 possible feature selection algorithms, thereby generating 665 classification models. When using 100 samples, 20 of the 23 possible feature selection algorithms were completed and subsequently utilized to generate 665 classification models. The 200-sample dataset provided 20 of the 23 possible
In this thesis, a machine learning algorithm called Boosted Decision Tree (BDT) is used as a particle identification (PID) classifier.
The problem of variable selection has always been at the center of statistical research. Classical methods such as best subset selection, forward selection, and backward elimination were proposed to handle massive data sets. These methods are essentially based on the idea of grid search. For example, best subset selection searches for the optimal model over all possible combinations of the predictors. Forward selection finds the best model by adding one predictor at a time, while backward elimination finds the best model by removing one predictor at a time. Although these methods are powerful and accurate, they are also costly. For instance, suppose our model has 10 predictors; if we run best subset selection to obtain the optimal model, we need to compare 1024 combinations of the predictors. Performing such a method is often unrealistic in practice because it is time-consuming. As a result, more efficient methods for variable selection are urgently needed.
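The combinatorial cost is easy to verify: enumerating every candidate model for 10 predictors yields 2^10 = 1024 subsets, as the hypothetical sketch below shows (the predictor names are placeholders).

```python
from itertools import combinations

def all_subsets(predictors):
    """Enumerate every candidate model considered by best subset selection
    (including the empty model), i.e. 2^p subsets for p predictors."""
    for k in range(len(predictors) + 1):
        for subset in combinations(predictors, k):
            yield subset

predictors = [f"x{i}" for i in range(1, 11)]    # 10 hypothetical predictors
print(sum(1 for _ in all_subsets(predictors)))  # 1024 candidate models
```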
Primarily, data mining deals with the analysis of data sets for the identification of hidden patterns, trends and data values. Data mining in any line of business … correlations among complex, structured and unstructured, historical and potential future data sets for the purpose of predicting future events and assessing the attractiveness of the various courses of action. It is
The forward selection procedure used helped identify which variables were good predictors in the models. Linear regression has certain assumptions, so in order to not violate those assumptions, it is crucial to pick the best variables for the model. The best model we found using this procedure met all the assumptions and gave good prediction accuracies.
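As a hedged sketch of how such a forward selection procedure might be run (assuming scikit-learn, which is not necessarily the tool used in this work), a linear model can be wrapped in a sequential selector; the dataset and the number of features to select below are placeholders.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Illustrative only: forward selection of predictors for a linear model,
# not the exact procedure or data used in the study.
X, y = load_diabetes(return_X_y=True, as_frame=True)

selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=4,   # assumed target size; any stopping rule could be used
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print(list(X.columns[selector.get_support()]))  # predictors kept by forward selection
```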
As we can see, the accuracy of both classifiers is intact even after removing many attributes. We can observe that the performance of both classifiers remains the same; it neither improves nor degrades. This indicates that the removed attributes did not contribute to the performance of either classifier and did not help them classify the instances. Thus, such dimensions should be
Ensemble techniques have become more popular than single models [1]. In this technique, more than one classifier is used for classification with higher efficiency, and each classifier in the classification model is trained on a different data chunk. With the help of advanced data streaming technologies [2], we are now able to collect large volumes of data for different application domains, for example credit card transactions and network traffic monitoring. The presence of irrelevant and redundant data slows down learning algorithms [3][4]; by removing or ignoring irrelevant and redundant features, prediction performance and computational efficiency can be improved. The multiclass miner works with a dynamic feature vector and detects novel classes. It is a combination of the OLINDDA and FAE approaches, which are used to detect novel classes and to classify data chunks
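OLINDDA and FAE themselves are not reproduced here, but the underlying chunk-based ensemble idea can be sketched as follows, with decision trees as assumed base classifiers and majority voting as an assumed combination rule.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_chunk_ensemble(chunks):
    """Train one decision tree per (X, y) data chunk taken from a stream."""
    return [DecisionTreeClassifier().fit(X, y) for X, y in chunks]

def ensemble_predict(models, X):
    """Combine the per-chunk models by simple majority voting."""
    votes = np.array([model.predict(X) for model in models])
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

# Hypothetical usage: `stream_chunks` would be an iterable of (X, y) arrays
# models = train_chunk_ensemble(stream_chunks)
# predictions = ensemble_predict(models, X_new)
```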
The J48 classification algorithm considers all possible tests that can split the data set and selects the test that gives the best information gain. Whenever it encounters a training set, it identifies the attribute that discriminates among the instances most clearly. Among the possible values, if an attribute value corresponds to a single target value, that branch is ended
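A minimal sketch of the information gain computation that drives this choice is given below; the entropy helper and toy example are illustrative assumptions, not J48's exact implementation.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a set of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute_values):
    """Information gain of splitting `labels` on a discrete attribute."""
    labels = np.asarray(labels)
    attribute_values = np.asarray(attribute_values)
    total = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for v in np.unique(attribute_values):
        subset = labels[attribute_values == v]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# A perfectly discriminating attribute recovers the full entropy of the labels
labels = ["yes", "yes", "no", "no"]
attr = ["sunny", "sunny", "rain", "rain"]
print(information_gain(labels, attr))   # 1.0 bit
```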