CS 7646 Spring 2023 Project 3: Assess Learners

Abstract
This report studies a classic decision tree learner, a random tree learner, a bootstrap aggregating (bagging) learner, and an insane learner composed of bagging learners. The learners are compared to investigate their predictive power.

Introduction
The classic decision tree algorithm was first described in J. R. Quinlan's paper. Starting from the root node, the data are split into two branches using the feature with the highest absolute correlation with the target as the splitting criterion, and the same method is applied recursively to the left and right branches until the whole tree is built. The random tree algorithm builds its tree in the same way, except that the split feature is chosen randomly throughout the tree-building process. The bagging learner is an ensemble method: it aggregates the results of several instances of a learner, each trained on a different sample of the same data, where each sample is drawn with replacement. Because it does not search for the best split, the random tree is the fastest of these learners to build.

Methods
The decision tree, random tree, and bagging learner algorithms are exercised from the Python file testlearner.py. Istanbul.csv is used as the data set to compare the three learners; the data are divided into in-sample and out-of-sample portions with a 60%/40% split. The build_tree() method of each learner is trained on the in-sample data to construct its tree, and the query() method of each learner uses that tree to forecast values for the queried data.

Discussion

Experiment 1
Does overfitting occur with respect to leaf_size? For which values of leaf_size does overfitting occur? Indicate the starting point and the direction of overfitting. Support your answer in the discussion or analysis. Use RMSE as your metric for assessing overfitting.

Overfitting does occur with respect to leaf_size. It occurs for leaf sizes below 10, moving from right to left as leaf_size decreases. The in-sample and out-of-sample RMSE rise as leaf_size grows from 1 to 10 and decline once leaf_size exceeds 10, and the in-sample RMSE crosses the out-of-sample RMSE as leaf_size reaches 10, so overfitting starts at leaf_size = 10.
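As a concrete illustration of the two split rules described in the Introduction, the following minimal sketch builds a tree table with either the correlation-based (decision tree) or the random (random tree) split. The build_tree name, the table column layout, and the use of the median as the split value are illustrative assumptions, not the exact graded implementation.

```python
import numpy as np


def best_feature_by_correlation(data_x, data_y):
    """Decision-tree rule: split on the feature with the highest |correlation| with y."""
    corrs = [abs(np.corrcoef(data_x[:, i], data_y)[0, 1]) for i in range(data_x.shape[1])]
    return int(np.argmax(np.nan_to_num(corrs)))


def build_tree(data_x, data_y, leaf_size=1, random_split=False):
    """Return rows of [feature, split_val, left_offset, right_offset]; feature == -1 marks a leaf."""
    # Stop at a leaf when few rows remain or all y values are identical.
    if data_x.shape[0] <= leaf_size or np.all(data_y == data_y[0]):
        return np.array([[-1.0, np.mean(data_y), np.nan, np.nan]])

    # Random tree picks the split feature at random; decision tree uses correlation.
    if random_split:
        feature = np.random.randint(data_x.shape[1])
    else:
        feature = best_feature_by_correlation(data_x, data_y)
    split_val = np.median(data_x[:, feature])

    left_mask = data_x[:, feature] <= split_val
    # Degenerate split (every row lands on one side): fall back to a leaf.
    if left_mask.all() or not left_mask.any():
        return np.array([[-1.0, np.mean(data_y), np.nan, np.nan]])

    left = build_tree(data_x[left_mask], data_y[left_mask], leaf_size, random_split)
    right = build_tree(data_x[~left_mask], data_y[~left_mask], leaf_size, random_split)
    root = np.array([[feature, split_val, 1.0, left.shape[0] + 1.0]])
    return np.vstack((root, left, right))
```

The only difference between the two learners in this sketch is the choice of split feature, which is why the random tree builds faster while the decision tree splits more carefully.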
Experiment 2
Can bagging reduce overfitting with respect to leaf_size? Can bagging eliminate overfitting with respect to leaf_size?

Bagging can reduce overfitting with respect to leaf_size, but it cannot eliminate it. The bag learner with 20 bags provides a good example. Overfitting still occurs from leaf_size 1 to 10, and its direction is from right to left as leaf_size decreases, because the out-of-sample RMSE declines in that region. When leaf_size increases from left to right beyond 10, the in-sample and out-of-sample RMSE both increase. Therefore bagging can effectively reduce overfitting, but it does not eliminate it.
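The bagging procedure used in this experiment can be sketched as follows, assuming each wrapped learner exposes add_evidence() and query(); the BagLearner class name and constructor signature are assumptions for illustration rather than the exact project API.

```python
import numpy as np


class BagLearner:
    """Ensemble wrapper: train each bag on a bootstrap sample and average the queries."""

    def __init__(self, learner, kwargs=None, bags=20):
        self.learners = [learner(**(kwargs or {})) for _ in range(bags)]

    def add_evidence(self, data_x, data_y):
        n = data_x.shape[0]
        for lrn in self.learners:
            # Draw n rows with replacement for this bag's training sample.
            idx = np.random.choice(n, size=n, replace=True)
            lrn.add_evidence(data_x[idx], data_y[idx])

    def query(self, points):
        # Aggregate by averaging the individual learners' predictions.
        return np.mean([lrn.query(points) for lrn in self.learners], axis=0)
```

Averaging over 20 bootstrap samples smooths out the variance of very small leaves, which is why bagging dampens, but does not remove, the overfitting seen at small leaf sizes.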
Experiment 3
In which ways is one method better than the other? Which learner had better performance (based on your selected measures) and why do you think that was the case? Is one learner likely to always be superior to another (why or why not)?

Training execution time and mean absolute error (MAE) are adopted as the metrics for judging DTLearner and RTLearner performance. RTLearner clearly takes less time to build its tree and train a model than DTLearner. RTLearner's higher efficiency comes from choosing its split value at random, whereas DTLearner applies a stricter criterion to pick the split value. DTLearner shows stronger performance on MAE, since its MAE is lower than RTLearner's across all leaf sizes; the stricter split condition gives DTLearner higher accuracy. Because RTLearner wins on training time while DTLearner wins on accuracy, neither learner is likely to always be superior to the other.
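The two metrics in this experiment could be collected with a small helper of the kind sketched below; the evaluate name, the leaf_size constructor argument, and the add_evidence()/query() interface are illustrative assumptions about how testlearner.py is organized.

```python
import time

import numpy as np


def evaluate(learner_cls, train_x, train_y, test_x, test_y, leaf_size):
    """Return (training time in seconds, out-of-sample MAE) for one learner and leaf_size."""
    learner = learner_cls(leaf_size=leaf_size)
    start = time.time()
    learner.add_evidence(train_x, train_y)       # tree-building / training time
    train_time = time.time() - start
    pred = learner.query(test_x)
    mae = float(np.mean(np.abs(test_y - pred)))  # mean absolute error on the 40% out-of-sample set
    return train_time, mae


# Example sweep (DTLearner and RTLearner assumed to share this interface):
# for leaf_size in range(1, 51):
#     dt_time, dt_mae = evaluate(DTLearner, train_x, train_y, test_x, test_y, leaf_size)
#     rt_time, rt_mae = evaluate(RTLearner, train_x, train_y, test_x, test_y, leaf_size)
```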