HW_7

.Rmd

School

University of Wisconsin, Madison *

*We aren’t endorsed by this school

Course

07

Subject

Industrial Engineering

Date

Dec 6, 2023

Type

Rmd

Pages

4

Uploaded by PresidentMonkey1089

Report
--- title: "HW 7" author: "Junyu Sui" output: pdf_document: number_sections: true df_print: paged --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message = FALSE, warnings = FALSE, fig.align = 'center', eval = TRUE) ``` You can run the following code to prepare the analysis. ```{r} library(r02pro) #INSTALL IF NECESSARY library(tidyverse) #INSTALL IF NECESSARY library(MASS) library(tree) my_ahp <- ahp %>% dplyr::select(gar_car, liv_area, lot_area, oa_qual, sale_price) %>% na.omit() %>% mutate(type = factor(ifelse(sale_price > median(sale_price), "Expensive", "Cheap"))) tr_ind <- 1:(nrow(my_ahp)/20) my_ahp_train <- my_ahp[tr_ind, ] my_ahp_test <- my_ahp[-tr_ind, ] ``` Suppose we want to use tree, bagging, random forest, and boosting to predict `sale_price` and `type` using variables `gar_car`, `liv_area`, `lot_area`, and `oa_qual`. Please answer the following questions. 1. Predict `sale_price` (a continuous response) using the training data `my_ahp_train` with tree (with CV pruning), bagging, random forest, and boosting (with CV for selecting the number of trees to be used). For each method, compute the training and test MSE. (For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`) ```{r} sale.tree <- tree(sale_price ~ gar_car+liv_area+lot_area+oa_qual, data = my_ahp_train) cv.sale <- cv.tree(sale.tree) bestsize <- cv.sale$size[which.min(cv.sale$dev)] sale.tree.prune <- prune.tree(sale.tree, best = bestsize) plot(sale.tree.prune) text(sale.tree.prune) prediction_train.tree <- predict(sale.tree.prune, newdata = my_ahp_train) mean((my_ahp_train$sale_price - prediction_train.tree)^2) prediction_test.tree <- predict(sale.tree.prune, newdata = my_ahp_test) mean((my_ahp_test$sale_price - prediction_test.tree)^2) ``` ```{r} library(randomForest) set.seed(1)
p <- ncol(my_ahp)-2 ##Setting mtry = p for bagging bag.sale <- randomForest(sale_price ~ gar_car+liv_area+lot_area+oa_qual, data = my_ahp_train, mtry = p, importance=TRUE) bag.sale importance(bag.sale) varImpPlot(bag.sale) prediction_train.bag <- predict(bag.sale, newdata = my_ahp_train) mean((my_ahp_train$sale_price - prediction_train.bag)^2) prediction_test.bag <- predict(bag.sale,newdata = my_ahp_test) mean((my_ahp_test$sale_price - prediction_test.bag)^2) ``` ```{r} set.seed(1) rf.sale <- randomForest(sale_price ~ gar_car+liv_area+lot_area+oa_qual, data = my_ahp_train,importance=TRUE) prediction_train.rf <- predict(rf.sale, newdata = my_ahp_train) mean((my_ahp_train$sale_price - prediction_train.rf)^2) prediction_test.rf <- predict(rf.sale,newdata = my_ahp_test) mean((my_ahp_test$sale_price - prediction_test.rf)^2) ``` ```{r} library(gbm) set.seed(1) boost.sale <- gbm(sale_price ~ gar_car+liv_area+lot_area+oa_qual, data = my_ahp_train, distribution = "gaussian", n.trees = 5000, interaction.depth = 1, cv.folds = 5) summary(boost.sale) prediction_train.boost <- predict(boost.sale, newdata = my_ahp_train, n.trees =5000) mean((my_ahp_train$sale_price - prediction_train.boost)^2) prediction_test.boost <- predict(boost.sale, newdata = my_ahp_test, n.trees =5000) mean((my_ahp_test$sale_price - prediction_test.boost)^2) ``` \newpage 2. Predict `type` (a binary response) using the training data `my_ahp_train` with tree (with CV pruning), bagging, random forest, and boosting (with CV for selecting the number of trees to be used). For each method, compute the training and test classification error. (For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`) ```{r} type.tree <- tree(type ~ gar_car+liv_area+lot_area+oa_qual, data = my_ahp_train, split="gini") set.seed(0) cv.type <- cv.tree(type.tree) cv.type_df <- data.frame(size = cv.type$size, deviance = cv.type$dev) best_size <- cv.type$size[which.min(cv.type$dev)] type.tree.prune <- prune.tree(type.tree, best=best_size) plot(type.tree.prune) text(type.tree.prune)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help