---
title: "HW 7"
author: "Junyu Sui"
date: "Dec 6, 2023"
output:
  pdf_document:
    number_sections: true
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE,
                      warning = FALSE, fig.align = 'center',
                      eval = TRUE)
```
Run the following code to prepare the data for the analysis.
```{r}
library(r02pro)     # install if necessary
library(tidyverse)  # install if necessary
library(MASS)
library(tree)
my_ahp <- ahp %>%
  dplyr::select(gar_car, liv_area, lot_area, oa_qual, sale_price) %>%
  na.omit() %>%
  mutate(type = factor(ifelse(sale_price > median(sale_price),
                              "Expensive", "Cheap")))
tr_ind <- 1:(nrow(my_ahp) / 20)  # use the first 5% of rows for training
my_ahp_train <- my_ahp[tr_ind, ]
my_ahp_test <- my_ahp[-tr_ind, ]
```
Suppose we want to use tree, bagging, random forest, and boosting to predict
`sale_price` and `type` using variables `gar_car`, `liv_area`, `lot_area`, and
`oa_qual`. Please answer the following questions.
1. Predict `sale_price` (a continuous response) using the training data
`my_ahp_train` with tree (with CV pruning), bagging, random forest, and boosting
(with CV for selecting the number of trees to be used). For each method, compute
the training and test MSE.
(For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`)
```{r}
sale.tree <- tree(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train)
set.seed(1)  # cv.tree() assigns CV folds randomly; fix the seed for reproducibility
cv.sale <- cv.tree(sale.tree)
bestsize <- cv.sale$size[which.min(cv.sale$dev)]
sale.tree.prune <- prune.tree(sale.tree, best = bestsize)
plot(sale.tree.prune)
text(sale.tree.prune)
prediction_train.tree <- predict(sale.tree.prune, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.tree)^2)
prediction_test.tree <- predict(sale.tree.prune, newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.tree)^2)
```
```{r}
library(randomForest)
set.seed(1)
p <- ncol(my_ahp) - 2  # number of predictors (exclude sale_price and type)
## Setting mtry = p (all predictors tried at each split) gives bagging
bag.sale <- randomForest(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                         data = my_ahp_train, mtry = p, importance = TRUE)
bag.sale
importance(bag.sale)
varImpPlot(bag.sale)
prediction_train.bag <- predict(bag.sale, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.bag)^2)
prediction_test.bag <- predict(bag.sale,newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.bag)^2)
```
```{r}
set.seed(1)
rf.sale <- randomForest(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                        data = my_ahp_train, importance = TRUE)
prediction_train.rf <- predict(rf.sale, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.rf)^2)
prediction_test.rf <- predict(rf.sale,newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.rf)^2)
```
```{r}
library(gbm)
set.seed(1)
boost.sale <- gbm(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, distribution = "gaussian",
                  n.trees = 5000, interaction.depth = 1, cv.folds = 5)
summary(boost.sale)
## Use the CV-selected number of trees, as the question requires
best_ntrees <- gbm.perf(boost.sale, method = "cv", plot.it = FALSE)
prediction_train.boost <- predict(boost.sale, newdata = my_ahp_train,
                                  n.trees = best_ntrees)
mean((my_ahp_train$sale_price - prediction_train.boost)^2)
prediction_test.boost <- predict(boost.sale, newdata = my_ahp_test,
                                 n.trees = best_ntrees)
mean((my_ahp_test$sale_price - prediction_test.boost)^2)
```
\newpage
2. Predict `type` (a binary response) using the training data `my_ahp_train` with
tree (with CV pruning), bagging, random forest, and boosting (with CV for selecting
the number of trees to be used). For each method, compute the training and test
classification error.
(For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`)
```{r}
type.tree <- tree(type ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, split = "gini")
set.seed(0)
cv.type <- cv.tree(type.tree)
cv.type_df <- data.frame(size = cv.type$size, deviance = cv.type$dev)
best_size <- cv.type$size[which.min(cv.type$dev)]
type.tree.prune <- prune.tree(type.tree, best=best_size)
plot(type.tree.prune)
text(type.tree.prune)
pred_train_type <- predict(type.tree.prune, newdata = my_ahp_train, type = "class")
mean(pred_train_type != my_ahp_train$type)
pred_test_type <- predict(type.tree.prune, newdata = my_ahp_test, type = "class")
mean(pred_test_type != my_ahp_test$type)
```
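The remaining methods for `type` mirror Part 1. Below is a sketch, assuming the objects defined earlier (`my_ahp_train`, `my_ahp_test`, `p`) are available. For `gbm`, the factor response is recoded to 0/1 so that `distribution = "bernoulli"` applies, and the 0.5 probability threshold for labeling a house "Expensive" is an assumption, not part of the original document.

```{r}
set.seed(1)
## Bagging: mtry = p forces all predictors at every split
bag.type <- randomForest(type ~ gar_car + liv_area + lot_area + oa_qual,
                         data = my_ahp_train, mtry = p, importance = TRUE)
mean(predict(bag.type, newdata = my_ahp_train) != my_ahp_train$type)
mean(predict(bag.type, newdata = my_ahp_test) != my_ahp_test$type)

## Random forest: default mtry (floor(sqrt(p)) for classification)
rf.type <- randomForest(type ~ gar_car + liv_area + lot_area + oa_qual,
                        data = my_ahp_train, importance = TRUE)
mean(predict(rf.type, newdata = my_ahp_train) != my_ahp_train$type)
mean(predict(rf.type, newdata = my_ahp_test) != my_ahp_test$type)

## Boosting: gbm needs a 0/1 response for distribution = "bernoulli"
my_ahp_train_b <- my_ahp_train %>%
  mutate(y = as.numeric(type == "Expensive"))
boost.type <- gbm(y ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train_b, distribution = "bernoulli",
                  n.trees = 5000, interaction.depth = 1, cv.folds = 5)
best_ntrees_c <- gbm.perf(boost.type, method = "cv", plot.it = FALSE)
prob_train <- predict(boost.type, newdata = my_ahp_train,
                      n.trees = best_ntrees_c, type = "response")
mean(ifelse(prob_train > 0.5, "Expensive", "Cheap") != my_ahp_train$type)
prob_test <- predict(boost.type, newdata = my_ahp_test,
                     n.trees = best_ntrees_c, type = "response")
mean(ifelse(prob_test > 0.5, "Expensive", "Cheap") != my_ahp_test$type)
```

As in Part 1, training error will typically be much lower than test error for bagging and random forest, since their out-of-bag fitting still memorizes much of the training set.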