---
title: "Supplement for Modeling 4"
author: "Mario Giacomazzo"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,warning=F)
options(scipen=999)
library(tidyverse) #Essential Functions
library(modelr) #Helpful Functions in Modeling
library(purrr)
library(broom)
DATA=read_csv("AirWaterTemp.csv",col_types=cols()) #River Data
```
# Introduction
We continue our work with daily water and air temperature data observed for `r length(unique(DATA$L))` rivers in Spain. In the preview below, `W` represents the daily maximum water temperature and `A` represents the daily maximum air temperature. The data contain almost a full year of observations for each of the `r length(unique(DATA$L))` rivers.
```{r,echo=F}
head(DATA)
```
Using the data, we seek to identify the best model for predicting the maximum water temperature given the maximum air temperature. Previously, we randomly selected 3 rivers to act as a test set, and all models were evaluated on that test set. In this tutorial, we explore approaches that ensure all data are used for both model training and model testing.

We apply helpful functions from the `purrr`, `modelr`, and `broom` packages. See the following links for helpful articles on performing cross-validation within the tidyverse: [Link 1](http://sjspielman.org/bio5312_fall2017/files/kfold_supplement.pdf), [Link 2](https://drsimonj.svbtle.com/k-fold-cross-validation-with-modelr-and-broom), and [Link 3](https://www.r-bloggers.com/easy-cross-validation-in-r-with-modelr/).
# Part 1: Intelligent Use of Locations for Cross-Validation
## Chunk 1: List-Column of Data Split By Location
```{r,eval=T,message=F}
NEST.DATA = DATA %>% group_by(L) %>% nest()
head(NEST.DATA)
```
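To make the list-column idea concrete, here is a minimal sketch on a made-up toy data set. The object names `toy` and `toy_nested` and all values are hypothetical and are not part of the river data. After `group_by(L) %>% nest()`, each location occupies one row, and its observations are stored as a tibble inside the `data` column.

```r
library(tidyverse)

# Hypothetical toy data: two locations with three days each (values are made up)
toy <- tibble(
  L = c(1, 1, 1, 2, 2, 2),
  A = c(10, 12, 15, 20, 22, 25),
  W = c(8, 9, 11, 15, 16, 18)
)

# One row per location; the remaining columns are packed into the list-column `data`
toy_nested <- toy %>% group_by(L) %>% nest()

nrow(toy_nested)       # number of locations
toy_nested$data[[1]]   # tibble holding A and W for the first location
```

The grouping variable `L` stays as an ordinary column, which is why `filter()` can later select or exclude whole locations before unnesting.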
## Chunk 2: Combining `filter()` with `unnest()` To Split Data
```{r,eval=T,message=F}
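# Name the list-column explicitly: tidyr >= 1.0.0 warns when unnest() is called without cols
NEST.DATA %>% filter(L==103) %>% unnest(cols=c(data)) %>% glimpse() # rows for location 103 only
NEST.DATA %>% filter(L!=103) %>% unnest(cols=c(data)) %>% glimpse() # rows for all other locations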
```
## Chunk 3: Fit Train Data, Predict Test Data, and Save Results
```{r,eval=T}
DATA2=DATA
DATA2$linpred=NA
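# Location 103 is held out as the test set; all other locations form the training set
TEST = NEST.DATA %>% filter(L==103) %>% unnest(cols=c(data))
TRAIN = NEST.DATA %>% filter(L!=103) %>% unnest(cols=c(data))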
linmod=lm(W~A,data=TRAIN)
linmodpred=predict(linmod,newdata=TEST)
DATA2$linpred[which(DATA2$L==103)]=linmodpred
head(DATA2)
```
## Chunk 4: Create a Loop to Iterate Process for Each Location
```{r,eval=F}
DATA2=DATA
DATA2$linpred=NA
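for (i in unique(DATA2$L)){
  # Hold out location i as the test set and train on every other location
  TEST = NEST.DATA %>% filter(L==i) %>% unnest(cols=c(data))
  TRAIN = NEST.DATA %>% filter(L!=i) %>% unnest(cols=c(data))
  linmod=lm(W~A,data=TRAIN)
  linmodpred=predict(linmod,newdata=TEST)
  # Store the out-of-sample predictions in the rows belonging to location i
  DATA2$linpred[which(DATA2$L==i)]=linmodpred
}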
```
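The same leave-one-location-out idea can also be written with `purrr::map()` instead of a `for` loop. The sketch below is one possible alternative, not the method used in this supplement. It runs on simulated data rather than `AirWaterTemp.csv`, and the names `sim`, `predict_location`, and `cv_preds` are hypothetical.

```r
library(tidyverse)

set.seed(1)
# Simulated stand-in for DATA (hypothetical values, not the river data)
sim <- tibble(
  L = rep(1:4, each = 25),
  A = runif(100, 5, 35)
) %>% mutate(W = 2 + 0.7 * A + rnorm(100))

# For one held-out location: fit on the other locations, predict the held-out rows
predict_location <- function(loc, data) {
  train <- filter(data, L != loc)
  test  <- filter(data, L == loc)
  fit   <- lm(W ~ A, data = train)
  mutate(test, linpred = predict(fit, newdata = test))
}

# Apply the helper to every location and stack the per-location results
cv_preds <- map(unique(sim$L), predict_location, data = sim) %>% bind_rows()

head(cv_preds)
```

Because each call returns the held-out rows together with their predictions, no pre-allocated `linpred` column or index bookkeeping is needed.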
## Chunk 5: Calculate Cross-Validated RMSE
```{r,eval=F}
RMSE.func=function(actual,predict){
  resid = actual - predict
  mse = mean(resid^2,na.rm=T)
  rmse = sqrt(mse)
  return(rmse)
}
RMSE.func(actual=DATA2$W,predict=DATA2$linpred)
```
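As a quick sanity check of the arithmetic, the function above can be applied to a small hand-made example. The vectors `actual_toy` and `predict_toy` are hypothetical. The residuals are -1, 0, 2, and `NA`. The missing residual is dropped by `na.rm=T`, so the mean squared error is (1 + 0 + 4)/3 and the RMSE is its square root.

```r
# Copy of RMSE.func from the chunk above so that this example runs on its own
RMSE.func <- function(actual, predict) {
  resid <- actual - predict
  mse   <- mean(resid^2, na.rm = TRUE)
  sqrt(mse)
}

# Hypothetical values; the last observation is missing and is ignored
actual_toy  <- c(10, 12, 14, NA)
predict_toy <- c(11, 12, 12, 13)

toy_rmse <- RMSE.func(actual = actual_toy, predict = predict_toy)
toy_rmse   # equals sqrt(5/3), roughly 1.29
```

Note that with `na.rm=T` the average is taken over the non-missing residuals only, so days with a missing `W` or a missing prediction do not contribute to the cross-validated RMSE.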
# Part 2: K-Fold CV for Polynomial Model Evaluation
## Chunk 1: Exploratory Figures
```{r,echo=F,eval=F}
ggplot(data=DATA) +
  geom_point(aes(x=JULIAN_DAY,y=W,color=A),alpha=0.3) +
  xlab("Day of Year") + ylab("Max Water Temperature") +
  guides(color=guide_legend(title="Max Air \nTemperature")) +
  theme_minimal()
```
## Chunk 2: Polynomial Fitting
```{r,eval=F}
polymodel=lm(W~poly(A,4)+poly(JULIAN_DAY,3),data=na.omit(DATA))
tidy(polymodel)
glance(polymodel)
```
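One way such a polynomial model could be evaluated with K-fold cross-validation is sketched below using `modelr::crossv_kfold()`. This is an illustration under stated assumptions and not necessarily the approach taken in the rest of this supplement. The data are simulated, the choice of 5 folds is arbitrary, and the names `sim`, `folds`, and `cv` are hypothetical.

```r
library(tidyverse)
library(modelr)

set.seed(320)
# Simulated stand-in for the river data (hypothetical values)
sim <- tibble(
  JULIAN_DAY = rep(1:100, times = 3),
  A = 20 + 10 * sin(2 * pi * JULIAN_DAY / 365) + rnorm(300)
) %>% mutate(W = 5 + 0.6 * A + rnorm(300))

# Split the rows into 5 folds; each row of `folds` holds one train/test pair
folds <- crossv_kfold(sim, k = 5)

cv <- folds %>%
  mutate(
    # Fit the polynomial model on each training split
    model = map(train, ~ lm(W ~ poly(A, 4) + poly(JULIAN_DAY, 3),
                            data = as.data.frame(.x))),
    # Out-of-sample RMSE of each fitted model on its own test split
    rmse = map2_dbl(model, test, ~ modelr::rmse(.x, .y))
  )

mean(cv$rmse)   # average out-of-sample RMSE across the 5 folds
```

Unlike the location-based splits in Part 1, these folds are formed from randomly assigned rows, so observations from the same river can appear in both the training and test splits.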