Worksheet 9 - Regression Continued

Lecture and Tutorial Learning Goals:

By the end of the week, you will be able to:

- Recognize situations where a simple regression analysis would be appropriate for making predictions.
- Explain the k-nearest neighbours (k-nn) regression algorithm and describe how it differs from k-nn classification.
- Interpret the output of a k-nn regression.
- In a dataset with two variables, perform k-nearest neighbour regression in R using tidymodels to predict the values for a test dataset.
- Using R, execute cross-validation to choose the number of neighbours.
- Using R, evaluate k-nn regression prediction accuracy using a test data set and an appropriate metric (e.g., root mean square prediction error).
- In a dataset with > 2 variables, perform k-nn regression in R using tidymodels to predict the values for a test dataset.
- In the context of k-nn regression, compare and contrast goodness of fit and prediction properties (namely RMSE vs RMSPE).
- Describe advantages and disadvantages of the k-nearest neighbour regression approach.
- Perform ordinary least squares regression in R using tidymodels to predict the values for a test dataset.
- Compare and contrast predictions obtained from k-nearest neighbour regression to those obtained using simple ordinary least squares regression from the same dataset.

This worksheet covers parts of the Regression II chapter of the online textbook. You should read this chapter before attempting the worksheet.

### Run this cell before continuing.
library(tidyverse)
library(repr)
library(tidymodels)
library(cowplot)
options(repr.matrix.max.rows = 6)
source("tests.R")
source("cleanup.R")

Warm-up Questions

Here are some warm-up questions on the topic of multiple regression to get you thinking before we jump into data analysis. The course readings should help you answer these.
Question 1.0 Multiple Choice: {points: 1}

In multivariate k-nn regression with one outcome/target variable and two predictor variables, the predictions take the form of what shape?

A. a flat plane
B. a wiggly/flexible plane
C. a straight line
D. a wiggly/flexible line
E. a 4D hyperplane
F. a 4D wiggly/flexible hyperplane

Save the letter of the answer you think is correct to a variable named answer1.0. Make sure you put quotations around the letter and pay attention to case.

### BEGIN SOLUTION
answer1.0 <- "B"
### END SOLUTION

test_1.0()

Question 1.1 Multiple Choice: {points: 1}

In simple linear regression with one outcome/target variable and one predictor variable, the predictions take the form of what shape?

A. a flat plane
B. a wiggly/flexible plane
C. a straight line
D. a wiggly/flexible line
E. a 4D hyperplane
F. a 4D wiggly/flexible hyperplane

Save the letter of the answer you think is correct to a variable named answer1.1. Make sure you put quotations around the letter and pay attention to case.

### BEGIN SOLUTION
answer1.1 <- "C"
### END SOLUTION

test_1.1()
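To see the contrast behind Questions 1.0 and 1.1 in a single predictor, here is a small illustrative sketch (not part of the worksheet; the toy data and all object names here are hypothetical): with one predictor, k-nn regression traces a wiggly/flexible line through the data, while simple linear regression is constrained to a straight line.

# Hypothetical toy data: a noisy non-linear trend.
set.seed(1)
toy <- tibble(x = runif(50, 0, 10),
              y = sin(x) + x / 2 + rnorm(50, sd = 0.3))

# k-nn regression fit: its predictions form a wiggly/flexible line.
knn_fit <- nearest_neighbor(weight_func = "rectangular", neighbors = 5) %>%
    set_engine("kknn") %>%
    set_mode("regression") %>%
    fit(y ~ x, data = toy)

# Predict on a fine grid so the shape of the k-nn line is visible.
grid <- tibble(x = seq(0, 10, length.out = 200))
knn_preds <- bind_cols(grid, predict(knn_fit, grid))

ggplot(toy, aes(x = x, y = y)) +
    geom_point(alpha = 0.5) +
    geom_line(data = knn_preds, aes(x = x, y = .pred), color = "blue") +  # wiggly k-nn line
    geom_smooth(method = "lm", se = FALSE, color = "red")                 # straight OLS line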
Question 1.2 Multiple Choice: {points: 1}

In multiple linear regression with one outcome/target variable and two predictor variables, the predictions take the form of what shape?

A. a flat plane
B. a wiggly/flexible plane
C. a straight line
D. a wiggly/flexible line
E. a 4D hyperplane
F. a 4D wiggly/flexible hyperplane

Save the letter of the answer you think is correct to a variable named answer1.2. Make sure you put quotations around the letter and pay attention to case.

### BEGIN SOLUTION
answer1.2 <- "A"
### END SOLUTION

test_1.2()

Understanding Simple Linear Regression

Consider this small and simple dataset:

simple_data <- tibble(X = c(1, 2, 3, 6, 7, 7),
                      Y = c(1, 1, 3, 5, 7, 6))

options(repr.plot.width = 5, repr.plot.height = 5)
base <- ggplot(simple_data, aes(x = X, y = Y)) +
    geom_point(size = 2) +
    scale_x_continuous(limits = c(0, 7.5), breaks = seq(0, 8), minor_breaks = seq(0, 8, 0.25)) +
    scale_y_continuous(limits = c(0, 7.5), breaks = seq(0, 8), minor_breaks = seq(0, 8, 0.25)) +
    theme(text = element_text(size = 20))
base

Now consider these three potential lines we could fit for the same dataset:

options(repr.plot.height = 3.5, repr.plot.width = 10)
line_a <- base +
    ggtitle("Line A") +
    geom_abline(intercept = -0.897, slope = 0.9834, color = "blue") +
    theme(text = element_text(size = 20))
line_b <- base +
    ggtitle("Line B") +
    geom_abline(intercept = 0.1022, slope = 0.9804, color = "purple") +
    theme(text = element_text(size = 20))
line_c <- base +
    ggtitle("Line C") +
    geom_abline(intercept = -0.2347, slope = 0.9164, color = "green") +
    theme(text = element_text(size = 20))
plot_grid(line_a, line_b, line_c, ncol = 3)
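Questions 2.0-2.2 below ask you to estimate the average squared vertical distance for each line by reading values off the graphs. If you want to check your graph-based estimates afterwards, the exact values can be computed directly from simple_data; the helper function below (mse_for_line is a name introduced here for illustration, not part of the worksheet) is a minimal sketch:

# Sketch: exact mean squared vertical distance between the points in
# simple_data and a line with the given intercept and slope.
mse_for_line <- function(intercept, slope, data) {
    mean((data$Y - (intercept + slope * data$X))^2)
}

mse_for_line(-0.897, 0.9834, simple_data)   # Line A
mse_for_line(0.1022, 0.9804, simple_data)   # Line B
mse_for_line(-0.2347, 0.9164, simple_data)  # Line C: smallest of the three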
Question 2.0 {points: 1}

Use the graph below titled "Line A" to roughly calculate the average squared vertical distance between the points and the blue line. Read values off the graph to a precision of 0.25 (e.g. 1, 1.25, 1.5, 1.75, 2). Save your answer to a variable named answer2.0. We reprint the plot for you in a larger size to make it easier to estimate the locations on the graph.

# run this code
options(repr.plot.width = 9, repr.plot.height = 9)
line_a

### BEGIN SOLUTION
answer2.0 <- ((0 - 1)^2 + (1 - 1)^2 + (2 - 3)^2 + (5 - 5)^2 + (6 - 6)^2 + (6 - 7)^2) / 6
### END SOLUTION
answer2.0

test_2.0()

Question 2.1 {points: 1}

Use the graph titled "Line B" to roughly calculate the average squared vertical distance between the points and the purple line. Read values off the graph to a precision of 0.25 (e.g. 1, 1.25, 1.5, 1.75, 2). Save your answer to a variable named answer2.1. We reprint the plot for you in a larger size to make it easier to estimate the locations on the graph.

options(repr.plot.width = 9, repr.plot.height = 9)
line_b

### BEGIN SOLUTION
answer2.1 <- ((1 - 1)^2 + (2 - 1)^2 + (3 - 3)^2 + (6 - 5)^2 + (7 - 7)^2 + (6 - 7)^2) / 6
### END SOLUTION
answer2.1

test_2.1()

Question 2.2 {points: 1}

Use the graph titled "Line C" to roughly calculate the average squared vertical distance between the points and the green line. Read values off the graph to a precision of 0.25 (e.g. 1, 1.25, 1.5, 1.75, 2). Save your answer to a variable named answer2.2. We reprint the plot for you in a larger size to make it easier to estimate the locations on the graph.

options(repr.plot.width = 9, repr.plot.height = 9)
line_c

### BEGIN SOLUTION
answer2.2 <- ((0.75 - 1)^2 + (1.5 - 1)^2 + (2.5 - 3)^2 + (5.25 - 5)^2 + (6.25 - 7)^2 + (6.25 - 6)^2) / 6
### END SOLUTION
answer2.2

test_2.2()
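Question 2.3 below asks which of the three candidate lines ordinary least squares would choose. As a quick cross-check (not part of the graded solution), R's built-in lm() returns the exact least-squares coefficients for simple_data, which you can compare against the three candidate intercept/slope pairs:

# Sketch: the exact least-squares line for simple_data.
# coef() returns the fitted intercept and slope (roughly -0.25 and 0.94),
# which are closer to Line C's coefficients than to Line A's or Line B's.
coef(lm(Y ~ X, data = simple_data))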
Question 2.3 {points: 1}

Based on your calculations above, which line would linear regression by ordinary least squares choose given our small and simple dataset: Line A, B, or C? Assign the letter that corresponds to the line to a variable named answer2.3. Make sure you put quotations around the letter and pay attention to case.

### BEGIN SOLUTION
answer2.3 <- "C"
### END SOLUTION

test_2.3()

Marathon Training Revisited with Linear Regression!

Source: https://media.giphy.com/media/BDagLpxFIm3SM/giphy.gif

Remember our question from last week: what features predict whether athletes will perform better than others? Specifically, we are interested in whether the maximum distance run per week during training predicts a runner's marathon race time.
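As a preview of where this section is headed, here is a minimal sketch of fitting and evaluating a simple linear regression with tidymodels. It assumes a marathon data frame like last week's, with a predictor max (maximum distance run per week during training) and an outcome time_hrs (race time in hours); both column names are assumptions carried over from the previous worksheet rather than definitions made here.

# Minimal sketch (assumed data: `marathon` with columns `max` and `time_hrs`).
set.seed(2000)

# Split the data into training and test sets.
marathon_split <- initial_split(marathon, prop = 0.75, strata = time_hrs)
marathon_training <- training(marathon_split)
marathon_testing <- testing(marathon_split)

# Simple linear regression specification and workflow.
lm_spec <- linear_reg() %>%
    set_engine("lm") %>%
    set_mode("regression")

lm_recipe <- recipe(time_hrs ~ max, data = marathon_training)

lm_fit <- workflow() %>%
    add_recipe(lm_recipe) %>%
    add_model(lm_spec) %>%
    fit(data = marathon_training)

# RMSE on the training set measures goodness of fit; the same metric
# computed on the held-out test set is the RMSPE (prediction accuracy).
lm_fit %>%
    predict(marathon_testing) %>%
    bind_cols(marathon_testing) %>%
    metrics(truth = time_hrs, estimate = .pred) %>%
    filter(.metric == "rmse")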