HWK10_Soln.pdf

School: University of Wisconsin, Madison
Course: 371
Subject: Statistics
Date: Feb 20, 2024

Stat 371 Homework #10 SOLUTIONS

*Submit your homework to Canvas by the due date and time. Email your instructor if you have extenuating circumstances and need to request an extension.
*If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output to be only the relevant portions.
*If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.
*You must include an explanation and/or intermediate calculations for an exercise to be complete.
*Be sure to submit the HWK 10 Auto grade Quiz which will give you ~20 of your 40 accuracy points.
*50 points total: 40 points accuracy, and 10 points completion

Least Squares Linear Regression

Exercise 1. Suppose we are interested in exploring the relationship between city air particulate and rates of childhood asthma. We sample 15 cities for particulate (X), measured in parts-per-million (ppm) of large particulate matter, and for the rate of childhood asthma (Y), measured as a percentage. The data are given in the summary table and R vectors below.

variable   size   mean     variance
X          15     11.42    13.05
Y          15     14.513   2.636

a. Plot the data as you see fit and summarize the pattern's shape, direction, and strength in the context of the problem.

There appears to be a strong, positive, linear pattern between level of particulate and percent asthma for the range of values observed in this data set.

particulate <- c(11.6, 15.9, 15.7, 7.9, 6.3, 13.7, 13.1, 10.8, 6.0, 7.6, 14.8, 7.4, 16.2, 13.1, 11.2)
asthma <- c(14.5, 16.6, 16.5, 12.6, 12.0, 15.8, 15.1, 14.2, 12.2, 13.1, 16.0, 12.9, 16.4, 15.4, 14.4)
plot(particulate, asthma)
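As a quick check, the summary table for Exercise 1 can be reproduced from the two R vectors. The following is a minimal sketch using base R's length(), mean(), and var(); the data frame name summary_tab is an illustrative choice and not part of the original solution.

```r
# Data vectors copied from the exercise
particulate <- c(11.6, 15.9, 15.7, 7.9, 6.3, 13.7, 13.1, 10.8,
                 6.0, 7.6, 14.8, 7.4, 16.2, 13.1, 11.2)
asthma <- c(14.5, 16.6, 16.5, 12.6, 12.0, 15.8, 15.1, 14.2,
            12.2, 13.1, 16.0, 12.9, 16.4, 15.4, 14.4)

# summary_tab is a hypothetical name used only for this illustration.
# var() returns the sample variance (denominator n - 1).
summary_tab <- data.frame(
  variable = c("X", "Y"),
  size     = c(length(particulate), length(asthma)),
  mean     = c(mean(particulate), mean(asthma)),
  variance = c(var(particulate), var(asthma))
)

print(summary_tab, digits = 4)
```

The sample standard deviations needed later for the slope are the square roots of the variance column, which sd() also returns directly.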
[Figure: scatterplot of asthma (y-axis) against particulate (x-axis)]

b. Calculate the correlation coefficient (you can use an R function) and explain how the value corresponds to what you observed in the graph in part (a).

r = 0.993, calculated in R. This value is very close to 1, which is not surprising given how tightly and linearly the (x, y) data points fall in the scatterplot.

cor(particulate, asthma)
## [1] 0.9931873

c. Build a linear regression model with least squares estimators for the slope and y-intercept for the data.

(i) First, build a regression model by hand using the correlation computed in (b) and the summary statistics given above.

Our estimated slope is given by

$$\hat{\beta}_1 = r \, \frac{s_y}{s_x} = 0.993 \cdot \frac{\sqrt{2.636}}{\sqrt{13.05}} = 0.446$$

where $s_y$ and $s_x$ are the sample standard deviations, that is, the square roots of the variances in the summary table.

We can find the intercept as $\bar{y} - \hat{\beta}_1 \bar{x}$, which in this case is $14.513 - 0.446(11.42) = 9.42$.

Our linear model is $\hat{y} = 0.446x + 9.42$.

(ii) Check your computations using lm in R.

asthma_mod <- lm(asthma ~ particulate)
asthma_mod
##
## Call:
## lm(formula = asthma ~ particulate)
##
## Coefficients:
## (Intercept)  particulate
##      9.4163       0.4463

(iii) Interpret the estimated intercept and slope in the context of the question.

The estimated intercept is 9.42. This is the estimated average rate of childhood asthma (in percent) in cities with 0 ppm particulate; because 0 ppm is outside the range of our data, this interpretation is an extrapolation. The estimated slope is 0.446. This suggests that for each 1 ppm increase in particulate, cities tend to show an increase of 0.446 percentage points in the rate of childhood asthma, on average.

d. Construct a residual plot with fitted y values on the x-axis and residuals on the y-axis. Graphically assess whether the correct model and constant variance assumptions are reasonably met.

There is no clear deviation from constant variance and no clear curvature in the residuals.

# We pull the fitted values and residuals from the
# R linear model object
plot(asthma_mod$fitted, asthma_mod$residuals)

[Figure: residual plot of asthma_mod$residuals (y-axis) against asthma_mod$fitted (x-axis)]

e. Identify which (particulate, asthma) data point results in the residual with the largest magnitude. Is that point above or below the fitted regression line? Show how the residual is calculated. (Make sure that you can also identify that point on the residual plot.)

Going by the vector of residuals calculated by R, the 4th point has the residual with the largest magnitude (-0.3423). This point is below the regression line because the residual is negative. The observed y is 12.6 at x = 7.9, and the predicted value is $9.42 + 0.446(7.9) = 12.943$, so the residual is $12.6 - 12.943 = -0.343$, which agrees with the R value up to rounding of the coefficients.

asthma_mod$residuals
##           1           2           3           4           5           6
## -0.09367246  0.08711504  0.07638074 -0.34225706 -0.22813148  0.26903771
##           7           8           9          10          11          12
## -0.16316519 -0.03660967  0.10576707  0.29164149 -0.02192362  0.18090719
##          13          14          15
## -0.24678350  0.13683481 -0.01514107
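The hand calculations in parts (c)(i) and (e) can be cross-checked in R without rounding. The sketch below is one possible way to do this, not the method required by the assignment: it computes the slope as r times the ratio of sample standard deviations, the intercept from the two means, and then the residuals as observed minus predicted. The names r, b1, b0, fit, residuals_by_hand, and i_max are illustrative and do not appear in the original solution.

```r
# Data vectors copied from the exercise
particulate <- c(11.6, 15.9, 15.7, 7.9, 6.3, 13.7, 13.1, 10.8,
                 6.0, 7.6, 14.8, 7.4, 16.2, 13.1, 11.2)
asthma <- c(14.5, 16.6, 16.5, 12.6, 12.0, 15.8, 15.1, 14.2,
            12.2, 13.1, 16.0, 12.9, 16.4, 15.4, 14.4)

# Least squares estimates "by hand" (illustrative variable names)
r  <- cor(particulate, asthma)
b1 <- r * sd(asthma) / sd(particulate)       # slope: r * s_y / s_x
b0 <- mean(asthma) - b1 * mean(particulate)  # intercept: ybar - b1 * xbar

# Compare with the coefficients reported by lm()
fit <- lm(asthma ~ particulate)
print(c(intercept = b0, slope = b1))
print(coef(fit))

# Residual = observed y - predicted y
residuals_by_hand <- asthma - (b0 + b1 * particulate)

# Index of the residual with the largest magnitude, and its value
i_max <- which.max(abs(residuals_by_hand))
print(i_max)
print(residuals_by_hand[i_max])
```

Because this version uses unrounded coefficients, its residuals match asthma_mod$residuals, whereas the hand calculation with 9.42 and 0.446 differs slightly in the third decimal place.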