STAT3032_004_HW5_S2023_Solution (Shared)

School: University of Minnesota-Twin Cities
Course: 3032
Subject: Statistics
Date: Feb 20, 2024
Uploaded by JudgeOxide10008
STAT 3032 Regression and Correlated Data
Homework 5 (Solution)

Please show your work on each problem for full credit. A correct answer unsupported by the necessary explanation, R code, or output will receive very little, if any, credit. Your work needs to be organized in a reasonably neat and coherent way and submitted as a PDF file on Canvas. Please do not share this handout outside the class.

Problem 1 [10 pts]

The vapor.csv file contains two variables:
temp      The temperature in degrees Celsius.
mercury   The vapor pressure of mercury, measured in mmHg (millimeters of mercury).

Interest centers on modeling the vapor pressure (the response variable) as a function of temperature (the predictor variable).

(a) [1 pt] Use R to produce a scatterplot of mercury (y-axis) vs. temp (x-axis). Describe the shape of the points in the scatterplot. Reminder: you should provide the R code of the plot.

Solution:
> vapor = read.csv("vapor.csv")
> plot(mercury ~ temp, data = vapor)

Examples of the description:
- The points form a “J” shape.
- As the temperature increases, the pressure goes up at an increasing rate.
- The vapor pressure shows exponential growth as the temperature goes up.
- The relationship between mercury and temperature is not linear.
(b) [3 pts] Fit the following three models using R. For each model, provide the model equation, the degrees of freedom (of the residuals), and the $R^2$. You should provide the relevant R code and output.
Model (1): mercury ~ 1 + temp
Model (2): mercury ~ 1 + temp + temp^2
Model (3): mercury ~ 1 + temp + temp^2 + temp^3

Solution:
Model (1): $\widehat{\text{mercury}} = -146.653 + 1.539 \times \text{temp}$, df = 17, $R^2 = 0.5642$
Model (2): $\widehat{\text{mercury}} = 104.208 - 2.888 \times \text{temp} + 0.0123 \times \text{temp}^2$, df = 16, $R^2 = 0.9073$
Model (3): $\widehat{\text{mercury}} = -12.69 + 1.629 \times \text{temp} - 0.01994 \times \text{temp}^2 + 5.969 \times 10^{-5} \times \text{temp}^3$, df = 15, $R^2 = 0.9804$

> m1 = lm(mercury ~ 1 + temp, data = vapor)
> summary(m1)

Call:
lm(formula = mercury ~ 1 + temp, data = vapor)

Residuals:
    Min      1Q  Median      3Q     Max
-163.39 -112.01  -26.93   64.44  422.20

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -146.653     69.111  -2.122  0.04883 *
temp           1.539      0.328   4.691  0.00021 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 156.6 on 17 degrees of freedom
Multiple R-squared:  0.5642,  Adjusted R-squared:  0.5386
F-statistic: 22.01 on 1 and 17 DF,  p-value: 0.0002101

> m2 = lm(mercury ~ 1 + temp + I(temp^2), data = vapor)
> summary(m2)
Call:
lm(formula = mercury ~ 1 + temp + I(temp^2), data = vapor)

Residuals:
   Min     1Q Median     3Q    Max
-88.59 -57.88 -20.75  48.91 171.34

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 104.207908  46.296150   2.251 0.038805 *
temp         -2.888284   0.596208  -4.844 0.000179 ***
I(temp^2)     0.012297   0.001598   7.693 9.18e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 74.47 on 16 degrees of freedom
Multiple R-squared:  0.9073,  Adjusted R-squared:  0.8957
F-statistic: 78.26 on 2 and 16 DF,  p-value: 5.475e-09

> m3 = lm(mercury ~ 1 + temp + I(temp^2) + I(temp^3), data = vapor)
> summary(m3)

Call:
lm(formula = mercury ~ 1 + temp + I(temp^2) + I(temp^3), data = vapor)

Residuals:
    Min      1Q  Median      3Q     Max
-67.825 -16.256   4.465  13.938  54.436

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.269e+01  2.695e+01  -0.471 0.644474
temp         1.629e+00  6.664e-01   2.445 0.027324 *
I(temp^2)   -1.994e-02  4.372e-03  -4.560 0.000375 ***
I(temp^3)    5.969e-05  7.973e-06   7.487 1.93e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 35.34 on 15 degrees of freedom
Multiple R-squared:  0.9804,  Adjusted R-squared:  0.9765
F-statistic: 250.4 on 3 and 15 DF,  p-value: 4.96e-13
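As a quick arithmetic check on Part (b), the residual degrees of freedom equal n minus the number of estimated coefficients, and the adjusted R-squared in each summary can be recovered from the multiple R-squared. The sketch below assumes n = 19 observations; that value is inferred from Model (1) having 17 residual df with 2 coefficients, not stated in the handout. With the fitted models available, `df.residual(m1)` and `summary(m1)$r.squared` return these quantities directly.

```r
# Sample size inferred from Model (1): 17 residual df + 2 coefficients (assumption)
n <- 19

# Polynomial degree of Models (1), (2), (3)
degree <- c(1, 2, 3)

# Number of estimated coefficients = intercept + one per polynomial term
p <- degree + 1

# Residual degrees of freedom: n minus the number of coefficients
df_resid <- n - p

# Multiple R-squared values reported by summary() in Part (b)
r2 <- c(0.5642, 0.9073, 0.9804)

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (residual df)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

print(data.frame(model = 1:3, df_resid, r2, adj_r2 = round(adj_r2, 4)))
```

The rounded adjusted values should agree with the "Adjusted R-squared" lines of the three summaries above, up to rounding of the reported R-squared.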
Not part of the homework: some of you may want to see what the fitted models look like on the scatterplot. Here they are:
[Figure: scatterplots of mercury vs. temp with the fitted Models (1), (2), and (3) overlaid]

This is the R code I used to generate the third plot (for your information only and NOT required in this class):

# the third plot
# fine grid of temperatures at which to evaluate the fitted curve
x_seq = seq(from = 0, to = 360, by = 0.01)
# fit the cubic model, Model (3)
mod3 = lm(mercury ~ 1 + temp + I(temp^2) + I(temp^3), data = vapor)
# fitted mean vapor pressure at each grid temperature
fitted_seq = predict(mod3, newdata = data.frame(temp = x_seq))
# scatterplot of the data, then overlay the fitted curve
plot(mercury ~ temp, data = vapor)
lines(fitted_seq ~ x_seq)

(c) [2 pts] Interpret the estimated intercept of the fitted model mercury ~ 1 + temp + temp^2 in context.
Solution: When the temperature is 0 degrees Celsius, the estimated mean vapor pressure of mercury is 104.208 mmHg.

(d) [2 pts] In the fitted model mercury ~ 1 + temp + temp^2 + temp^3, suppose we conduct the hypothesis test of $H_0: \beta_3 = 0$ vs. $H_A: \beta_3 \neq 0$, where $\beta_3$ is the slope of temp^3. What is the value of the test statistic based on this sample? What distribution does the test statistic follow under the null hypothesis? What is the p-value? Using 0.05 as the significance level, do we have evidence to include temp^3 in the model?

Solution: Based on the coefficient table in the summary of Model (3) in Part (b), the test statistic value is t = 7.487. Under the null hypothesis, it follows a t distribution with 15 degrees of freedom. The p-value is $1.93 \times 10^{-6}$. Since this is less than 0.05, we reject the null hypothesis; thus we have evidence to include temp^3 in our model.

(e) [2 pts] Do you think a fourth-order polynomial term (temp^4) is necessary? Why or why not?

This is an open-ended question. Potential answers:
- No, the 4th-order polynomial term is not necessary. When we add it to the model, the R-squared value becomes 0.9924, which is only a small improvement over 0.9804.
- No, the 4th-order polynomial term is not necessary. Based on the scatterplot with the fitted model mercury ~ 1 + temp + temp^2 + temp^3, the model with polynomial terms up to the 3rd order already fits the dataset well. There is no need to make the model more complicated by adding a 4th-order polynomial term.
- Yes, the 4th-order polynomial term is necessary based on the t test of its slope. When testing whether the slope is zero, the p-value is 0.000334 (see the model summary). There is strong evidence that the slope of temp^4 is nonzero, which means that we need this term.
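The numbers in Parts (d) and (e) can be reproduced from the coefficient tables. The sketch below recomputes the t statistic for temp^3 as estimate divided by standard error, and its two-sided p-value from a t distribution with 15 df using `pt()`. For temp^4 it relies on the fact that, when a single term is added, the partial F statistic comparing the cubic and quartic models equals the square of that term's t statistic (4.712); because the R-squared inputs are rounded, the two agree only approximately. In a live session, `anova(m3, m4)` gives the exact partial F test.

```r
# Part (d): t test for the temp^3 coefficient in Model (3)
est3 <- 5.969e-05   # Estimate of I(temp^3)
se3  <- 7.973e-06   # Std. Error of I(temp^3)
t3   <- est3 / se3  # about 7.487 (inputs are rounded)
p3   <- 2 * pt(-abs(t3), df = 15)  # two-sided p-value, 15 residual df
cat("t =", round(t3, 3), " p =", signif(p3, 3), "\n")

# Part (e): partial F test for adding temp^4 (quartic model has 14 residual df)
r2_cubic   <- 0.9804
r2_quartic <- 0.9924
f4 <- ((r2_quartic - r2_cubic) / 1) / ((1 - r2_quartic) / 14)
p4 <- pf(f4, df1 = 1, df2 = 14, lower.tail = FALSE)
cat("F =", round(f4, 2), " t^2 =", round(4.712^2, 2), " p =", signif(p4, 3), "\n")
```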
> m4 = lm(mercury ~ 1 + temp + I(temp^2) + I(temp^3) + I(temp^4), data = vapor)
> summary(m4)

Call:
lm(formula = mercury ~ 1 + temp + I(temp^2) + I(temp^3) + I(temp^4), data = vapor)

Residuals:
    Min      1Q  Median      3Q     Max
-35.726 -16.762   1.896   9.332  48.695
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.067e+01  1.964e+01   1.562 0.140635
temp        -1.554e+00  8.003e-01  -1.942 0.072528 .
I(temp^2)    2.239e-02  9.412e-03   2.378 0.032170 *
I(temp^3)   -1.263e-04  3.981e-05  -3.173 0.006773 **
I(temp^4)    2.583e-07  5.483e-08   4.712 0.000334 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.75 on 14 degrees of freedom
Multiple R-squared:  0.9924,  Adjusted R-squared:  0.9903
F-statistic: 458.8 on 4 and 14 DF,  p-value: 1.134e-14

Problem 2 [10 pts]

The Current Population Survey (CPS) is used to supplement census information between census years. The data file cps1985.csv contains a random sample of 534 persons from the CPS data collected in 1985, with information on wages and other characteristics of the workers. The variables we will use in the analyses are listed below:
wage     Wage (dollars per hour).
educ     Number of years of education.
exper    Number of years of work experience.
union    Whether the worker has union membership. The possible values are “Yes” and “No”.
status   The marital status of the worker. The possible values are “Married” and “Single”.

Download the cps1985.csv data file from Canvas. Import the dataset into R and answer the following questions.

(a) [1 pt] Fit the model wage ~ union + status in R and provide its model summary.

Solution:
> cps1985 = read.csv("cps1985.csv")
> mod1 = lm(wage ~ union + status, data = cps1985)
> summary(mod1)

Call:
lm(formula = wage ~ union + status, data = cps1985)

Residuals:
   Min     1Q Median     3Q    Max
-8.031 -3.568 -1.253  2.010 36.456

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    8.9757     0.2951  30.414  < 2e-16 ***
unionYes       2.0555     0.5729   3.588 0.000364 ***
statusSingle  -0.9319     0.4629  -2.013 0.044613 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.062 on 531 degrees of freedom
Multiple R-squared:  0.03354,  Adjusted R-squared:  0.0299
F-statistic: 9.215 on 2 and 531 DF,  p-value: 0.0001164

(b) [1 pt] Based on the model in Part (a), sort the fitted wages of the following four groups from the lowest to the highest. There is no need to explain.
- Married worker with union membership
- Married worker without union membership
- Single worker with union membership
- Single worker without union membership

Solution: From the lowest wage to the highest wage:
single workers without union membership < married workers without union membership < single workers with union membership < married workers with union membership

Explanation (students don't need to provide an explanation):
- For married workers with union membership, we set statusSingle = 0 and unionYes = 1, so the fitted wage is 8.98 + 2.06 × 1 - 0.93 × 0 = 11.04.
- For married workers without union membership, we set statusSingle = 0 and unionYes = 0, so the fitted wage is 8.98 + 2.06 × 0 - 0.93 × 0 = 8.98.
- For single workers with union membership, we set statusSingle = 1 and unionYes = 1, so the fitted wage is 8.98 + 2.06 × 1 - 0.93 × 1 = 10.11.
- For single workers without union membership, we set statusSingle = 1 and unionYes = 0, so the fitted wage is 8.98 + 2.06 × 0 - 0.93 × 1 = 8.05.

(c) [2 pts] Fit the model wage ~ 1 + exper + educ. Provide the summary of the fitted model.
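The hand calculation in the explanation above can also be done with R's own dummy coding. The sketch below builds a hypothetical four-row data frame `groups` (one row per group), uses `model.matrix()` to create the same indicator columns that `lm()` created for mod1 ("No" and "Married" are the baseline levels because R orders factor levels alphabetically by default), and multiplies by the unrounded coefficients from Part (a). With the fitted model itself, `predict(mod1, newdata = groups)` should give the same numbers.

```r
# Coefficient estimates copied from the summary of mod1 in Part (a):
# (Intercept), unionYes, statusSingle
b <- c(8.9757, 2.0555, -0.9319)

# Hypothetical data frame with one row per group
groups <- data.frame(
  union  = c("Yes", "No", "Yes", "No"),
  status = c("Married", "Married", "Single", "Single"),
  stringsAsFactors = TRUE
)

# Same 0/1 indicator columns lm() uses: (Intercept), unionYes, statusSingle
X <- model.matrix(~ union + status, data = groups)

# Fitted wage for each group = X %*% coefficients
groups$fitted <- as.vector(X %*% b)

# Sort the groups from the lowest to the highest fitted wage
print(groups[order(groups$fitted), ])
```

Using the unrounded coefficients changes the fitted values only in the second decimal place, so the ordering matches the solution.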
Solution:
> mod2 = lm(wage ~ 1 + exper + educ, data = cps1985)
> summary(mod2)

Call:
lm(formula = wage ~ 1 + exper + educ, data = cps1985)

Residuals:
   Min     1Q Median     3Q    Max
-8.351 -2.857 -0.599  1.994 36.336

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -4.9045     1.2189  -4.024 6.56e-05 ***
exper         0.1051     0.0172   6.113 1.89e-09 ***
educ          0.9260     0.0814  11.375  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.599 on 531 degrees of freedom
Multiple R-squared:  0.202,  Adjusted R-squared:  0.199
F-statistic: 67.22 on 2 and 531 DF,  p-value: < 2.2e-16

(d) [2 pts] Based on the model in Part (c), interpret the slope of exper in context.

Solution: When the education level of the workers is held fixed (i.e., controlling for education), one additional year of work experience is associated with an average increase of $0.1051 in the hourly wage.

(e) [2 pts] Can you interpret the intercept in Part (c) in context? If yes, please provide the interpretation. If not, please explain.

Solution: No. Literally, the intercept says that workers with no work experience and no education receive an average wage of -4.90 dollars per hour. However, a negative hourly wage does not make sense, so the intercept has no meaningful interpretation in this context.
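Parts (d) and (e) can be illustrated by evaluating the fitted equation from Part (c) directly. The sketch below defines a small helper, `wage_hat()` (a hypothetical name, not part of the handout), from the reported coefficients. Two workers with the same education whose experience differs by one year have fitted wages that differ by exactly the exper slope, whatever education level is chosen (12 and 16 years are arbitrary illustrative values), and plugging in zero experience and zero education returns the intercept.

```r
# Coefficient estimates copied from the summary of mod2 in Part (c)
b0      <- -4.9045  # (Intercept)
b_exper <-  0.1051  # exper
b_educ  <-  0.9260  # educ

# Hypothetical helper: fitted hourly wage for given experience and education
wage_hat <- function(exper, educ) b0 + b_exper * exper + b_educ * educ

# Part (d): one more year of experience at a fixed education level
diff_12 <- wage_hat(exper = 11, educ = 12) - wage_hat(exper = 10, educ = 12)
diff_16 <- wage_hat(exper = 11, educ = 16) - wage_hat(exper = 10, educ = 16)

# Part (e): zero experience and zero education gives the intercept
at_zero <- wage_hat(exper = 0, educ = 0)

cat("difference (educ = 12):", diff_12, "\n")
cat("difference (educ = 16):", diff_16, "\n")
cat("fitted wage at exper = 0, educ = 0:", at_zero, "\n")
```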