S23 - Assignment #3 - Solutions

School: University of Waterloo

Course: 371

Subject: Statistics

Date: Apr 3, 2024

STAT 371 S23 Assignment #3 (Submission deadline: 11:59 pm Fri. Jul. 14th) Solutions ( /70)

In this assignment, we will continue with developing a suitable regression model for the CEO data from Assignment #2, beginning with your fitted model used in 2e) of Assignment #2 (i.e. model without Background variate).

1) [5] Plot the residuals vs the fitted values, as well as a QQ plot. Comment on the adequacy of the fitted model, in terms of the model assumptions.

We do not appear to have an adequate model. The pattern evident in the plot of the residuals vs the fitted values reveals a misspecification of the functional form and/or non-constant variance. The departure from a straight line relationship in the QQ plot is in contradiction to the assumption of normal errors.

2) One approach to stabilize the variance of the residuals and/or more adequately describe the relationship between a response variate and the explanatory variates is with an appropriate transformation of the response variate.

a) [3] Create a histogram of CEO compensation. What characteristic of this variate might lead you to suspect that a log transformation may be suitable?

The right-skewness in the distribution suggests that a log transformation might help to normalize the response.
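
The plots asked for in 1) and 2a) could be produced along the following lines. This is only a sketch: the CEO data are not reproduced here, so the data frame `sim` is a simulated stand-in with a single explanatory variate, and its generating model (right-skewed, linear on the log scale) is an assumption chosen to mimic the behaviour described above. The sample size of 59 is inferred from the 50 residual degrees of freedom and 9 estimated coefficients reported in the fitted models.

```r
# Simulated stand-in for the CEO data; the real data set is not reproduced here.
set.seed(371)
n <- 59
sim <- data.frame(SALES = rlnorm(n, meanlog = 7, sdlog = 1))
# Hypothetical data-generating model: COMP is right-skewed and linear on the log scale.
sim$COMP <- exp(5 + 0.3 * log(sim$SALES) + rnorm(n, sd = 0.45))

# Fit on the original (untransformed) scale.
fit_raw <- lm(COMP ~ SALES, data = sim)

op <- par(mfrow = c(1, 3))
# Residuals vs fitted values: look for curvature or a funnel shape.
plot(fitted(fit_raw), resid(fit_raw),
     xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs fitted")
abline(h = 0, lty = 2)
# QQ plot: departures from the reference line suggest non-normal errors.
qqnorm(resid(fit_raw))
qqline(resid(fit_raw))
# Histogram of the response: a long right tail motivates a log transformation.
hist(sim$COMP, xlab = "COMP", main = "Histogram of COMP")
par(op)
```

With the real data, `fit_raw` would be replaced by the fitted model from 2e) of Assignment #2.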

b) [2] Refit the data using the (natural) log transformation of compensation.

    Call:
    lm(formula = log(COMP) ~ AGE + EDUCATN + TENURE + EXPER + SALES + VAL + PCNTOWN + PROF)

    Coefficients:
                    Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)    6.897e+00   7.211e-01    9.565  7.02e-13
    AGE           -1.938e-03   1.208e-02   -0.160   0.87324
    EDUCATN       -3.082e-01   1.160e-01   -2.658   0.01054
    TENURE         7.004e-03   6.981e-03    1.003   0.32051
    EXPER          1.533e-02   9.554e-03    1.605   0.11489
    SALES          2.508e-05   1.636e-05    1.533   0.13151
    VAL            1.236e-03   6.158e-04    2.008   0.05011
    PCNTOWN       -7.308e-02   2.699e-02   -2.708   0.00924
    PROF           2.325e-04   3.502e-04    0.664   0.50968
    ---
    Residual standard error: 0.4705 on 50 degrees of freedom
    Multiple R-squared: 0.4178, Adjusted R-squared: 0.3246
    F-statistic: 4.485 on 8 and 50 DF, p-value: 0.0003771

c) [2] Compare the overall fit of the model and significance of the individual parameters with that of the original (untransformed) model.

An R-squared value of .4178 indicates that less than 42% of the variation in (log) compensation is accounted for by the variables in the model. This is a slight improvement over the fit of the untransformed model (.4031). PCNTOWN and EDUCATN appear to be the only variables with a significant relationship with (log) compensation, after accounting for the other variables. Note that EXPER has been rendered insignificant by the transformation.

d) [4] Replot the two residual plots in 1). Has the transformation helped to address the issues with the adequacy of the (untransformed) model?

Yes, the transformation has certainly helped to address the model adequacy issues. The points in the plot of the residuals vs the fitted values are more randomly scattered. Improvement is also seen in the QQ plot.
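
As a check on how the summary quantities in 2b) fit together, the adjusted R-squared, the F-statistic and a t value can be recomputed from the other reported numbers. The sketch below uses only values copied from the output above; the variable names are illustrative, and small discrepancies are expected because the inputs are themselves rounded.

```r
# Values copied from the summary output for the log(COMP) model in 2b).
r2     <- 0.4178           # Multiple R-squared
df_res <- 50               # residual degrees of freedom
p      <- 8                # number of explanatory variates
n      <- df_res + p + 1   # implied sample size

# Adjusted R-squared penalizes R-squared for the number of fitted parameters.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F-statistic expressed in terms of R-squared, with its p-value.
f_stat <- (r2 / p) / ((1 - r2) / df_res)
p_val  <- pf(f_stat, p, df_res, lower.tail = FALSE)

# t value and two-sided p-value for EDUCATN: estimate divided by standard error.
t_educ <- -3.082e-01 / 1.160e-01
p_educ <- 2 * pt(abs(t_educ), df_res, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2, f_stat = f_stat, t_educ = t_educ), 4))
print(signif(c(p_val = p_val, p_educ = p_educ), 3))
```

The recomputed adjusted R-squared and F-statistic agree with the reported 0.3246 and 4.485 to the precision shown, and the t value for EDUCATN agrees with the reported -2.658 up to rounding of the estimate and standard error.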

3) We can also investigate the suitability of transformations of one or more of the explanatory variates by looking at scatterplots of the variates vs the response (log(COMP), in this case).

a) [3] Create a scatterplot of SALES vs log(COMP). Does a linear model seem appropriate for these two variates?

No, the relationship between log(COMP) and SALES is not linear.

b) [3] Create a scatterplot of log(SALES) vs log(COMP). Comment.

The relationship between log(COMP) and log(SALES) appears much more linear (although there appears to be some non-linearity in the relationship for high sales).

c) [4] Refit the model once again, this time taking the log transformation of compensation as well as of the variates SALES, VAL, PCNTOWN and PROF. We will use this model going forward. Comment on the effect these transformations have on the overall fit of the model, and on the p-values of the associated variates.

    Call:
    lm(formula = log(COMP) ~ AGE + EDUCATN + TENURE + EXPER + log(SALES) + log(VAL) + log(PCNTOWN) + log(PROF))

    Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)    5.531845    0.897256    6.165  1.21e-07
    AGE            0.002864    0.012122    0.236   0.81418
    EDUCATN       -0.300500    0.114331   -2.628   0.01137
    TENURE        -0.003343    0.006511   -0.513   0.60993
    EXPER          0.015146    0.010056    1.506   0.13830
    log(SALES)     0.188393    0.080064    2.353   0.02260
    log(VAL)       0.315447    0.096467    3.270   0.00195
    log(PCNTOWN)  -0.351228    0.105022   -3.344   0.00157
    log(PROF)     -0.221603    0.104300   -2.125   0.03858
    ---
    Residual standard error: 0.4428 on 50 degrees of freedom
    Multiple R-squared: 0.4842, Adjusted R-squared: 0.4017
    F-statistic: 5.867 on 8 and 50 DF, p-value: 2.732e-05

The transformations of the explanatory variables appear to have improved the fit of the model substantially, as indicated by the increased R-squared value of .4842, up from .4178 (some of you may not experience the same increase, depending on your sample). There are several variables with associated p-values < .05, including education level and all the log transformed variables (SALES, VAL, PCNTOWN, PROF).

4) [4] Plot the residuals vs the fitted values and the QQ plot for the model in 3). Comment on the effect of the transformations on the model assumptions.

The transformations have improved model adequacy considerably. The model appears to be well specified with a relatively constant variance (based on the plot of the residuals vs fitted values), and the QQ plot suggests that the assumption of normal errors is more reasonably met than with the untransformed variables.
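
The comparison made in 3) and the diagnostics in 4) could be sketched as follows. As before, the real CEO data are not available here, so `sim` is a simulated stand-in with one explanatory variate, generated under the assumption that the true relationship is linear on the log-log scale. The numbers it produces will not match those above; only the workflow is meant to carry over.

```r
# Simulated stand-in for the CEO data (hypothetical generating model).
set.seed(371)
n <- 59
sim <- data.frame(SALES = rlnorm(n, meanlog = 7, sdlog = 1))
sim$COMP <- exp(5 + 0.3 * log(sim$SALES) + rnorm(n, sd = 0.45))

# Same response, with and without the log transformation of the explanatory variate.
fit_lin <- lm(log(COMP) ~ SALES, data = sim)
fit_log <- lm(log(COMP) ~ log(SALES), data = sim)

op <- par(mfrow = c(2, 2))
# 3a) and 3b): scatterplots with the fitted least squares lines.
plot(sim$SALES, log(sim$COMP), xlab = "SALES", ylab = "log(COMP)")
abline(fit_lin)
plot(log(sim$SALES), log(sim$COMP), xlab = "log(SALES)", ylab = "log(COMP)")
abline(fit_log)
# 4): residual diagnostics for the model with the transformed variate.
plot(fitted(fit_log), resid(fit_log),
     xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(resid(fit_log))
qqline(resid(fit_log))
par(op)

# Compare the overall fit of the two models.
r2 <- c(linear = summary(fit_lin)$r.squared, log = summary(fit_log)$r.squared)
print(round(r2, 3))
```

With the real data, the same pattern applies to the full model by wrapping SALES, VAL, PCNTOWN and PROF in `log()` inside the `lm` formula, as in the call shown in 3c).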