PROJECT MODULE 3

School

New York University

Course

103

Subject

Statistics

Date

Jan 9, 2024

In the following questions, "response variable" and "predictor variables" refer either to the original variables you collected, or to the logs, according to the decision you made in Module 1, part 5. So the decisions about taking logs (or not) have already been made. Relate your findings to the things you said you hoped to learn in Module 1.

1) First, run simple regressions of your response variable against each of the individual explanatory variables. Interpret the slope coefficients. Determine the p-values for the slopes. (Take account of the direction of the alternative (research) hypothesis.) Are the slopes statistically significant? Do the slopes agree with the scatterplots you made in Module 1?

2) Next, run a multiple regression, using all of your predictor variables. Are all of the coefficients significant? Which variables (if any) appear to be useless for predicting the response variable? Check the F-statistic. (Interpret, briefly.) How is the R²? Is it appreciably higher than what you got in the simple regressions?

3) Do you find any apparent inconsistencies in the coefficients you get in the full multiple regression model, compared with the coefficients for the corresponding variable in the simple regression? Did the coefficient values change appreciably from the simple model to the full model? Discuss, briefly.

4) For the full multiple regression model, get Cook's D and leverage, and create a plot of Cook's D vs. leverage as we did in class. Also provide a plot of standardized residuals vs. fitted values. Briefly discuss the results. (In multiple regression, the leverage is large if it exceeds 2(k+1)/n, where k is the number of explanatory variables, and Cook's D is large if it exceeds 1.) Identify any outliers, and discuss the meaning of the outliers, if possible. Do all of these outliers correspond to the ones found in the scatterplots and descriptive statistics graphs from Module 1? If not, discuss briefly.
Overall, considering the R², the significance of the individual coefficients, and the Cook's D values, does the full model seem to fit well?

5) Based on the standardized residuals vs. fitted values plot, is there evidence of nonconstant variance? Based on your results on normality of the response variable from Module 2 and from the normal probability plot of the standardized residuals, together with the evidence of the residual plots and output here, do you think that the Minitab regression model output can be trusted, i.e., are the four assumptions satisfied or violated?

6) Finally, we are going to use an "automatic" method for selecting the "best" predictor variables. For each of the models you have fitted in parts 1 and 2, you will use the residual sum of squares SSE to compute a number called AICC. The model with the smallest AICC is the "best". AICC is computed as

AICC = log(SSE) + 2(k+2)/(n−k−3),

where "log" is the natural log (that is, "ln" on most calculators) and k is the number of predictor variables in the model. If any of the AICC values are negative, then the most negative value is the "best". Determine which of your possible models is "best" according to AICC. Are all of the coefficients in this model statistically significant? Interpret the coefficients of this "best" model, and say what it means in terms of the things you said you wanted to learn in Module 1, part 1. Please repeat this question by selecting the model with the largest adjusted R-squared value. Do your answers differ?
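
As a rough sketch of the arithmetic in parts 4 and 6 (not a replacement for your Minitab output), the cutoffs and selection criteria could be computed as shown here. The choice of Python, the function names, the model labels, and every number in the example are assumptions made purely for illustration. The adjusted R-squared formula, 1 − [SSE/(n−k−1)]/[SST/(n−1)], is the standard definition and is not stated in this handout; take SSE and SST from your own regression output.

```python
import math

# Cook's D is treated as large if it exceeds 1 (part 4).
COOKS_D_CUTOFF = 1.0


def leverage_cutoff(k, n):
    """Leverage is treated as large if it exceeds 2(k+1)/n (part 4)."""
    return 2 * (k + 1) / n


def aicc(sse, n, k):
    """AICC = log(SSE) + 2(k+2)/(n-k-3), using the natural log (part 6)."""
    if sse <= 0:
        raise ValueError("SSE must be positive")
    if n - k - 3 <= 0:
        raise ValueError("need n > k + 3 for the penalty term")
    return math.log(sse) + 2 * (k + 2) / (n - k - 3)


def adjusted_r2(sse, sst, n, k):
    """Standard adjusted R-squared: 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical data set: n observations and total sum of squares SST.
n = 30
sst = 500.0

# Hypothetical candidate models: label -> (k, SSE). These are made-up values.
models = {
    "x1 only": (1, 220.0),
    "x2 only": (1, 310.0),
    "x1 + x2": (2, 210.0),
}

for label, (k, sse) in models.items():
    print(label,
          "AICC =", round(aicc(sse, n, k), 4),
          "adj R2 =", round(adjusted_r2(sse, sst, n, k), 4))

# Smallest AICC is "best"; largest adjusted R-squared is "best".
best_by_aicc = min(models, key=lambda m: aicc(models[m][1], n, models[m][0]))
best_by_adj_r2 = max(
    models, key=lambda m: adjusted_r2(models[m][1], sst, n, models[m][0])
)

print("Leverage cutoff for the full model:", leverage_cutoff(2, n))
print("Best by AICC:", best_by_aicc)
print("Best by adjusted R2:", best_by_adj_r2)
```

With these made-up numbers the two criteria disagree: AICC picks the one-predictor model, while adjusted R-squared picks the full model. This can happen because, for a given n, the AICC penalty for an extra predictor is heavier than the one built into adjusted R-squared.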