Spring 23 Assignment 6 Addl Exs

School

Purdue University *

Course

230

Subject

Business

Date

Jan 9, 2024

Uploaded by UltraRain12994

Ethan Dolder

ASSIGNMENT 6 ADDITIONAL EXERCISES (REQUIRED)

1. (Additional Exercise Required) For a recent season, several variables were recorded for 125 professional golfers. We are interested in using multiple regression to predict Earnings ($) from the predictors indicated in the backward elimination output (with some deletions) below. Briefly explain your answers.

Regression Analysis: Earnings ($) versus DrDist, DrAccu, GIR, Sand Saves, Scrambling

Backward Elimination of Terms
Candidate terms: DrDist, DrAccu, GIR, Sand Saves, Scrambling

              ------Step 1-----   ------Step 2-----   ------Step 3-----
                   Coef      P         Coef      P         Coef      P
Constant      -11280801            -7264460            -4296110
DrDist            12222  0.448
DrAccu           -58196  0.026       -71841  0.000
GIR              107488  0.008       121557  0.001
Sand Saves        48630  0.009        46404  0.022
Scrambling        66373  0.114        58872  0.149

S                936403              934761              939054
R-sq             21.64%              _____%              ______
R-sq(adj)        18.35%              _____%              ______
R-sq(pred)       12.93%              13.60%              14.14%
Mallows’ Cp        6.00                4.58                4.69

α to remove = 0.01

a. Using α to remove = 0.01 as indicated, Step 3 produces the final model. Which of the predictor(s) will not be in it? (Remember to briefly explain your answer.)

DrDist and Scrambling will not be in the final model. At each step, backward elimination removes the term with the largest P-value above α = 0.01: DrDist (P = 0.448) is removed after Step 1, then Scrambling (P = 0.149) is removed after Step 2.

b. Which of the models has the highest multiple R-squared? The highest adjusted R-squared? (Remember to briefly explain your answer.)

The Step 1 model has the highest multiple R-squared (21.64%). R-squared can never increase when a predictor is removed, so the model with all five predictors always has the largest value. The Step 2 model has the highest adjusted R-squared. Adjusted R-squared rises exactly when S falls, and Step 2 has the smallest S (934761, versus 936403 for Step 1 and 939054 for Step 3).
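The blank R-sq and R-sq(adj) entries for Steps 2 and 3 can be recovered from the S values in the output, since every step shares the same total sum of squares. The sketch below is only an illustration of that arithmetic, not part of the assignment output. It assumes n = 125 (the number of golfers) and uses the identities SSE = S^2 (n - k - 1), R-sq = 1 - SSE/SST, and R-sq(adj) = 1 - S^2 / (SST/(n - 1)); the variable names are made up for the example.

```python
# Values read from the backward elimination output (n = 125 golfers).
n = 125
steps = {
    1: {"k": 5, "S": 936403.0},  # all five predictors
    2: {"k": 4, "S": 934761.0},  # DrDist removed
    3: {"k": 3, "S": 939054.0},  # DrDist and Scrambling removed
}
r2_step1 = 0.2164  # the only R-sq value shown in the output


def sse(step):
    """Error sum of squares implied by S and the number of predictors k."""
    return step["S"] ** 2 * (n - step["k"] - 1)


# SST is the same for every step, so back it out of Step 1.
sst = sse(steps[1]) / (1.0 - r2_step1)

r2 = {}
r2_adj = {}
for name, step in steps.items():
    r2[name] = 1.0 - sse(step) / sst
    r2_adj[name] = 1.0 - step["S"] ** 2 / (sst / (n - 1))

for name in steps:
    print(f"Step {name}: R-sq = {100 * r2[name]:.2f}%, "
          f"R-sq(adj) = {100 * r2_adj[name]:.2f}%")
```

Under these assumptions the results come out to roughly 21.3% and 18.6% for Step 2 and roughly 19.9% and 17.9% for Step 3, so R-sq falls at every step while R-sq(adj) peaks at Step 2.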
2. (Additional Exercise Required) Refer again to the data above. Not satisfied with these predictors, we add a predictor to our dataset called Bounce Back. Below is Best Subsets output using this new dataset. Briefly explain your answers.

Best Subsets Regression: Earnings ($) versus DrDist, DrAccu, ...
Response is Earnings ($)

Vars   R-Sq   R-Sq(adj)   R-Sq(pred)   Mallows Cp        S   Predictors included
   1    8.1         7.4          5.0         19.0   997373   X
   2   15.4        14.1         11.1          9.9   960722   X X
   3   19.9        17.9         14.1          5.1   939054   X X X
   4   21.9        19.6         13.6          5.0   930761   X X X X
   5   22.4        19.1         13.1          5.2   931759   X X X X X
   6   22.6        18.6         12.2          7.0   934739   X X X X X X

Predictor columns in the original output: DrDist, DrAccu, GIR, Sand Saves, Scrambling, Bounce Back. The X marks are listed as extracted; their column positions are not preserved here.

a. Based on this output, the predictor with the highest correlation in absolute value with Earnings is ___Sand Saves___. (Remember to briefly explain your answer.) The absolute value of its correlation with Earnings is ___0.2846___.

The one-predictor row shows the single predictor with the largest R-Sq, and in simple regression R-Sq is the squared correlation, so |r| = sqrt(0.081) = 0.2846.

b. By the criterion of Best Subsets Regression, the best model for this dataset uses which predictors? (Remember to briefly explain your answer.)

The best model is the four-predictor model. It has the highest R-Sq(adj) (19.6%), the lowest Mallows Cp (5.0), and the smallest S (930761). Its R-Sq of 21.9% is larger than the 21.64% of the five-predictor model without Bounce Back in Exercise 1, so Bounce Back must be one of its four predictors; the other three are the ones marked with an X in that row of the output.

c. TRUE or FALSE: The model, Earnings vs. DrDist, Sand Saves, and Bounce Back, will have a multiple R-squared greater than 15%.

True

d. TRUE or FALSE: The model, Earnings vs. DrAccu, GIR, and Bounce Back, will have an adjusted R-squared less than 18%.
True. Best Subsets reports the three-predictor model with the highest R-Sq, and among models with the same number of predictors adjusted R-squared is ordered the same way as R-Sq. The best three-predictor model has R-Sq(adj) = 17.9%, so every three-predictor model, including this one, has an adjusted R-squared of at most 17.9%, which is less than 18%.

3. (Additional Exercise Required) XYZ Co. is studying CEO salaries to determine how to set theirs. For 26 public companies, they record 2021 Sales ($Mil) and CEO Pay ($000). The data is collected in the dataset, CEO Compensation. Open the dataset and use the data to answer the questions below.

a. Compute the correlation between Sales and CEO Pay.

Excel Formula: =CORREL(A2:A27,B2:B27)
Correlation = 0.827293

b. Create two new variables, Log Sales and Log CEO Pay, by taking the logarithms base 10 of the original variables. Compute the correlation between Log Sales and Log CEO Pay.

Excel Formula: =CORREL(H2:H27,I2:I27)
Correlation = 0.921456

c. We have learned that the correlation coefficient is unit-free. But the correlations in a and b are not equal. Briefly explain why.
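Unit-free means the correlation does not change under a linear change of units, such as converting Sales from $Mil to $ or CEO Pay from $000 to $. Taking the logarithm base 10 is a nonlinear transformation, not a change of units, so it changes the shape of the relationship and therefore the correlation. Here the relationship between the logged variables is closer to a straight line, which raises the correlation from 0.827 to 0.921.

d. Fit the regression models, CEO Pay vs Sales and Log CEO Pay vs Log Sales. For both models, in your output, show the following: Regression statistics; Coefficients Table; Normal Probability Plot of Residuals; Residual Plot.

[Output: CEO Pay vs Sales]
[Output: Log CEO Pay vs Log Sales]

What do the diagnostic plots tell you about the two models?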
They indicate that the Log CEO Pay vs Log Sales model meets the regression assumptions better than the CEO Pay vs Sales model, so its predictions and intervals are more trustworthy.

e. XYZ had 2021 sales of $7.5 billion. They wish to set their CEO Pay based on one of the regression models above. For each model, calculate the projected CEO Pay based on sales of $5 billion.

CEO Pay vs Sales: 852.8587 + 0.068815(5,000) = 1,196.93 ($000) = $1,196,930

Log CEO Pay vs Log Sales: 1.656921 + 0.413971(3.69897) = 3.188, where 3.69897 = log10(5,000). Back-transforming, 10^3.188 = 1,541.7 ($000) = $1,541,700

f. More specifically, XYZ wants their CEO Pay to be within the 95% confidence interval for average CEO Pay for public companies with $7.5 billion in sales. Given this, which of the models should they use to set the CEO Pay? (Consider the residual plots and the prediction output on the following page.) Briefly explain your answer.

If XYZ wants their CEO Pay to be within the 95% confidence interval, they should use the CEO Pay vs Sales model. Otherwise, the value of 3.26 calculated by plugging 7,500 into the Log CEO Pay vs Log Sales equation would be outside of the 95% confidence interval of (3.11, 3.25).
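Two points from Exercise 3 can be checked numerically: a change of units leaves the correlation alone while a log transform does not (part c), and a log-log prediction has to be raised back as a power of 10 to return to $000 (part e). The sketch below is illustrative only. The six (sales, pay) pairs are made-up values, not the CEO Compensation dataset, and the function names are hypothetical; the regression coefficients are the ones quoted in part e.

```python
import math


def pearson(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Sales in $Mil, CEO Pay in $000 (NOT the real dataset).
sales = [500.0, 1200.0, 3000.0, 7500.0, 20000.0, 60000.0]
pay = [700.0, 900.0, 1400.0, 1900.0, 3000.0, 4500.0]

r_raw = pearson(sales, pay)
# Linear change of units: Sales to $Bil, CEO Pay to $.
r_rescaled = pearson([s / 1000 for s in sales], [p * 1000 for p in pay])
# Nonlinear transformation: log base 10 of both variables.
r_log = pearson([math.log10(s) for s in sales],
                [math.log10(p) for p in pay])


def predict_linear(sales_mil):
    """CEO Pay ($000) from the CEO Pay vs Sales equation in part e."""
    return 852.8587 + 0.068815 * sales_mil


def predict_loglog(sales_mil):
    """CEO Pay ($000) from the log-log equation, back-transformed."""
    log_pay = 1.656921 + 0.413971 * math.log10(sales_mil)
    return 10 ** log_pay


print(f"r raw = {r_raw:.4f}, rescaled = {r_rescaled:.4f}, log = {r_log:.4f}")
for s in (5000, 7500):
    print(f"Sales {s}: linear = {predict_linear(s):.1f}, "
          f"log-log = {predict_loglog(s):.1f} ($000)")
```

Carrying full precision in the exponent gives a log-log prediction of about 1,542 ($000) at sales of 5,000, slightly above the 1,541.7 obtained by rounding the exponent to 3.188 first; the difference is rounding only.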
4. (Additional Exercise Required)
a. Based on this model, for a person with income of $85,000 who viewed the ad, what is the predicted probability that they will buy the cereal?

e^(-3.623 + 0.0491(85) + 1.605(1)) / (1 + e^(-3.623 + 0.0491(85) + 1.605(1))) = 0.8962

b. Based on this model, for a person with income of $85,000 who did not view the ad, what is the predicted probability that they will buy the cereal?

e^(-3.623 + 0.0491(85) + 1.605(0)) / (1 + e^(-3.623 + 0.0491(85) + 1.605(0))) = 0.6343

c. To get the odds of buying the cereal for the person in part a, we must multiply the odds for the person in part b by what number?

e^1.605 ≈ 4.98

d. Interpret the slopes of the predictors.

Income: holding ad viewing fixed, each one-unit increase in Income (one unit is $1,000, since $85,000 is entered as 85) multiplies the odds of buying the cereal by e^0.0491 ≈ 1.05, an increase of about 5%.

Ad: holding Income fixed, viewing the ad multiplies the odds of buying the cereal by e^1.605 ≈ 4.98.
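The probabilities and the odds ratio above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the fitted logistic model is log-odds = -3.623 + 0.0491(Income) + 1.605(Ad), with Income in $000 and Ad coded 1 for viewed and 0 for not viewed, as the calculations in parts a and b imply; the function and constant names are hypothetical.

```python
import math

# Coefficients as they appear in the answers to parts a and b.
B0 = -3.623        # intercept
B_INCOME = 0.0491  # per $1,000 of income (assumed unit)
B_AD = 1.605       # 1 = viewed the ad, 0 = did not


def buy_probability(income_thousands, viewed_ad):
    """Predicted probability of buying from the logistic model."""
    z = B0 + B_INCOME * income_thousands + B_AD * viewed_ad
    return math.exp(z) / (1.0 + math.exp(z))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


p_ad = buy_probability(85, 1)
p_no_ad = buy_probability(85, 0)
odds_ratio = odds(p_ad) / odds(p_no_ad)

print(f"P(buy | ad)    = {p_ad:.4f}")
print(f"P(buy | no ad) = {p_no_ad:.4f}")
print(f"odds ratio     = {odds_ratio:.4f}  (e^1.605 = {math.exp(B_AD):.4f})")
```

The odds ratio equals e^1.605 regardless of the income value chosen, which is why part c has a single answer.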