Unit 10 notes Fall23

School

University of Georgia

Course

3000

Subject

Statistics

Date

Apr 3, 2024

BUSN 3000 Unit 10: Simple Linear Regression

Linear regression

A linear regression model describes how a response variable changes as an explanatory variable changes.

\[ \hat{y} = b_0 + b_1 x \]

In JMP: Analyze > Fit Y by X

Interpreting coefficients

The slope is the predicted change in Y for a 1-unit increase in X.

The intercept is the predicted value of Y when X = 0.

Extrapolation occurs when we make a prediction from an X value that is outside the range of our data. These predictions may not be accurate.

Making predictions and calculating residuals

Predict the price for an 8-year-old car listed on Craigslist.

The error between the actual value \(y\) and the predicted value \(\hat{y}\) is called the residual: \(e = y - \hat{y}\)
    o Positive residual
    o Negative residual

Calculate the residual for an 8-year-old car that's listed for $34,995.
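
The prediction and residual steps above can be sketched in a few lines of Python. This is only an illustration: the ages and prices below are hypothetical toy values (in thousands of dollars), not the course's Craigslist data, and the helper name `fit_line` is made up for this example. JMP's Fit Y by X performs the same least-squares fit.

```python
# Hypothetical toy data (NOT the course's Craigslist dataset):
# car age in years and listed price in thousands of dollars.
ages = [2, 4, 6, 8, 10]
prices = [30, 27, 20, 17, 11]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_line(ages, prices)

# Predicted price for an 8-year-old car under the toy model.
predicted = b0 + b1 * 8

# Residual = actual - predicted, for the toy car listed at 17 (thousand dollars).
residual = 17 - predicted

print(f"slope b1 = {b1:.2f}, intercept b0 = {b0:.2f}")
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

In this toy fit the residual is positive, which means the actual price sits above the line (the model underpredicted). A negative residual would mean the actual price sits below the line.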

How strong is the association?

The correlation (r) measures how closely X and Y follow a linear relationship (how close the points are to a straight line). Use the Guess the Correlation applet to try it out.
    o If the points fall in an almost perfect, negative linear pattern, r is close to _______.
    o If the points fall in an almost perfect, positive linear pattern, r is close to _______.
    o If there is almost no linear relationship, r is close to _______.
    o Is correlation resistant to outliers?

The coefficient of determination, \(R^2\), tells us the proportion of variation in Y that can be predicted (explained) by the model.
    o If the \(R^2\) value is close to _______, the regression line provides very accurate predictions.
    o If the \(R^2\) value is close to _______, the regression line is not very useful in making predictions.
    o Interpret \(R^2\) in context for the used cars example.
    o Calculate the correlation r for the used cars example.

The root mean square error (RMSE) is the average size of the prediction errors (residuals).
    o Interpret RMSE in context for the used cars example.

Consider changes to the dataset

What would happen if we changed the units of age from years to months?
    o The slope would…
    o The R-sq and correlation values would…
    o The RMSE value would…

Suppose someone listed a 1-year-old car for $0. What would happen if this point were added to the dataset?
    o The slope would…
    o The R-sq and correlation values would…
    o The RMSE value would…
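
One way to explore the units question above is to compute the summary measures twice, once with age in years and once in months. The sketch below uses hypothetical toy data (not the used cars dataset) and a made-up helper named `summarize`. It assumes RMSE is computed with n − 2 degrees of freedom, which is the usual convention in regression output.

```python
import math

# Hypothetical toy data: age in years, price in thousands of dollars.
ages_years = [2, 4, 6, 8, 10]
prices = [30, 27, 20, 17, 11]


def summarize(x, y):
    """Return slope, correlation r, R^2, and RMSE for a simple linear fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r = sxy / math.sqrt(sxx * syy)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Assumption: RMSE uses n - 2 degrees of freedom (one predictor).
    rmse = math.sqrt(sse / (n - 2))
    return {"slope": b1, "r": r, "r_squared": r ** 2, "rmse": rmse}


in_years = summarize(ages_years, prices)
in_months = summarize([12 * a for a in ages_years], prices)

for label, result in (("years", in_years), ("months", in_months)):
    print(label, {k: round(v, 4) for k, v in result.items()})
```

Rescaling X only rescales the slope (here it is divided by 12). The correlation, \(R^2\), and RMSE stay the same, because the fitted values and residuals do not change.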

Inference for Regression

The population and the sample

Earlier, we built a model to predict a response variable y by fitting a straight line to a sample data set:

\[ \hat{y} = b_0 + b_1 x \]

We want to draw inferences about the true relationship between x and y in the population.

Hypothesis tests

Does a linear relationship really exist, or is it plausible that the sample slope occurred just by chance?

\(H_0\):
\(H_A\):

Reasoning about strength of evidence: Which graph provides the strongest evidence against \(H_0\)? The weakest?

Test statistics and p-values

\[ \text{test stat} = \frac{\text{sample statistic} - \text{null hypothesis value}}{\text{standard error}} \quad\rightarrow\quad t = \frac{b_1}{SE_{b_1}} \]

How large is large enough?
    o Compare to a t distribution with df = n − (# of predictors) − 1
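
As a rough illustration of the test statistic above, the sketch below computes \(t = b_1 / SE_{b_1}\) by hand for hypothetical toy data. It assumes a null hypothesis of zero slope, uses the standard formula \(SE_{b_1} = \text{RMSE} / \sqrt{\sum (x - \bar{x})^2}\), and takes the critical value 3.182 (two-sided, 5% level, df = 3) from a standard t table. In practice JMP reports the t ratio and p-value directly.

```python
import math

# Hypothetical toy data (not from the course).
x = [2, 4, 6, 8, 10]
y = [30, 27, 20, 17, 11]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Degrees of freedom: n - (# of predictors) - 1, with one predictor.
df = n - 1 - 1
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(sse / df)

# Standard error of the slope and the t statistic (null value assumed to be 0).
se_b1 = rmse / math.sqrt(sxx)
t_stat = (b1 - 0) / se_b1

# Two-sided 5% critical value for df = 3, taken from a standard t table.
t_critical = 3.182
reject_h0 = abs(t_stat) > t_critical

print(f"b1 = {b1:.3f}, SE = {se_b1:.4f}, t = {t_stat:.2f}, df = {df}")
print("reject H0 at the 5% level:", reject_h0)
```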

t-test for the population slope

An antique dealer recently opened a shop. The store is open to the public on Tuesdays through Sundays from 10am to 6pm. Direct mail (unsolicited flyers and informational brochures delivered via U.S. post) is the primary advertising outlet. Direct mailings are sent out every Wednesday. Use regression to determine if sales are related to the direct mail campaign.

Is there a statistically significant relationship between weekly direct mail costs ($) and weekly sales revenue ($)?

Confidence intervals for population slope

How much do sales change as direct mail costs increase?
    o Interpret the sample slope.
    o Calculate the 95% confidence interval for the true slope relating sales to direct mail.

sample stat ± (critical value) × SE

CAREFUL: Is it safe to make cause-and-effect conclusions based on this data?
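
The confidence interval formula above can be sketched as a small function. The slope, standard error, and sample size below are hypothetical placeholders, not the antique dealer's results, and the function name is made up for this example. The critical value 2.228 is the two-sided 95% value for df = 10 from a standard t table. With real output you would substitute the slope and standard error that JMP reports.

```python
def slope_confidence_interval(b1, se_b1, t_critical):
    """Return (lower, upper) for b1 ± t_critical * se_b1."""
    margin = t_critical * se_b1
    return b1 - margin, b1 + margin


# Hypothetical summary values for illustration only.
b1 = 2.0            # sample slope: dollars of sales per dollar of direct mail
se_b1 = 0.5         # standard error of the slope
t_critical = 2.228  # 95% two-sided critical value, df = 10 (from a t table)

lower, upper = slope_confidence_interval(b1, se_b1, t_critical)

# If 0 is inside the interval, a slope of zero is plausible at this level.
zero_is_plausible = lower <= 0 <= upper

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print("zero slope plausible:", zero_is_plausible)
```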

Residual plots

In JMP, fit a linear regression model, then click the red arrow beside Linear Fit and choose Plot Residuals.

Conditions for inference

1. Linearity: The scatterplot should show a linear pattern. The residual plot should show a random pattern.
2. Constant variance: The residuals should not show a funnel shape; the vertical spread should remain constant across all values of x.
3. Normality: A histogram of residuals should show a Normal shape (unimodal, bell-shaped).
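
To make the constant variance condition concrete, the sketch below builds the (predicted, residual) pairs that a Residual by Predicted plot would display. It then compares the average residual size in the lower and upper halves of the predicted values. The data are hypothetical and deliberately built with growing noise, and the half-versus-half comparison is an informal heuristic assumed for this illustration, not a formal test or anything JMP reports. The conditions are normally judged by eye.

```python
# Hypothetical data with noise that grows as x grows (a "funnel" shape).
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
y = [2 * xi + ei for xi, ei in zip(x, noise)]

# Least-squares fit.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Points for a Residual by Predicted plot, ordered by predicted value.
pairs = sorted(zip(predicted, residuals))

# Informal check: mean absolute residual in each half of the predicted range.
half = len(pairs) // 2
spread_low = sum(abs(e) for _, e in pairs[:half]) / half
spread_high = sum(abs(e) for _, e in pairs[half:]) / (len(pairs) - half)

for p, e in pairs:
    print(f"predicted = {p:6.2f}   residual = {e:6.2f}")
print(f"mean |residual|: low half = {spread_low:.2f}, high half = {spread_high:.2f}")
```

A much larger spread in one half, as in this toy example, is the numeric counterpart of a funnel shape in the plot.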

Checking conditions + interpreting RMSE

This dataset contains hourly rental data on the Capital Bikeshare program in Washington, D.C. for 500 randomly selected hours spanning a two-year period. We're using temperature (°C) to predict the total number of bike rentals.

Use the Residual by Predicted plot to check conditions.
    o Do you see any indication that the linearity condition is violated?
    o Do you see any indication that the constant variance condition is violated?
    o Do you see any indication that the normality condition is violated?

We will also use regression to assess the relationship between wind speed and the percentage of bad calls for “Reliable Wireless,” a mobile phone company in the Midwest (p-value < 0.0001).

Which inference condition(s) can you assess using a dotplot / histogram of residuals? Select all that apply.
    o Linearity
    o Constant variance
    o Normality

RMSE = 0.2727. Interpret this value.

The RMSE can also be interpreted as the standard deviation of the residuals. Since we're assuming the residuals are Normally distributed, we can use the empirical rule (from Unit 4) to describe the distribution.

95% of the prediction errors (residuals) are smaller than _____________ in absolute value.
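
The empirical-rule step above is simple arithmetic, sketched below with the RMSE value given in the notes. It assumes the usual 68 / 95 / 99.7 version of the empirical rule, with the residuals centered at zero and the RMSE playing the role of the standard deviation. The variable names are only illustrative.

```python
# RMSE reported in the notes for the Reliable Wireless example.
rmse = 0.2727

# Empirical rule: about 68%, 95%, and 99.7% of Normal values fall within
# 1, 2, and 3 standard deviations of the mean (residuals have mean 0).
within_1_sd = 1 * rmse
within_2_sd = 2 * rmse
within_3_sd = 3 * rmse

print(f"about 68% of residuals are within ±{within_1_sd:.4f}")
print(f"about 95% of residuals are within ±{within_2_sd:.4f}")
print(f"about 99.7% of residuals are within ±{within_3_sd:.4f}")
```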