13-Correlation-Regression

School

Seneca College

Course

101

Subject

Statistics

Date

Apr 3, 2024


BSTA 200 3. Correlation and Regression
Prof. Joshua Emmanuel

Learning Outcomes:
• Calculate and interpret the simple correlation between two variables
• Calculate and interpret the simple linear regression equation for a set of data
• Describe the nature and strength of the relationship between two interval-level variables

3.1 RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES

When we study the relationship between two variables, we refer to the data as bivariate. We are only interested in relationships that can be described with a straight line.

SCATTER DIAGRAM

To describe the relationship between two interval variables graphically, we often use a scatter diagram (or scatter plot). In a scatter diagram (as seen below), the dependent variable is plotted along the vertical axis (Y-axis) and the independent variable along the horizontal axis (X-axis).

Example 3.1: Construct a scatter diagram for the following data:

x     y
1     2
5     8
4     4
2     5
7    12
4     7
3     6
8    12
9    14

[Figure: scatter diagram with axis ticks 0 to 10 and 0 to 16; region labels: Positive x, Negative x, Positive y, Negative y]

[Figure: Scatter diagrams showing relationships]
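As a rough illustration of Example 3.1, the sketch below draws the data as a character grid using only the Python standard library. It assumes the listed values pair up as (x, y) in the order given and that all values are non-negative integers; the function name text_scatter is a hypothetical helper, not part of the course material. In practice a plotting library would normally be used instead.

```python
def text_scatter(xs, ys):
    """Return a character-grid scatter diagram as a single string.

    One row per integer y value (largest at the top) and one column per
    integer x value; '*' marks an observed (x, y) pair, '.' an empty cell.
    Assumes non-negative integer data.
    """
    max_x, max_y = max(xs), max(ys)
    points = set(zip(xs, ys))
    rows = []
    for y_val in range(max_y, -1, -1):
        cells = "".join(
            "*" if (x_val, y_val) in points else "."
            for x_val in range(max_x + 1)
        )
        rows.append(f"{y_val:>2} | {cells}")
    # Horizontal axis and its tick labels (last digit of each x value).
    rows.append("   +-" + "-" * (max_x + 1))
    rows.append("     " + "".join(str(x_val % 10) for x_val in range(max_x + 1)))
    return "\n".join(rows)


# Example 3.1 data, assumed to be (x, y) pairs in the order listed.
x = [1, 5, 4, 2, 7, 4, 3, 8, 9]
y = [2, 8, 4, 5, 12, 7, 6, 12, 14]

print(text_scatter(x, y))
```

The points rise from the lower left to the upper right of the grid, which is the pattern expected for a positive linear relationship.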
3.2 CORRELATION

The correlation coefficient r measures the strength of the linear relationship between paired x and y values. It is 'The degree to which the points cluster about the line of best fit' (Howell 1992 p.223).

$$-1 \le r \le 1$$

$$r = \frac{n\left(\sum xy\right) - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

The sign of r denotes the direction of association, while the magnitude of r denotes the strength of association.

Range of r        Strength
-1   to -0.6      strong negative
-0.6 to -0.4      moderate negative
-0.4 to  0        weak negative
 0   to  0.4      weak positive
 0.4 to  0.6      moderate positive
 0.6 to  1        strong positive

Interchanging x and y does not affect the value of r.

[Figure: Scatter diagrams and linear correlation coefficients]

CORRELATION DOES NOT IMPLY CAUSATION

A strong correlation between two variables does not mean that one causes the other. People spend more when the weather is cold. Does cold weather increase sales in Canada?

3.3 REGRESSION

In regression analysis we use the independent variable (X) to estimate the dependent variable (Y). X is also referred to as the explanatory variable and Y is also referred to as the response variable.

• The relationship between the variables is linear
• Both variables must be at least interval scale
• The least squares criterion is used to determine the equation
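Returning to the correlation coefficient of Section 3.2, the following sketch computes r from the sums in the formula and labels its strength. It assumes the Example 3.1 values pair up as (x, y) in the order listed. The function names correlation and strength are hypothetical, and the treatment of the boundary values (exactly 0, 0.4 or 0.6) is an assumption, since the notes give only the scale.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    sum_y2 = sum(yi * yi for yi in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def strength(r):
    """Label r using the -1 / -0.6 / -0.4 / 0 / 0.4 / 0.6 / 1 scale.

    Assumption: a boundary value belongs to the stronger category, and
    r == 0 is reported as no linear correlation.
    """
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size < 0.4:
        label = "weak"
    elif size < 0.6:
        label = "moderate"
    else:
        label = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


# Example 3.1 data, assumed to be (x, y) pairs in the order listed.
x = [1, 5, 4, 2, 7, 4, 3, 8, 9]
y = [2, 8, 4, 5, 12, 7, 6, 12, 14]

r = correlation(x, y)
print(f"r = {r:.3f} ({strength(r)})")  # prints: r = 0.959 (strong positive)
```

Calling correlation(y, x) gives the same value, in line with the remark that interchanging x and y does not affect r.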
LEAST SQUARES PRINCIPLE: The regression equation is obtained by minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.

The scatter diagram (with regression line) shows the relationship between wait time (before seeing a doctor) and satisfaction rating at a hospital.

[Figure: scatter diagram with regression line; horizontal axis: wait time (0 to 70); vertical axis: satisfaction rating (0 to 100)]

Regression Equation: An equation that expresses the linear relationship between two variables.

General Form of Linear Regression Equation:

$$\hat{y} = b_0 + b_1 x \quad \text{or} \quad \hat{y} = a + bx$$

where
$\hat{y}$ = predicted value of the dependent variable
$b$ or $b_1$ is the slope
$a$ or $b_0$ is the y-intercept

$$b = \frac{n\left(\sum xy\right) - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$

Slope, $b$ or $b_1$: the expected change in the value of y for a unit increase in x. That is, the slope is interpreted as the change in the Y variable associated with a unit change in the X variable.

y-intercept, $a$ or $b_0$: the point where the regression line crosses the Y axis. The y-intercept is the predicted value of Y for an X value of zero.
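The slope and intercept formulas above can be checked with a short sketch. It reuses the Example 3.1 data, assuming the values pair up as (x, y) in the order listed; the function names least_squares and sse are hypothetical helpers introduced only for this illustration.

```python
def least_squares(xs, ys):
    """Return (a, b): the y-intercept and slope of the least squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared vertical distances between actual and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


# Example 3.1 data, assumed to be (x, y) pairs in the order listed.
x = [1, 5, 4, 2, 7, 4, 3, 8, 9]
y = [2, 8, 4, 5, 12, 7, 6, 12, 14]

a, b = least_squares(x, y)
print(f"y-hat = {a:.3f} + {b:.3f}x")  # prints: y-hat = 0.914 + 1.437x

# Predicted y for x = 6 (a value chosen only for illustration).
prediction = a + b * 6
print(f"{prediction:.2f}")  # prints: 9.53

# Least squares principle: nudging a or b away from the fitted values
# increases the sum of squared vertical distances.
print(sse(x, y, a, b) < sse(x, y, a + 0.1, b))   # prints: True
print(sse(x, y, a, b) < sse(x, y, a, b + 0.05))  # prints: True
```

Here the slope of about 1.437 means y is expected to rise by roughly 1.44 for each unit increase in x, and the intercept of about 0.914 is the predicted y when x is zero.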