Competed unit 5

School: Yorkville University
Course: 2000
Subject: Statistics
Date: Jan 9, 2024
Type: pdf
Pages: 189

STAT 2000 – Unit 5
Carrie Madden

Section 1: Unit 5 – Test for Proportions and Analysis of Categorical Data and Goodness-of-fit Tests

Review of Inference for a Single Population Proportion $p$

Now suppose we are interested in the proportion $\hat{p}$ of successes:

$$\hat{p} = \frac{X}{n}$$

The mean and standard deviation of $\hat{p}$ are

$$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Distribution of a Sample Proportion

The Central Limit Theorem says that, if a random variable $\bar{X}$ represents a sample mean, and if the sample size is large, then the sampling distribution of $\bar{X}$ is approximately normal. Specifically,

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0, 1) \quad \text{(approximately)}$$

But we can think of $\hat{p}$ as a kind of sample mean, because we are adding up all the successes we observe and dividing by the sample size $n$.

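As an informal check of this idea, the following R sketch simulates many sample proportions and compares their mean and standard deviation with the formulas for $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$. The values of `n`, `p`, `reps` and the seed are illustrative assumptions, not part of the slides.

```r
# Simulate the sampling distribution of p-hat (all values are illustrative).
set.seed(1)           # arbitrary seed so the run is reproducible
n    <- 200           # assumed sample size
p    <- 0.10          # assumed true proportion
reps <- 10000         # number of simulated samples

# Each draw is X ~ Binomial(n, p); dividing by n gives p-hat = X / n.
p_hat <- rbinom(reps, size = n, prob = p) / n

sim_mean  <- mean(p_hat)             # should be close to p
sim_sd    <- sd(p_hat)               # should be close to sqrt(p(1 - p)/n)
theory_sd <- sqrt(p * (1 - p) / n)

cat("simulated mean:", sim_mean, " theoretical mean:", p, "\n")
cat("simulated sd:  ", sim_sd, " theoretical sd:  ", theory_sd, "\n")
```

A histogram such as `hist(p_hat)` should look roughly bell-shaped for these values, since $np$ and $n(1-p)$ are both at least 10.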
Example

We calculate

$$np = 200(0.10) = 20 > 10 \quad \text{and} \quad n(1-p) = 200(0.90) = 180 > 10$$

Since the population is large, we can use the normal distribution. The probability that at least 12.5% of the sampled students are left-handed is:

$$P(\hat{p} \geq 0.125) = P\left(Z \geq \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}\right) = P\left(Z \geq \frac{0.125 - 0.10}{\sqrt{\frac{0.1(0.9)}{200}}}\right) = P(Z \geq 1.18) = 1 - P(Z < 1.18) = 1 - 0.8810 = 0.1190$$

Example – R code

Just as a refresher, to find $P(Z > 1.18)$:

    prob <- 1 - pnorm(1.18)
    prob
    ## [1] 0.1190001

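The whole calculation can also be done in R without a table. This is a minimal sketch using the numbers from the left-handed example ($n = 200$, $p = 0.10$, cutoff 0.125); the variable names are our own. It shows both the answer with $z$ rounded to two decimals, as in the table lookup, and the unrounded answer.

```r
# Normal approximation for P(p-hat >= 0.125) when n = 200 and p = 0.10.
n      <- 200
p      <- 0.10
cutoff <- 0.125

sd_p_hat <- sqrt(p * (1 - p) / n)    # standard deviation of p-hat
z        <- (cutoff - p) / sd_p_hat  # standardized cutoff, about 1.18

prob_rounded   <- 1 - pnorm(round(z, 2))  # matches the table-based answer
prob_unrounded <- 1 - pnorm(z)            # slightly different: no rounding of z

cat("z =", z, "\n")
cat("table-style answer:", prob_rounded, "\n")
cat("unrounded answer:  ", prob_unrounded, "\n")
```

The two answers differ only in the third decimal place, which is the usual size of the rounding error from a Z table.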
Example – Airlines

Example – R code

$P(Z > -0.82)$

    prob <- 1 - pnorm(-0.82)
    prob
    ## [1] 0.7938919

Practice Question

It is known that 20% of a certain type of lottery ticket are winners. If you buy 100 lottery tickets, what is the approximate probability that at least 25 of them are winners?

A. 0.1056
B. 0.1539
C. 0.2061
D. 0.2578
E. 0.3085

Inference for a Population Proportion

We now examine inference methods for the case where the parameter of interest is some population proportion $p$. We estimate $p$ by the sample proportion $\hat{p}$, which has mean and standard deviation

$$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Recall that, in order to use the normal distribution when doing probability calculations for $\hat{p}$, we required that $np \geq 10$ and $n(1-p) \geq 10$, and that the sample size was small compared to the population size. Although we won’t formally verify this for each example in this unit, these assumptions will hold for all of them, and so the use of the normal distribution in doing probability calculations is justified.

Confidence Interval for a Population Proportion

We take a simple random sample of $n$ individuals and calculate the proportion $\hat{p}$ that possess some characteristic of interest. A $(1-\alpha)100\%$ confidence interval for a population proportion $p$ is given as

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Ideally, we would like to use the true standard deviation of $\hat{p}$ in the formula, but we don’t know $p$ (this is the reason for doing inference!), so we must estimate it by $\hat{p}$, and we estimate the standard deviation by the standard error of $\hat{p}$.

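To make the formula concrete, here is a short R sketch that builds the interval by hand. The counts (`x = 120` successes out of `n = 400`) are hypothetical values chosen for illustration and do not come from the slides.

```r
# Large-sample confidence interval for a proportion (hypothetical data).
x          <- 120    # assumed number of successes
n          <- 400    # assumed sample size
conf_level <- 0.95

p_hat  <- x / n                               # sample proportion
z_star <- qnorm(1 - (1 - conf_level) / 2)     # upper critical value, about 1.96
se     <- sqrt(p_hat * (1 - p_hat) / n)       # standard error of p-hat

ci <- c(lower = p_hat - z_star * se,
        upper = p_hat + z_star * se)

cat("p-hat =", p_hat, " SE =", se, "\n")
print(ci)
```

Note that `prop.test()` reports a different (score-based) interval by default, so its limits will not match this hand calculation exactly.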
Example – Obama

Practice Question

We would like to construct a 95% confidence interval for the proportion of people who take vitamins regularly. A random sample of 900 individuals has been selected from a large population. It was found that 180 take vitamins regularly. The standard error of this estimate is:

A. 0.1600
B. 0.0002
C. 0.0261
D. 0.0133
E. 0.0298

Sample Size Determination

Suppose we would like to select a sample of individuals large enough to estimate some population proportion $p$ to within a specified margin of error $m$ with a given level of confidence.

$$m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad \Longrightarrow \quad n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p})$$

But we have a problem – we are at the stage where we have not yet selected the sample, and so we don’t know the value of $\hat{p}$. We will estimate the value of $p$ by some value $p^*$. We can either use an educated guess for $p^*$, or we can use a conservative estimate $p^* = 0.5$, which will result in a margin of error no greater than $m$, regardless of the sample proportion $\hat{p}$.

We therefore use the following formula to determine the required sample size:

$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$

If we believe the value of $p$ is relatively close to 0.5 (say, between 0.3 and 0.7), we should use $p^* = 0.5$. Otherwise, we use some educated guess.

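The formula is easy to wrap in a small R helper. This is only a sketch: the function name `sample_size` and its arguments are our own, and the margin of error of 0.03 at 95% confidence is an assumed illustration. The result is always rounded up, because rounding down would give a margin of error slightly larger than $m$.

```r
# Required sample size for estimating a proportion to within margin m.
# sample_size() is a hypothetical helper, not a built-in R function.
sample_size <- function(m, conf_level = 0.95, p_star = 0.5) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # upper critical value
  n_raw  <- (z_star / m)^2 * p_star * (1 - p_star)
  ceiling(n_raw)                              # always round up
}

n_conservative <- sample_size(m = 0.03)                # uses p* = 0.5
n_guess        <- sample_size(m = 0.03, p_star = 0.3)  # educated guess p* = 0.3

cat("n with p* = 0.5:", n_conservative, "\n")
cat("n with p* = 0.3:", n_guess, "\n")
```

Because $p^*(1-p^*)$ is largest at $p^* = 0.5$, the conservative choice never gives a smaller $n$ than any other guess.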
Example

Using the conservative estimate $p^* = 0.5$ does not result in a much higher sample size than if we had used $p^* = 0.3$ or $p^* = 0.7$, for which the required sample size would be $n = 897$. However, if we suspect the sample proportion will be quite far from 0.5, we may want to use an educated guess for $p^*$.

Example

We see that we would be taking much too large a sample if we used $p^* = 0.5$. This would give us a confidence interval with a margin of error much smaller than what we originally wanted. A small margin of error is good, but we decided we were happy estimating the true proportion to within 2%, and we see that if we use a more reasonable value of $p^*$ (i.e., 0.10), we need to sample 1083 fewer individuals.

Practice Question

We would like to estimate the true proportion of people who use cell phones while driving to within 0.05 with 95% confidence. What sample size is required?

A. 49
B. 385
C. 91
D. 196
E. 246

Practice Question

A researcher calculates that, in order to estimate the true proportion of Canadian adults who smoke cigarettes to within 0.03 with 90% confidence, she requires a sample of 360 Canadians. What sample size would be required in order to estimate the true proportion of Canadian adults who smoke cigarettes to within 0.01 with 90% confidence?

A. 40
B. 120
C. 360
D. 1,080
E. 3,240

Hypothesis Tests for a Population Proportion

We can also conduct hypothesis tests for a population proportion $p$.

Example

Suppose we had instead used the critical value method to conduct the test. Our decision rule would be to reject $H_0$ if $Z \geq z^* = 1.645$. We would reject $H_0$ since $Z = 2.33 > z^* = 1.645$. Our conclusion would be that we have sufficient evidence to conclude that more than 50% of Canadians support the long gun registry.

Example – Gun Registry R Code

    prop.test(459, 850, p = 0.50, correct = FALSE)
    ##
    ##  1-sample proportions test without continuity correction
    ##
    ## data:  459 out of 850, null probability 0.5
    ## X-squared = 5.44, df = 1, p-value = 0.01968
    ## alternative hypothesis: true p is not equal to 0.5
    ## 95 percent confidence interval:
    ##  0.5063896 0.5732504
    ## sample estimates:
    ##    p
    ## 0.54

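By default `prop.test()` runs a two-sided test, which is why the output above reports the alternative "not equal to 0.5". For the one-sided alternative $H_a: p > 0.50$ used in the gun registry example, one option is to pass `alternative = "greater"`. The sketch below also recovers the $Z$ statistic as the square root of the reported X-squared value, which works here because $\hat{p} > 0.50$, so $Z$ is positive.

```r
# One-sided test of H0: p = 0.50 vs Ha: p > 0.50 for 459 successes out of 850.
res <- prop.test(459, 850, p = 0.50,
                 alternative = "greater", correct = FALSE)

# X-squared is Z^2 for a one-sample test; Z is positive because p-hat > 0.50.
z <- sqrt(unname(res$statistic))

# The same Z statistic computed directly from the formula.
p_hat    <- 459 / 850
z_manual <- (p_hat - 0.50) / sqrt(0.50 * (1 - 0.50) / 850)

cat("Z from prop.test:", z, "\n")
cat("Z by hand:       ", z_manual, "\n")
cat("one-sided p-value:", res$p.value, "\n")
```

The one-sided p-value is half the two-sided value shown in the output above.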
Practice Question

Shortly after the introduction of the Euro coin in Belgium, newspapers around the world published articles claiming the coin was biased. A hypothesis test is to be conducted to determine if the Euro coin really is unfair. Let $p$ denote the true proportion of all flips of the coin that would result in heads (and so $\hat{p}$ is the proportion of heads in the sample). The hypotheses for the appropriate test of significance are:

A. $H_0: p = \hat{p}$ vs. $H_a: p \neq \hat{p}$
B. $H_0: p = 0.5$ vs. $H_a: p > 0.5$
C. $H_0: \hat{p} = 0.5$ vs. $H_a: \hat{p} > 0.5$
D. $H_0: p = 0.5$ vs. $H_a: p \neq 0.5$
E. $H_0: \hat{p} = 0.5$ vs. $H_a: \hat{p} \neq 0.5$

Example

Suppose we had instead used the critical value method to conduct the test. Our decision rule would be to reject $H_0$ if $|Z| \geq z^* = 1.96$ (i.e., if $Z \leq -1.96$ or $Z \geq 1.96$), where $z^* = 1.96$ is the upper 0.025 critical value for the standard normal distribution. We would fail to reject $H_0$ since $-1.96 < z = 1.16 < 1.96$.

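The two-sided decision rule can be checked in R. This sketch takes the observed statistic $z = 1.16$ from the slide as given, since the data behind it are not shown here; the variable names are our own.

```r
# Two-sided critical value method at alpha = 0.05.
z_obs <- 1.16     # observed test statistic, as quoted on the slide
alpha <- 0.05

z_crit <- qnorm(1 - alpha / 2)     # upper 0.025 critical value, about 1.96
reject <- abs(z_obs) >= z_crit     # TRUE means reject H0

# Equivalent p-value approach: double the upper-tail area beyond |z|.
p_value <- 2 * (1 - pnorm(abs(z_obs)))

cat("critical value:", z_crit, "\n")
cat("reject H0?     ", reject, "\n")
cat("two-sided p-value:", p_value, "\n")
```

Both approaches agree: the statistic falls inside $(-1.96, 1.96)$ and the p-value exceeds 0.05, so we fail to reject $H_0$.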
Practice Question

A drug company manufactures antacid that is known to be successful in providing relief for 70% of people with heartburn. The company tests a new formula to determine whether it is better than the current one. A total of 150 people with heartburn try the new antacid and 120 report feeling some relief. The appropriate test statistic for testing whether the new formula is more effective than the old formula is:

A. $Z = \dfrac{0.8 - 0.7}{\sqrt{\frac{0.8(0.2)}{150}}}$
B. $Z = \dfrac{0.7 - 0.8}{\sqrt{\frac{0.7(0.3)}{150}}}$
C. $Z = \dfrac{0.8 - 0.7}{\sqrt{\frac{0.7(0.3)}{150}}}$
D. $Z = \dfrac{0.8 - 0.7}{\sqrt{\cdots}}$ …

Power Calculations

We can also calculate the power of hypothesis tests for proportions. For example, we conducted a test of $H_0: p = 0.50$ vs. $H_a: p > 0.50$ for the proportion of Canadians who support the long gun registry at the 5% level of significance. What would be the power of the test if the true proportion of Canadians who supported the registry was 0.55?

$$\text{Power} = P(\text{reject } H_0 \mid p = 0.55)$$

Example

Step 1: Find the rejection rule in terms of $\hat{p}$, assuming $H_0$ is true:

Reject $H_0$ if $Z \geq 1.645$

$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \geq 1.645 \;\Longrightarrow\; \hat{p} \geq 0.50 + 1.645\sqrt{\frac{0.5(0.5)}{850}} \;\Longrightarrow\; \hat{p} \geq 0.5282$$

Step 2: Find the probability of rejecting $H_0$ assuming $H_a$ is true.

$$\text{Power} = P\left(Z \geq \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}\right) = P\left(Z \geq \frac{0.5282 - 0.55}{\sqrt{\frac{0.55(0.45)}{850}}}\right) = P(Z \geq -1.28) = 1 - P(Z < -1.28) = 1 - 0.1003 = 0.8997$$

Example

A health organization claims that less than one quarter of all adults smoke cigarettes. We would like to conduct a hypothesis test, at the 10% level of significance, to verify this claim. We will select a simple random sample of 250 adults and ask them whether they smoke. What is the power of the test if the true proportion of adults who smoke is 0.20?

We want the power of the test of $H_0: p = 0.25$ vs. $H_a: p < 0.25$ when $p = 0.20$.

Step 1: Find the rejection rule in terms of $\hat{p}$, assuming $H_0$ is true:

Reject $H_0$ if $Z \leq -1.282$

$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \leq -1.282 \;\Longrightarrow\; \hat{p} \leq 0.25 - 1.282\sqrt{\frac{0.25(0.75)}{250}} \;\Longrightarrow\; \hat{p} \leq 0.2149$$

Step 2: Find the probability of rejecting $H_0$ assuming $H_a$ is true.

$$\text{Power} = P\left(Z \leq \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}\right) = P\left(Z \leq \frac{0.2149 - 0.20}{\sqrt{\frac{0.20(0.80)}{250}}}\right) = P(Z \leq 0.59) = 0.7224$$

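The two steps can be combined into one small R function. This is a sketch under the same normal approximation; `power_one_prop` and its argument names are hypothetical, not a built-in function. It uses unrounded critical values, so the results differ from the table-based answers in the third or fourth decimal place.

```r
# Power of a one-sided z-test for a single proportion.
# p0: value under H0; p_true: value under Ha; alternative: direction of Ha.
power_one_prop <- function(p0, p_true, n, alpha,
                           alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  se0 <- sqrt(p0 * (1 - p0) / n)          # SD of p-hat if H0 is true
  se1 <- sqrt(p_true * (1 - p_true) / n)  # SD of p-hat if p = p_true
  if (alternative == "greater") {
    cutoff <- p0 + qnorm(1 - alpha) * se0     # Step 1: reject if p-hat >= cutoff
    1 - pnorm((cutoff - p_true) / se1)        # Step 2: P(p-hat >= cutoff)
  } else {
    cutoff <- p0 - qnorm(1 - alpha) * se0     # Step 1: reject if p-hat <= cutoff
    pnorm((cutoff - p_true) / se1)            # Step 2: P(p-hat <= cutoff)
  }
}

power_gun   <- power_one_prop(0.50, 0.55, n = 850, alpha = 0.05, "greater")
power_smoke <- power_one_prop(0.25, 0.20, n = 250, alpha = 0.10, "less")

cat("gun registry example power:", power_gun, "\n")
cat("smoking example power:     ", power_smoke, "\n")
```

Trying a larger `n` or a `p_true` farther from `p0` shows the power increasing, as expected.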
Inference Comparing Two Proportions

We will now turn our attention to the case where we wish to compare two population proportions. Let $p_1$ be the true proportion of all individuals in Population 1 who possess some attribute. Let $p_2$ be the true proportion of all individuals in Population 2 who possess the same attribute.

We would like to estimate the difference in population proportions $p_1 - p_2$. To do this, we will take an SRS of size $n_1$ from Population 1 and an SRS of size $n_2$ from Population 2. We will calculate $\hat{p}_1$ and $\hat{p}_2$, the sample proportions from the first and second samples, respectively.

Our estimate of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$. When estimating $p_1 - p_2$, we obviously won’t know the standard deviation of $\hat{p}_1 - \hat{p}_2$, which is a function of the population proportions $p_1$ and $p_2$. We will estimate the standard deviation by the standard error of $\hat{p}_1 - \hat{p}_2$:

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Confidence Intervals

Example – Legalization

Let $p_1$ be the true population proportion of all young adults who support the legalization of marijuana and let $p_2$ be the true population proportion of all older adults who support it. We calculate the sample proportions:

$$\hat{p}_1 = \frac{87}{150} = 0.58 \qquad \hat{p}_2 = \frac{54}{120} = 0.45$$

We check

$$n_1\hat{p}_1 = 150(0.58) = 87 > 10 \qquad n_1(1-\hat{p}_1) = 150(0.42) = 63 > 10$$

$$n_2\hat{p}_2 = 120(0.45) = 54 > 10 \qquad n_2(1-\hat{p}_2) = 120(0.55) = 66 > 10$$

so the use of the normal approximation is justified.

Let us calculate a 95% confidence interval for the difference in the population proportions $p_1 - p_2$. To calculate the 95% confidence interval, we need to first find the standard error of $\hat{p}_1 - \hat{p}_2$:

$$\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.58(1-0.58)}{150} + \frac{0.45(1-0.45)}{120}} = 0.0607$$

Therefore, a 95% confidence interval is:

$$0.58 - 0.45 \pm (1.96)(0.0607) = (0.0110, 0.2490)$$

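The interval for the legalization example can be reproduced in R from the counts (87 of 150 young adults, 54 of 120 older adults). This is a minimal sketch with variable names of our own choosing.

```r
# 95% confidence interval for p1 - p2 in the legalization example.
x1 <- 87;  n1 <- 150    # young adults: supporters, sample size
x2 <- 54;  n2 <- 120    # older adults: supporters, sample size

p1_hat <- x1 / n1       # 0.58
p2_hat <- x2 / n2       # 0.45

# Unpooled standard error, as used for confidence intervals.
se_diff <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

z_star  <- qnorm(0.975)                                  # about 1.96
ci_diff <- (p1_hat - p2_hat) + c(-1, 1) * z_star * se_diff

cat("SE =", se_diff, "\n")
cat("95% CI: (", ci_diff[1], ",", ci_diff[2], ")\n")
```

Since the whole interval lies above zero, it is consistent with a higher level of support among young adults.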
Interpretation of the confidence interval: If we took repeated samples of the same sizes from the same populations and calculated the interval in a similar manner, then 95% of all such intervals would contain the true difference in population proportions of young and older adults who support the legalization of marijuana in Canada.

Practice Question

An SRS of 100 flights of a large airline (Airline 1) showed that 79 were on time. An SRS of 150 flights of another airline (Airline 2) showed that 96 were on time. Let $p_1$ and $p_2$ be the proportions of all flights that are on time for these two airlines, respectively. A 90% confidence interval for $p_1 - p_2$ is:

A. $0.15 \pm 1.645(0.068)$
B. $0.15 \pm 1.645(0.079)$
C. $0.15 \pm 1.645(0.057)$
D. $0.15 \pm 1.645(0.084)$
E. $0.15 \pm 1.645(0.093)$

Practice Question

A university administrator would like to estimate the true proportion of students at the university with student loans. She takes a random sample of 137 students at the university and calculates a 95% confidence interval to be $(0.46, 0.59)$. What is the correct interpretation of this confidence interval?

A. 95% of samples of 137 students will give proportions between 0.46 and 0.59.
B. 95% of similarly constructed intervals will contain the sample proportion of students at the university with loans.
C. 95% of similarly constructed intervals will contain the true proportion of students at the university with loans.
D. The probability that the true proportion is between 0.46 and 0.59 is 95%.
E. Between 46% and 59% of students have loans.

Hypothesis Testing

Recall: for large sample sizes,

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$

However, if the null hypothesis is true, $p_1 = p_2$, and so $p_1 - p_2 = 0$, and so under $H_0$,

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$

But if $p_1 = p_2$, then the proportions in the denominator are really the same proportion, say $p$. We will estimate this common proportion $p$ $(= p_1 = p_2)$ by the pooled sample proportion $\hat{p}_c$:

$$\hat{p}_c = \frac{\text{total successes in both samples}}{\text{total observations in both samples}} = \frac{x_1 + x_2}{n_1 + n_2}$$

The appropriate test statistic for this test of significance is therefore

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

The use of the normal distribution as an approximation is only appropriate if $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$ are all greater than or equal to ten (i.e., if there are at least ten successes and ten failures in each of the two samples).

Example – Legalization Hypothesis Test

Is the true proportion of young adults who support the legalization of marijuana in Canada greater than that for older adults?

1. Let $\alpha = 0.05$.
2. We are testing the hypotheses $H_0: p_1 = p_2$ vs. $H_a: p_1 > p_2$.
3. We will reject the null hypothesis if the p-value $\leq \alpha = 0.05$.

We calculate the pooled sample proportion

$$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2} = \frac{87 + 54}{150 + 120} = \frac{141}{270} = 0.5222$$

4. The test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{0.58 - 0.45}{\sqrt{0.5222 (1 - 0.5222) \left( \frac{1}{150} + \frac{1}{120} \right)}} = 2.12$$
5. The p-value is

$$P(Z \ge 2.12) = 1 - P(Z < 2.12) = 0.0170$$

Since the p-value $= 0.0170 < \alpha = 0.05$, we reject $H_0$.

6. There is sufficient evidence that the true proportion of young adults who support the legalization of marijuana in Canada is greater than that for older adults.

Interpretation of the p-value: If the true proportions of young and older adults who support the legalization of marijuana in Canada were equal, the probability of observing a difference in sample proportions at least as large as 0.13 would be 0.0170.
Suppose we had instead used the critical value method to conduct the test. Our decision rule would be to reject $H_0$ if $Z \ge z^* = 1.645$. We would still reject $H_0$, since $z = 2.12 > z^* = 1.645$.
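Both decision rules for this upper-tailed test can be checked with base R's `pnorm` and `qnorm`. This is only a sketch using the rounded test statistic 2.12 from step 4; the variable names are illustrative.

```r
z     <- 2.12   # test statistic from step 4
alpha <- 0.05

# p-value approach: upper-tail probability P(Z >= 2.12)
p_value <- pnorm(z, lower.tail = FALSE)

# Critical value approach: upper alpha critical value of N(0, 1)
z_star <- qnorm(alpha, lower.tail = FALSE)

reject_by_p_value  <- p_value <= alpha
reject_by_critical <- z >= z_star
c(p_value = p_value, z_star = z_star)
```

The two approaches always lead to the same decision, since $z \ge z^*$ exactly when the upper-tail probability of $z$ is at most $\alpha$.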
Example – Legalization Hypothesis Test – R Code

    # Two-sample test for equality of proportions.
    # The slide cuts the call off after "alternative ="; the last two arguments
    # are reconstructed from the printed output ("greater", no continuity correction).
    res <- prop.test(x = c(87, 54), n = c(150, 120),
                     alternative = "greater", correct = FALSE)
    # Printing the results
    res
    ##
    ##  2-sample test for equality of proportions without continuity correction
    ##
    ## data:  c(87, 54) out of c(150, 120)
    ## X-squared = 4.5156, df = 1, p-value = 0.01679
    ## alternative hypothesis: greater
    ## 95 percent confidence interval:
    ##  0.03013015 1.00000000
    ## sample estimates:
    ## prop 1 prop 2
    ##   0.58   0.45
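The output reports `X-squared` rather than $Z$. For two samples without continuity correction, this chi-square statistic is the square of the pooled $Z$ statistic. The sketch below checks this numerically; the variable names are illustrative, and the `alternative` and `correct` arguments are inferred from the printed output.

```r
res <- prop.test(x = c(87, 54), n = c(150, 120),
                 alternative = "greater", correct = FALSE)

# Pooled z statistic computed by hand
p_c <- (87 + 54) / (150 + 120)
z   <- (87 / 150 - 54 / 120) / sqrt(p_c * (1 - p_c) * (1 / 150 + 1 / 120))

x_squared <- unname(res$statistic)   # X-squared reported by prop.test
c(z_squared = z^2, x_squared = x_squared)
```

The p-value in the output (0.01679) differs slightly from the hand calculation (0.0170) because the software does not round $Z$ to two decimal places.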
Example – Smoking

We can also use the two-sample test for proportions to compare two treatments in an experiment.
Example – Smoking (CI)

Let $p_1$ be the true proportion of all smokers who chew nicotine gum that are able to quit smoking, and let $p_2$ be the true proportion of all smokers who wear a nicotine patch that are able to quit smoking.

We calculate the sample proportions:

$$\hat{p}_1 = \frac{22}{63} = 0.3492 \qquad \hat{p}_2 = \frac{17}{62} = 0.2742$$

We check

$$n_1 \hat{p}_1 = 63(0.3492) = 22 > 10$$
$$n_1 (1 - \hat{p}_1) = 63(0.6508) = 41 > 10$$
$$n_2 \hat{p}_2 = 62(0.2742) = 17 > 10$$
$$n_2 (1 - \hat{p}_2) = 62(0.7258) = 45 > 10$$

so the use of the normal approximation is justified.
We would like to construct a 95% confidence interval for the true difference in the proportions of smokers who are able to quit when chewing nicotine gum and when wearing a nicotine patch. The standard error of $\hat{p}_1 - \hat{p}_2$ is:

$$SE = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{0.3492 (1 - 0.3492)}{63} + \frac{0.2742 (1 - 0.2742)}{62}} = 0.0826$$
Therefore, a 95% confidence interval for $p_1 - p_2$ is:

$$(0.3492 - 0.2742) \pm 1.96 (0.0826) = (-0.0869, 0.2369)$$
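The interval can be reproduced in a few lines of R. This is a sketch with illustrative variable names; it uses the unpooled standard error from the previous slide and `qnorm(0.975)` in place of the rounded 1.96, so the endpoints may differ from the hand calculation in the fourth decimal place.

```r
x1 <- 22; n1 <- 63   # nicotine gum: number who quit, sample size
x2 <- 17; n2 <- 62   # nicotine patch: number who quit, sample size
p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Unpooled standard error of the difference in sample proportions
se <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

z_star <- qnorm(0.975)   # upper 0.025 critical value
ci <- (p1_hat - p2_hat) + c(-1, 1) * z_star * se
round(ci, 4)
```

Because the interval contains zero, it is consistent with there being no difference between the two treatments.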
Example – Smoking (Hypothesis Test)

We will now conduct a hypothesis test to determine whether there is a difference in effectiveness between nicotine gum and the nicotine patch.

1. Let $\alpha = 0.05$.
2. We are testing the hypotheses $H_0: p_1 = p_2$ vs. $H_a: p_1 \ne p_2$.
3. We will reject the null hypothesis if the p-value $\le \alpha = 0.05$.
We calculate the pooled sample proportion

$$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2} = \frac{22 + 17}{63 + 62} = \frac{39}{125} = 0.312$$

4. The test statistic is:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{0.3492 - 0.2742}{\sqrt{0.312 (1 - 0.312) \left( \frac{1}{63} + \frac{1}{62} \right)}} = 0.90$$
5. The p-value is

$$2 P(Z \ge 0.90) = 2 (1 - P(Z < 0.90)) = 2 (0.1841) = 0.3682$$

Since the p-value $= 0.3682 > \alpha = 0.05$, we fail to reject $H_0$.

6. There is insufficient evidence that there is a difference in effectiveness between nicotine gum and the nicotine patch.
Suppose we had instead used the critical value method to conduct the test. Our decision rule would be to reject $H_0$ if $|Z| \ge z^* = 1.96$ (i.e., if $z \le -1.96$ or $z \ge 1.96$), where $z^* = 1.96$ is the upper 0.025 critical value from the standard normal distribution. We would fail to reject $H_0$, since $-1.96 < z = 0.90 < 1.96$.
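For a two-sided alternative, the tail probability is doubled and the significance level is split between the two tails. A minimal sketch with the rounded test statistic 0.90 and illustrative variable names:

```r
z     <- 0.90   # test statistic from step 4
alpha <- 0.05

# Two-sided p-value: twice the upper-tail probability of |z|
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Upper alpha/2 critical value of N(0, 1)
z_star <- qnorm(alpha / 2, lower.tail = FALSE)

reject <- abs(z) >= z_star
c(p_value = p_value, z_star = z_star)
```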
    ##
    ##  2-sample test for equality of proportions without continuity correction
    ##
    ## data:  c(22, 17) out of c(63, 62)
    ## X-squared = 0.81912, df = 1, p-value = 0.3654
    ## alternative hypothesis: two.sided
    ## 95 percent confidence interval:
    ##  -0.08681405  0.23683966
    ## sample estimates:
    ##    prop 1    prop 2
    ## 0.3492063 0.2741935
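The slides show this output without the call that produced it. A call along the following lines would be consistent with the header and values shown; the argument choices are inferred from the output rather than taken from the slides, and `res_smoking` is an illustrative name.

```r
# Assumed call: two-sided alternative, no continuity correction
res_smoking <- prop.test(x = c(22, 17), n = c(63, 62),
                         alternative = "two.sided", correct = FALSE)
res_smoking

# Individual pieces of the result
res_smoking$p.value    # two-sided p-value
res_smoking$conf.int   # 95% confidence interval for p1 - p2
```

Note that the test uses the pooled standard error, while the confidence interval uses the unpooled one, which matches the hand calculations for this example.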
Practice Question

An SRS of 200 of a certain model of 2010 car found that 50 had minor brake defects. An SRS of 100 of the same model of 2011 car found that 10 had minor brake defects. Let $p_1$ and $p_2$ be the true proportions of all 2010 and 2011 cars with brake defects. We wish to conduct a hypothesis test of $H_0: p_1 = p_2$ vs. $H_a: p_1 \ne p_2$ at the 5% level of significance. The value of the appropriate test statistic is:

(A) 3.50
(B) 2.74
(C) 4.03
(D) 3.06
(E) 3.64
Practice Question

An SRS of 200 of a certain model of 2010 car found that 50 had minor brake defects. An SRS of 100 of the same model of 2011 car found that 10 had minor brake defects. Let $p_1$ and $p_2$ be the true proportions of all 2010 and 2011 cars with brake defects. We wish to conduct a hypothesis test of $H_0: p_1 = p_2$ vs. $H_a: p_1 \ne p_2$ at the 5% level of significance. The test statistic is calculated to be 3.06. What is the p-value of the test?

(A) 0.0006
(B) 0.9989
(C) 0.0022
(D) 0.0011
(E) 0.9978
Practice Question

We wish to conduct a test of significance to determine whether there is evidence that the true proportion of females who smoke cigarettes is greater than that for males. We would make a Type I Error if we conclude that:

(A) $p_f > p_m$ when in fact $p_m > p_f$.
(B) $p_f = p_m$ when in fact $p_f > p_m$.
(C) $p_f \ne p_m$ when in fact $p_f > p_m$.
(D) $p_m > p_f$ when in fact $p_f > p_m$.
(E) $p_f > p_m$ when in fact $p_f = p_m$.

Contingency Tables

Consider the following contingency table classifying each individual in a sample by both eye colour and hair colour:

                  Hair Colour
    Eye Colour    Blonde   Red   Brown   Black   Grey
    Brown             11     8      39      14      6
    Blue              15     7      16       3     10
    Green              9     4      12       2      5
Contingency Tables in R

That is the contingency table, followed by the row totals and the column totals:

    ##       Blonde Red Brown Black Grey
    ## Brown     11   8    39    14    6
    ## Blue      15   7    16     3   10
    ## Green      9   4    12     2    5
    ## Brown  Blue Green
    ##    78    51    32
    ## Blonde    Red  Brown  Black   Grey
    ##     35     19     67     19     21
The following code builds the contingency table and prints it, followed by the row totals and the column totals:

    # Observed counts, entered row by row (eye colour by hair colour)
    colour <- matrix(c(11, 8, 39, 14, 6,
                       15, 7, 16, 3, 10,
                       9, 4, 12, 2, 5),
                     nrow = 3, ncol = 5, byrow = TRUE)
    # The slide cuts this line off; the last label is completed from the printed table.
    dimnames(colour) <- list(c("Brown", "Blue", "Green"),
                             c("Blonde", "Red", "Brown", "Black", "Grey"))
    colour2 <- data.frame(colour)
    colour2
    rowSums(colour2)   # row totals (eye colour)
    colSums(colour2)   # column totals (hair colour)
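As an optional variation (not part of the slides), the totals can be attached to the table itself so that everything prints as a single object. The name `with_totals` is illustrative, and the sketch uses only base R's `cbind` and `rbind`.

```r
colour <- matrix(c(11, 8, 39, 14, 6,
                   15, 7, 16, 3, 10,
                   9, 4, 12, 2, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Brown", "Blue", "Green"),
                                 c("Blonde", "Red", "Brown", "Black", "Grey")))

# Append a column of row totals, then a row of column totals plus the table total
with_totals <- rbind(cbind(colour, Total = rowSums(colour)),
                     Total = c(colSums(colour), sum(colour)))
with_totals
```

The bottom-right cell holds the table total, which can be obtained by adding either the row totals or the column totals.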
Contingency Table

The entries in the table are referred to as observed cell frequencies and are denoted by $O$.

The previous two-way table is called a $3 \times 5$ table, since there are three rows and five columns (i.e., three possible eye colours and five possible hair colours). In general, a two-way table with $r$ rows and $c$ columns is called an $r \times c$ table.

Contingency tables are very useful in helping us conduct several different tests of significance. The first use we will make of these tables is to conduct tests of significance for the homogeneity of several populations with respect to some variable of interest.
Example – Ecology Department

The Ecology department head is examining the course evaluations for four different sections of an introductory course taught last semester by four different instructors. He would like to know if the opinions of students are homogeneous with respect to the quality of instruction they received from their respective professors.

One question on the course evaluation reads: “Overall, I would say this professor is...” Students indicate whether they found their professor to be Very Good, Good, Average, Poor, or Very Poor. The department head regroups these ratings into three categories: Positive (i.e., Very Good or Good), Neutral (i.e., Average) and Negative (i.e., Poor or Very Poor).
The results for a sample of students in each of the four classes are shown in the two-way table below:

                Section
    Rating      A01   A02   A03   A04
    Positive     22    16    25    10
    Neutral      14    21    13    14
    Negative      4    10     7    19
Inference for Homogeneous Populations

How can these data be analyzed to determine if the opinions of the students in each of the classes are homogeneous with respect to the quality of teaching they received? That is, we want to test the hypotheses

$H_0$: Opinions of students are homogeneous with respect to the quality of teaching they received.
$H_a$: Opinions of students are not homogeneous with respect to the quality of teaching they received.
In other words, we are testing whether the proportions of positive, neutral and negative ratings are the same for all four professors.

In conducting a test of significance, we must always ask, “If the null hypothesis were true, what would we expect?” In other words, what would we expect to see if students’ opinions really were homogeneous for all four classes?
If all students are equally satisfied, then:

- the same proportion should rate their professor as positive for each section,
- the same proportion should rate their professor as neutral for each section, and
- the same proportion should rate their professor as negative for each section.
How do we estimate this common proportion for each of the three rating categories? To help us answer this question, we examine the table again, this time with row, column and table totals included:

                    Section
    Rating          A01   A02   A03   A04   Row Total
    Positive         22    16    25    10          73
    Neutral          14    21    13    14          62
    Negative          4    10     7    19          40
    Column Total     40    47    45    43         175
The column totals represent the sample sizes from each of the four classes and the row totals represent the total number of positive, neutral and negative ratings given by all students in the sample. Note that we can obtain the table total (which in this case is 175) by adding the row totals or the column totals.
Let us examine the row for Positive ratings. The estimated proportion $\hat{p}$ of all students who rate their professor as positive is

$$\hat{p} = \frac{\text{total number of positive ratings}}{\text{total number of students in the sample}} = \frac{\text{row 1 total}}{\text{table total}} = \frac{73}{175} = 0.4171$$

As such, if opinions are homogeneous for all four classes, we would expect 41.71% of students in each class to give a positive rating.
Expected Cell Counts

For example, the expected count of A02 students who gave a positive rating is

$$E = \hat{p} \times (\text{\# of responses for A02}) = \frac{(\text{row 1 total})(\text{column 2 total})}{\text{table total}} = \frac{(73)(47)}{175} = 19.61$$

By a similar argument, the expected count for the cell at the intersection of the $r$th row and the $c$th column is

$$E = \frac{(\text{row } r \text{ total})(\text{column } c \text{ total})}{\text{table total}}$$
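The general formula can be applied to every cell at once with an outer product of the row and column totals. This is a sketch, not the course's own code; `opinion` holds the observed counts of the Ecology example and `expected` is an illustrative name.

```r
# Observed counts, entered row by row (rating by section)
opinion <- matrix(c(22, 16, 25, 10,
                    14, 21, 13, 14,
                     4, 10,  7, 19),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Positive", "Neutral", "Negative"),
                                  c("A01", "A02", "A03", "A04")))

# E = (row total)(column total) / (table total), for every cell
expected <- outer(rowSums(opinion), colSums(opinion)) / sum(opinion)
round(expected, 2)
```

The expected counts have the same row and column totals as the observed counts, which is a quick way to check the calculation.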
Example – Ecology Department

The expected counts for all cells are calculated similarly and are displayed in the following table below the observed counts (in parentheses):

                    Section
    Rating              A01      A02      A03      A04   Row Total
    Positive             22       16       25       10          73
                    (16.69)  (19.61)  (18.77)  (17.94)
    Neutral              14       21       13       14          62
                    (14.17)  (16.65)  (15.94)  (15.23)
    Negative              4       10        7       19          40
                     (9.14)  (10.74)  (10.29)   (9.83)
    Column Total         40       47       45       43         175
The test statistic we will use to test for homogeneity of these four populations measures how far our observed counts are from our expected counts. The test statistic is

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

Under the null hypothesis of homogeneous populations, this test statistic follows a chi-square distribution with $(r - 1)(c - 1)$ degrees of freedom.
Chi-Square Distributions

The chi-square distributions are a family of right-skewed distributions completely characterized by their degrees of freedom (i.e., the degrees of freedom is the only parameter).
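One way to see the right skew numerically is to compare the mean and the median for a few members of the family. The degrees of freedom below are illustrative choices, not values taken from the slides; the sketch relies on the fact that the mean of a chi-square distribution equals its degrees of freedom.

```r
dfs <- c(2, 5, 10)   # illustrative degrees of freedom (assumption)

means   <- dfs                # mean of a chi-square distribution equals its df
medians <- qchisq(0.5, dfs)   # medians from the quantile function

data.frame(df = dfs, mean = means, median = medians)

# For a right-skewed distribution the long upper tail pulls the mean above the median
all(means > medians)
```

A call such as `curve(dchisq(x, df = 5), from = 0, to = 30)` would draw one of these density curves.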
[Figure: Chi-Square at Various Degrees of Freedom. Density curves for several chi-square distributions (legend labels: df_0, df_1, df_3); vertical axis: Density.]
Test Statistic

If the null hypothesis is true, then we will likely have observed cell counts which are quite close to their expected cell counts, and the value of the test statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

should be quite low.

On the other hand, if the populations are not homogeneous, observed cell counts will differ substantially from expected cell counts and the value of the test statistic will be high.
Inference for Homogeneous Populations

As such, we will reject the null hypothesis of homogeneity if the value of the test statistic is high, namely if it exceeds the upper $\alpha$ critical value from the $\chi^2$ distribution with $(r - 1)(c - 1)$ degrees of freedom (or equivalently, if the p-value is less than or equal to $\alpha$).

Selected critical values for the $\chi^2$ distribution are given in Table 5. Chi-square tests are always upper-tailed.
Example

For example, we see from the table that

$$P(\chi^2(4) \ge 6.74) = 0.15$$
    # Upper-tail probability P(chi-square(4) >= 6.74)
    p <- pchisq(6.74, 4, lower.tail = FALSE)
    p
    ## [1] 0.1502827
Inference for Homogeneous Populations

Like the z procedures for comparing two proportions, the chi-square test for homogeneity is an approximate method. The approximation becomes more accurate as the observed counts in the cells become larger.

In practice, we can safely use the chi-square distribution in our tests for homogeneity if:

- no more than 20% of expected cell counts are less than five, and
- there are no expected cell counts less than one.
In our example, none of the expected cell counts are less than five, and so the chi-square approximation is justified.

We will now conduct the formal hypothesis test from the beginning.
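The two rules of thumb can be checked mechanically from the expected counts. The sketch below applies them to the Ecology example; the variable names are illustrative and the thresholds are exactly those stated in the rules.

```r
opinion <- matrix(c(22, 16, 25, 10,
                    14, 21, 13, 14,
                     4, 10,  7, 19),
                  nrow = 3, byrow = TRUE)
expected <- outer(rowSums(opinion), colSums(opinion)) / sum(opinion)

share_below_5 <- mean(expected < 5)   # proportion of cells with E < 5
any_below_1   <- any(expected < 1)    # is any expected count below 1?

# Both conditions must hold for the chi-square approximation to be considered safe
approximation_ok <- share_below_5 <= 0.20 && !any_below_1
approximation_ok
```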
Example – Ecology Department

1. Let $\alpha = 0.05$.
2. We are testing the hypotheses
   $H_0$: Opinions of students are homogeneous with respect to the quality of teaching they received.
   $H_a$: Opinions of students are not homogeneous with respect to the quality of teaching they received.
3. We will reject $H_0$ if the p-value $\le \alpha = 0.05$.
In order to compute the test statistic, we must first calculate each of the cell chi-square values separately. For instance, we previously calculated that the expected count for the first row and second column (positive ratings from A02 students) is 19.61. The observed cell count is 16, and so the cell chi-square value is

$$\frac{(O - E)^2}{E} = \frac{(16 - 19.61)^2}{19.61} = 0.66$$

Other cell chi-square values are calculated similarly and are shown in the following table below both the observed and expected cell counts (in square brackets).
                    Section
    Rating              A01      A02      A03      A04   Row Total
    Positive             22       16       25       10          73
                    (16.69)  (19.61)  (18.77)  (17.94)
                     [1.69]   [0.66]   [2.07]   [3.51]
    Neutral              14       21       13       14          62
                    (14.17)  (16.65)  (15.94)  (15.23)
                     [0.00]   [1.14]   [0.54]   [0.10]
    Negative              4       10        7       19          40
                     (9.14)  (10.74)  (10.29)   (9.83)
                     [2.89]   [0.05]   [1.05]   [8.55]
    Column Total         40       47       45       43         175
Example – Ecology Department Solution

4. We can now add all the cell chi-square values to obtain the value of the test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 1.69 + 0.66 + \cdots + 8.55 = 22.25$$

Under the null hypothesis, this test statistic follows a chi-square distribution with $(3 - 1)(4 - 1) = (2)(3) = 6$ degrees of freedom.
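The cell-by-cell calculation can be sketched in R as follows. The names `cell_chisq`, `chi_sq` and `df` are illustrative. Because the code works with unrounded expected counts, the total may differ from the hand-computed 22.25 in the second decimal place.

```r
opinion <- matrix(c(22, 16, 25, 10,
                    14, 21, 13, 14,
                     4, 10,  7, 19),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Positive", "Neutral", "Negative"),
                                  c("A01", "A02", "A03", "A04")))
expected <- outer(rowSums(opinion), colSums(opinion)) / sum(opinion)

# Cell chi-square values (O - E)^2 / E
cell_chisq <- (opinion - expected)^2 / expected
round(cell_chisq, 2)

chi_sq <- sum(cell_chisq)                               # test statistic
df     <- (nrow(opinion) - 1) * (ncol(opinion) - 1)     # (r - 1)(c - 1)
c(chi_sq = chi_sq, df = df)
```

Keeping the cell values in a matrix makes it easy to see which cells contribute most to the test statistic.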
5. The p-value is $P(\chi^2(6) \ge 22.25)$, which we bound using Table 5.
Example – Ecology Department

Interpretation of the p-value: If opinions of students in the four sections were homogeneous with respect to the quality of teaching they received, the probability of observing a value of the test statistic at least as high as 22.25 would be between 0.001 and 0.0025.

Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject $H_0$ if $\chi^2 \ge \chi^{2*} = 12.59$, where $\chi^{2*} = 12.59$ is the upper 0.05 critical value from the chi-square distribution with 6 degrees of freedom. We would still reject the null hypothesis, since $\chi^2 = 22.25 > \chi^{2*} = 12.59$.
We can examine which cells contribute the most to the value of the test statistic in an effort to understand why opinions were found not to be homogeneous.

The highest cell chi-square values belong to the negative ratings for A04, which are more numerous than expected, the positive ratings for A04, which are fewer than expected, and the negative ratings for A01, which are fewer than expected. This tells us that students in A01 liked their instructor more than average and students in A04 liked their instructor less than average.
Example – Ecology Department

From Table 5, $P(\chi^2(6) \ge 20.25) = 0.0025$ and $P(\chi^2(6) \ge 22.46) = 0.001$.

Since $20.25 < \chi^2 = 22.25 < 22.46$, our p-value is between 0.001 and 0.0025.

Since the p-value $< \alpha = 0.05$, we reject the null hypothesis. We have sufficient evidence to conclude that opinions of students in the four sections are not homogeneous with respect to the quality of teaching they received.
Example – Ecology Department

R Code

    # Counts are entered row by row, matching the printed table below.
    opinion <- matrix(c(22, 16, 25, 10, 14, 21, 13, 14, 4, 10, 7, 19),
                      nrow = 3, byrow = TRUE)
    dimnames(opinion) <- list(c("Positive", "Neutral", "Negative"),
                              c("A01", "A02", "A03", "A04"))
    opinion

    ##          A01 A02 A03 A04
    ## Positive  22  16  25  10
    ## Neutral   14  21  13  14
    ## Negative   4  10   7  19
Example – Ecology Department

R Code

    mosaicplot(opinion)

[Figure: mosaic plot titled "opinion", showing the ratings (Positive, Neutral, Negative) against the course sections.]
Example – Ecology Department

R Code

    chisq.test(opinion)

    ##
    ##  Pearson's Chi-squared test
    ##
    ## data:  opinion
    ## X-squared = 22.268, df = 6, p-value = 0.001083
Example – Ecology Department

R Code

p-value

    p2 <- pchisq(22.25, 6, lower.tail = FALSE)
    p2

    ## [1] 0.001090816

Critical Value

    q1 <- qchisq(0.05, 6, lower.tail = FALSE)
    q1

    ## [1] 12.59159
Example – Archer

Five archers shoot several arrows at a target. The table below displays the number of times each archer hit and missed the bull’s-eye on the target:

                                  Archer
Result         Archer 1  Archer 2  Archer 3  Archer 4  Archer 5  Row Total
Hit                  25        30        30        50        25        160
Missed               10        25        10        20        25         90
Column Total         35        55        40        70        50        250

Are the archers homogeneous with respect to their accuracy?
Example – Archer

Solution

1. Let $\alpha = 0.05$.

2. We are testing the hypotheses

   $H_0$: The five archers are homogeneous with respect to their accuracy.
   $H_a$: The five archers are not homogeneous with respect to their accuracy.

Note that, since there are only two values of the response variable (hit or miss), we are actually testing the equality of five population proportions:

   $H_0$: $p_1 = p_2 = p_3 = p_4 = p_5$
   $H_a$: At least one of the population proportions differs from the others.
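Because the hypotheses reduce to the equality of five proportions, the same test can be run in R with `prop.test()`. The sketch below assumes the hit counts and column totals from the archer table; the variable names are illustrative only. With more than two groups, `prop.test()` applies no continuity correction, so its statistic is the ordinary Pearson chi-square.

```r
hits  <- c(25, 30, 30, 50, 25)   # bull's-eye hits for Archers 1 to 5
shots <- c(35, 55, 40, 70, 50)   # total arrows shot (column totals)

prop_fit <- prop.test(x = hits, n = shots)

prop_fit$estimate    # the five sample proportions of hits
prop_fit$statistic   # Pearson chi-square statistic
prop_fit$parameter   # degrees of freedom: number of groups minus one
```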
Example – Archer

The chi-square test for homogeneity is in fact an extension of the z test for comparing two proportions; unlike the z test, the chi-square test can compare several population proportions.

Solution

3. We will reject $H_0$ if the p-value $\le \alpha = 0.05$.

We first calculate the expected cell counts. For example, the expected number of hits for Archer 5 is

$$E = \frac{(\text{row 1 total})(\text{column 5 total})}{\text{table total}} = \frac{(160)(50)}{250} = 32.0$$
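The expected-count formula can be applied to every cell at once. This is a small sketch, assuming the archer counts from the table; `archers` and `expected` are names chosen here for illustration.

```r
archers <- matrix(c(25, 30, 30, 50, 25,
                    10, 25, 10, 20, 25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Hit", "Missed"), paste("Archer", 1:5)))

# E = (row total)(column total) / (table total), computed for all cells.
expected <- outer(rowSums(archers), colSums(archers)) / sum(archers)
expected

# Row 1 (Hit), column 5 (Archer 5): (160)(50) / 250
expected[1, 5]
```

`outer()` forms every row-total by column-total product, so dividing by the table total reproduces the hand formula cell by cell.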
Example – Archer

The rest of the expected cell counts are calculated similarly and are shown in parentheses in the table:

                                  Archer
Result         Archer 1  Archer 2  Archer 3  Archer 4  Archer 5  Row Total
Hit                  25        30        30        50        25        160
                 (22.4)    (35.2)    (25.6)    (44.8)    (32.0)
Missed               10        25        10        20        25         90
                 (12.6)    (19.8)    (14.4)    (25.2)    (18.0)
Column Total         35        55        40        70        50        250

Note: None of the expected cell counts are less than five, and so the chi-square approximation is justified.
Example – Archer

We now calculate the cell chi-square values. For example, the cell chi-square value for the number of misses for Archer 2 is

$$\frac{(O - E)^2}{E} = \frac{(25 - 19.8)^2}{19.8} = 1.37$$
Example – Archer

Other cell chi-square values are calculated similarly and are shown in square brackets below, with the expected cell counts in parentheses:

                                  Archer
Result         Archer 1  Archer 2  Archer 3  Archer 4  Archer 5  Row Total
Hit                  25        30        30        50        25        160
                 (22.4)    (35.2)    (25.6)    (44.8)    (32.0)
                 [0.30]    [0.77]    [0.76]    [0.60]    [1.53]
Missed               10        25        10        20        25         90
                 (12.6)    (19.8)    (14.4)    (25.2)    (18.0)
                 [0.56]    [1.37]    [1.34]    [1.07]    [2.72]
Column Total         35        55        40        70        50        250
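As a check on the hand calculations, the cell chi-square values can be recomputed in R. This is a sketch under the assumption that the observed counts are those in the archer table; `fit` and `cell_chisq` are illustrative names. Computed this way, the Missed cell for Archer 1 works out to 6.76 / 12.6, about 0.54 rather than the 0.56 printed above, so the unrounded total is about 11.00; the conclusion of the test does not change.

```r
archers <- matrix(c(25, 30, 30, 50, 25,
                    10, 25, 10, 20, 25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Hit", "Missed"), paste("Archer", 1:5)))

fit <- chisq.test(archers)

# Expected counts, as in the parentheses of the table.
round(fit$expected, 1)

# Cell chi-square values, as in the square brackets of the table.
cell_chisq <- (archers - fit$expected)^2 / fit$expected
round(cell_chisq, 2)

# Their sum is the chi-square test statistic.
sum(cell_chisq)
```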
Example – Archer

Solution

4. The test statistic is

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 0.30 + 0.77 + \cdots + 2.72 = 11.02$$

Under the null hypothesis, this test statistic follows a chi-square distribution with $(r - 1)(c - 1) = (1)(4) = 4$ degrees of freedom.

5. The p-value is $P(\chi^2(4) \ge 11.02)$.
Example – Archer

Solution

5. (continued) We see from Table 5 that $P(\chi^2(4) \ge 9.49) = 0.05$ and $P(\chi^2(4) \ge 11.14) = 0.025$.

Since $9.49 < \chi^2 = 11.02 < 11.14$, our p-value is between 0.025 and 0.05.
Example – Archer

Solution

Since the p-value $< \alpha = 0.05$, we reject the null hypothesis.

6. We have sufficient evidence to conclude that the five archers are not homogeneous with respect to their accuracy.

Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject $H_0$ if $\chi^2 \ge \chi^{2*} = 9.49$, where $\chi^{2*} = 9.49$ is the upper 0.05 critical value from the chi-square distribution with 4 degrees of freedom. We would still reject the null hypothesis, since $\chi^2 = 11.02 > \chi^{2*} = 9.49$.
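Both decision rules for the archer example can be reproduced with `pchisq()` and `qchisq()`, in the same way the slides do for the Ecology Department example. The sketch below takes the hand-computed statistic 11.02 as given; the variable names are illustrative.

```r
chi_sq <- 11.02               # test statistic computed by hand above
df     <- (2 - 1) * (5 - 1)   # (r - 1)(c - 1)

# p-value approach: upper-tail area beyond the observed statistic.
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)
p_value
p_value <= 0.05

# Critical value approach: upper 0.05 critical value.
crit <- qchisq(0.05, df, lower.tail = FALSE)
crit
chi_sq >= crit
```

The two comparisons always agree, since the statistic exceeds the critical value exactly when the upper-tail area falls below the significance level.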
Practice Question

A survey is conducted in each of four regions of Canada. Respondents are asked whether they approve of the job being done by the prime minister. Results are shown in the table below:

                            Region
Rating         West  Prairies  Central  Atlantic  Row Total
Approve          94        65       61        38        258
Disapprove       28        30       89        32        179
Neutral          18        15       20        10         63
Column Total    140       110      170        80        500

What are the degrees of freedom for the appropriate test statistic?

(A) 5
(B) 6
(C) 8
(D) 9
(E) 11
Practice Question

A survey is conducted in each of four regions of Canada. Respondents are asked whether they approve of the job being done by the prime minister. Results are shown in the table below:

                            Region
Rating         West  Prairies  Central  Atlantic  Row Total
Approve          94        65       61        38        258
Disapprove       28        30       89        32        179
Neutral          18        15       20        10         63
Column Total    140       110      170        80        500

What is the expected number of Central Canadians who disapprove of the job being done by the prime minister?

(A) 58.28
(B) 60.86
(C) 66.34
(D) 69.57
(E) 72.49
Z Test vs. $\chi^2$ Test

We saw that a chi-square test can be used to test the equality of several population proportions.

In the case where we are conducting a two-sided test comparing just two proportions, the chi-square test is in fact equivalent to the two-sample z test.

It can be shown that $z^2 = \chi^2$ and that the p-values of the two tests are identical.
Example – Smoking

Recall the smoking experiment:

Example

We would like to compare the effectiveness of two popular treatments that are designed to help smokers quit smoking. A sample of 125 smokers who have expressed a desire to quit smoking volunteer to participate in an experiment. The 63 subjects in Group 1 are assigned to chew nicotine gum and the 62 subjects in Group 2 are assigned to wear a nicotine patch. At the end of six months, 22 of the subjects in Group 1 and 17 of the subjects in Group 2 have quit smoking.
Example – Smoking

We conducted a test of $H_0$: $p_1 = p_2$ vs. $H_a$: $p_1 \ne p_2$ and we obtained a test statistic of $z = 0.90$ and a p-value of 0.3682.

    s <- 2 * (1 - pnorm(0.9))
    s

    ## [1] 0.3681203
Example – Smoking

Suppose instead we constructed a 2 × 2 table for the data and conducted a chi-square test for homogeneity. The resulting table is shown below, with expected counts in parentheses and cell chi-square values in square brackets:

                        Treatment
Result               Gum       Patch  Row Total
Quit                  22          17         39
                (19.656)    (19.344)
                [0.2795]    [0.2840]
Didn’t Quit           41          45         86
                (43.344)    (42.656)
                [0.1268]    [0.1288]
Column Total          63          62        125
Example – Smoking

The test statistic is found to be

$$\chi^2 = 0.2795 + \cdots + 0.1288 = 0.8191$$

which is the same (up to rounding error) as we found using the z test ($z^2 = (0.9)^2 = 0.81$).

Under the null hypothesis, the test statistic follows a chi-square distribution with $(r - 1)(c - 1) = (1)(1) = 1$ degree of freedom.
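The relationship $z^2 = \chi^2$ can be checked numerically on the smoking data. The sketch below assumes the pooled two-sample z statistic and turns off the continuity correction that `chisq.test()` applies by default to 2 × 2 tables, since the hand calculation uses none; the variable names are illustrative.

```r
quit <- c(22, 17)   # subjects who quit: gum, patch
n    <- c(63, 62)   # group sizes

# Pooled two-sample z statistic.
p_hat  <- quit / n
p_pool <- sum(quit) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# Chi-square statistic for the same 2 x 2 table, without Yates' correction.
smoking <- rbind(Quit = quit, DidntQuit = n - quit)
colnames(smoking) <- c("Gum", "Patch")
chi_fit <- chisq.test(smoking, correct = FALSE)

z^2
unname(chi_fit$statistic)
```

With the unrounded z, the two numbers agree to machine precision; the gap between 0.81 and 0.8191 in the text comes only from rounding z to 0.90.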
Example – Smoking

The exact p-value is

    pval <- pchisq(0.81, 1, lower.tail = FALSE)
    pval

    ## [1] 0.3681203
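The match between this p-value and the one from the z test is not a coincidence: the square of a standard normal variable follows a chi-square distribution with 1 degree of freedom. A short sketch, using the rounded value z = 0.9 from the example (variable names are illustrative):

```r
z <- 0.9

# Two-sided p-value from the standard normal distribution.
p_z <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Upper-tail p-value from the chi-square distribution with 1 df.
p_chisq <- pchisq(z^2, df = 1, lower.tail = FALSE)

c(p_z, p_chisq)
all.equal(p_z, p_chisq)
```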
Example – Smoking

Because of the large p-value, we do not reject $H_0$, but remember, we never accept $H_0$. We cannot conclude that the two treatments are homogeneous; we can only say we have insufficient evidence that they are not homogeneous.

We could conclude that homogeneity appears to be a reasonable assumption.
Practice Question

We conduct a hypothesis test of $H_0$: $p_1 = p_2$ vs. $H_a$: $p_1 \ne p_2$ to compare the proportion of patients whose condition improves with an experimental drug vs. a placebo. We conduct an experiment and determine the value of the test statistic to be $z = 1.75$, and the p-value is 0.08. Suppose we had instead conducted a chi-square test for homogeneity. The values of the test statistic and the p-value would be:

(A) 3.06 and 0.0064
(B) 1.75 and 0.016
(C) 1.32 and 0.08
(D) 3.06 and 0.08
(E) 1.75 and 0.28
Chi-Square Test for Independence

We will now examine another situation for which a chi-square test of significance is appropriate.

In testing homogeneity among several populations with respect to some variable, we took a separate sample from each of the populations and compared them with respect to a single variable. We could choose the sample size we took from each of the populations (i.e., the column totals).

Now consider the case where we wish to study the relationship between two categorical variables. We will take one simple random sample from a single population of individuals and measure and compare the values for the two variables.
Chi-Square Test for Independence

We will then conduct a test of significance to examine whether or not the two variables of interest are independent. That is, we will test the hypotheses

   $H_0$: The two categorical variables of interest are independent.
   $H_a$: The two categorical variables of interest are not independent (i.e., they are dependent).
Example – Money and Happiness

According to a famous saying, “money can’t buy you happiness”, but are wealth and happiness really independent?

A psychological study was conducted, in which subjects were analyzed and categorized as either very happy, somewhat happy or unhappy. The income levels of subjects were also examined, and each subject was categorized as either low income, middle class, or wealthy. Let $\alpha = 0.10$.
Example – Money and Happiness

The data are displayed in the table below:

                             Income Levels
Happiness         Low Income  Middle Class  Wealthy  Row Total
Very Happy                 7            16       13         36
Somewhat Happy            10            20       11         41
Unhappy                    5             7        5         17
Column Total              22            43       29         94
Example – Money and Happiness

Let $\alpha = 0.10$.

We are testing the hypotheses

   $H_0$: Wealth and happiness are independent.
   $H_a$: Wealth and happiness are dependent.

We will reject $H_0$ if the p-value $\le \alpha = 0.10$.

Recall again that all hypothesis tests are conducted under the assumption that the null hypothesis is true. So we must again ask, if the two variables really are independent, then what would we expect to see?
Chi-Square Test for Independence

For example, if wealth and happiness were independent, what would be the expected number of individuals in the sample who are unhappy and wealthy?

Recall: If two events $A$ and $B$ are independent, then

$$P(A \text{ and } B) = P(A)\,P(B)$$

For example, if wealth and happiness are independent, the probability of a person being unhappy and wealthy is

$$P(U \text{ and } W) = P(U)\,P(W)$$
Chi-Square Test for Independence

As such, the expected number of individuals in our sample who are unhappy and wealthy is

$$E = (\text{total sample size})\,P(\text{unhappy and wealthy}) = (\text{table total})\,P(\text{unhappy})\,P(\text{wealthy})$$

Of course, we don’t know the true probabilities, so we must estimate them by our sample proportions.
Example – Money and Happiness

The estimated probability of an individual being unhappy is

$$\hat{p}_U = \frac{\text{row 3 total}}{\text{table total}} = \frac{17}{94} = 0.1809$$

The estimated probability of an individual being wealthy is

$$\hat{p}_W = \frac{\text{column 3 total}}{\text{table total}} = \frac{29}{94} = 0.3085$$
Expected Cell Counts

Therefore, the expected number of individuals in the sample who are unhappy and wealthy is

$$E = (\text{total sample size})\,\hat{p}_U\,\hat{p}_W = (\text{table total})\left(\frac{\text{row 3 total}}{\text{table total}}\right)\left(\frac{\text{column 3 total}}{\text{table total}}\right) = \frac{(\text{row 3 total})(\text{column 3 total})}{\text{table total}} = \frac{(17)(29)}{94} = 5.24$$
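The two routes to this expected count, through the estimated probabilities and through the row and column totals, can be compared directly in R. This is a minimal sketch assuming the counts from the money and happiness table; all variable names are illustrative.

```r
happy <- matrix(c( 7, 16, 13,
                  10, 20, 11,
                   5,  7,  5),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Very Happy", "Somewhat Happy", "Unhappy"),
                                c("Low Income", "Middle Class", "Wealthy")))

n <- sum(happy)                            # table total
p_unhappy <- sum(happy["Unhappy", ]) / n   # estimated P(unhappy)
p_wealthy <- sum(happy[, "Wealthy"]) / n   # estimated P(wealthy)

# Route 1: sample size times the product of estimated probabilities.
e_product <- n * p_unhappy * p_wealthy

# Route 2: (row total)(column total) / (table total).
e_shortcut <- sum(happy["Unhappy", ]) * sum(happy[, "Wealthy"]) / n

c(e_product, e_shortcut)
```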
Expected Cell Counts

By a similar line of reasoning, the expected frequency for the cell at the intersection of the $r$th row and the $c$th column is

$$E = \frac{(\text{row } r \text{ total})(\text{column } c \text{ total})}{\text{table total}}$$

The formula for the expected cell count is the same as it was for the test of homogeneity, but for different reasons! The two tests are in fact carried out identically.
Example – Money and Happiness

Other expected cell counts are calculated similarly and are shown in parentheses in the table below:

                             Income Levels
Happiness         Low Income  Middle Class  Wealthy  Row Total
Very Happy                 7            16       13         36
                      (8.43)       (16.47)  (11.11)
Somewhat Happy            10            20       11         41
                      (9.60)       (18.76)  (12.65)
Unhappy                    5             7        5         17
                      (3.98)        (7.78)   (5.24)
Column Total              22            43       29         94
Example – Money and Happiness

One of the nine cells (11% < 20%) has an expected count less than five, and all expected cell counts are greater than one, so the use of the chi-square approximation is justified.

We will now calculate the cell chi-square values. For example, the cell chi-square value for somewhat happy and wealthy individuals is

$$\frac{(O - E)^2}{E} = \frac{(11 - 12.65)^2}{12.65} = 0.22$$
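The condition used above (fewer than 20% of expected counts below five, and every expected count above one) is easy to check mechanically. The sketch below encodes that rule as stated in the example, treating the 20% threshold as inclusive, which is an assumption of this sketch; the variable names are illustrative.

```r
happy <- matrix(c( 7, 16, 13,
                  10, 20, 11,
                   5,  7,  5),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Very Happy", "Somewhat Happy", "Unhappy"),
                                c("Low Income", "Middle Class", "Wealthy")))

expected <- outer(rowSums(happy), colSums(happy)) / sum(happy)

# Proportion of cells whose expected count is below five.
share_below_5 <- mean(expected < 5)
share_below_5

# Rule from the example: at most 20% of cells below five, all above one.
rule_ok <- share_below_5 <= 0.20 && all(expected > 1)
rule_ok
```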
Example – Money and Happiness

Other cell chi-square values are calculated similarly and are shown in square brackets in the table below:

                             Income Levels
Happiness         Low Income  Middle Class  Wealthy  Row Total
Very Happy                 7            16       13         36
                      (8.43)       (16.47)  (11.11)
                      [0.24]        [0.01]   [0.32]
Somewhat Happy            10            20       11         41
                      (9.60)       (18.76)  (12.65)
                      [0.02]        [0.08]   [0.22]
Unhappy                    5             7        5         17
                      (3.98)        (7.78)   (5.24)
                      [0.26]        [0.08]   [0.01]
Column Total              22            43       29         94
Test Statistic

The test statistic is

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 0.24 + 0.01 + \cdots + 0.01 = 1.24$$

Under the null hypothesis, this test statistic follows a chi-square distribution with $(r - 1)(c - 1) = (2)(2) = 4$ degrees of freedom.

The p-value is $P(\chi^2(4) \ge 1.24)$.
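The same statistic and degrees of freedom can be obtained from `chisq.test()`. This is a sketch assuming the counts from the money and happiness table. Note that R issues a warning ("Chi-squared approximation may be incorrect") whenever any expected count is below five, which is stricter than the rule used in this example, so the call is wrapped in `suppressWarnings()` here.

```r
happy <- matrix(c( 7, 16, 13,
                  10, 20, 11,
                   5,  7,  5),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Very Happy", "Somewhat Happy", "Unhappy"),
                                c("Low Income", "Middle Class", "Wealthy")))

# One expected count (unhappy, low income) is below five, so R would warn.
fit <- suppressWarnings(chisq.test(happy))

stat <- unname(fit$statistic)   # chi-square test statistic
df   <- unname(fit$parameter)   # (r - 1)(c - 1)

stat
df
```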
P-Value

We see from Table 5 that

$$P(\chi^2(4) \ge 5.38) = 0.25$$

Since $\chi^2 = 1.24 < 5.38$, our p-value is greater than 0.25.
Example

Since the p-value $> \alpha = 0.10$, we fail to reject the null hypothesis at the 10% level of significance. We have insufficient evidence to conclude that wealth and happiness are dependent (i.e., independence appears to be a reasonable assumption).

Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject $H_0$ if $\chi^2 \ge \chi^{2*} = 7.78$, where $\chi^{2*} = 7.78$ is the upper 0.10 critical value from the chi-square distribution with 4 degrees of freedom. We would still fail to reject the null hypothesis, since $\chi^2 = 1.24 < \chi^{2*} = 7.78$.
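For reference, the exact p-value and the critical value quoted above can be computed rather than read from Table 5. A brief sketch, taking the hand-computed statistic 1.24 as given (variable names are illustrative):

```r
chi_sq <- 1.24
df     <- 4
alpha  <- 0.10

p_value <- pchisq(chi_sq, df, lower.tail = FALSE)   # exact upper-tail area
crit    <- qchisq(alpha, df, lower.tail = FALSE)    # upper 0.10 critical value

p_value
crit

# Both rules give the same decision: TRUE would mean "reject H0".
c(p_value <= alpha, chi_sq >= crit)
```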
Example – Political Party

Example

The individuals in the previous example were also asked which political party they support. We would like to conduct a hypothesis test, at the 1% level of significance, to determine whether wealth and political preference are independent. The data are displayed in the table below:

                             Income Levels
Political Party   Low Income  Middle Class  Wealthy  Row Total
Conservative               2            14       20         36
Liberal                    6            10        6         22
NDP                       11            12        2         25
Green                      3             7        1         11
Column Total              22            43       29         94
Example – Political Party

Solution

1. Let $\alpha = 0.01$.

2. We are testing the hypotheses

   $H_0$: Wealth and political preference are independent.
   $H_a$: Wealth and political preference are dependent.

3. We will reject $H_0$ if the p-value $\le \alpha = 0.01$.
Example – Political Party

Solution

We will first calculate the expected cell counts. For example, the expected number of middle class individuals in the sample who support the NDP is

$$E = \frac{(\text{row 3 total})(\text{column 2 total})}{\text{table total}} = \frac{(25)(43)}{94} = 11.44$$

Other expected cell counts are calculated similarly and are shown in the table on the following page.
Example – Political Party

Solution

                             Income Levels
Political Party   Low Income  Middle Class  Wealthy  Row Total
Conservative               2            14       20         36
                      (8.43)       (16.47)  (11.11)
Liberal                    6            10        6         22
                      (5.15)       (10.06)   (6.79)
NDP                       11            12        2         25
                      (5.85)       (11.44)   (7.71)
Green                      3             7        1         11
                      (2.57)        (5.03)   (3.39)
Column Total              22            43       29         94
Example – Political Party

Note: Two of the 12 cells (17% < 20%) have expected counts less than five, and all expected cell counts are greater than one, so the use of the chi-square approximation is justified.

Solution

We will now calculate the cell chi-square values. For example, the cell chi-square value for wealthy individuals who support the Liberal Party is

$$\frac{(O - E)^2}{E} = \frac{(6 - 6.79)^2}{6.79} = 0.09$$

Other cell chi-square values are calculated similarly and are shown in the table on the following page.
Example – Political Party

Solution

                             Income Levels
Political Party   Low Income  Middle Class  Wealthy  Row Total
Conservative               2            14       20         36
                      (8.43)       (16.47)  (11.11)
                      [4.90]        [0.37]   [7.11]
Liberal                    6            10        6         22
                      (5.15)       (10.06)   (6.79)
                      [0.14]        [0.00]   [0.09]
NDP                       11            12        2         25
                      (5.85)       (11.44)   (7.71)
                      [4.53]        [0.03]   [4.23]
Green                      3             7        1         11
                      (2.57)        (5.03)   (3.39)
                      [0.07]        [0.77]   [1.68]
Column Total              22            43       29         94
4. The test statistic is

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 4.90 + 0.37 + \cdots + 1.68 = 23.92$$

Under the null hypothesis, this test statistic follows a chi-square distribution with $(4 - 1)(3 - 1) = (3)(2) = 6$ degrees of freedom.

5. The p-value is $P(\chi^2(6) \ge 23.92)$. From the chi-square table, $P(\chi^2(6) \ge 22.46) = 0.001$ and $P(\chi^2(6) \ge 24.10) = 0.0005$. Since $22.46 < \chi^2 = 23.92 < 24.10$, our p-value is between 0.0005 and 0.001.

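As a cross-check on the hand calculation, the whole test can be run with R's built-in `chisq.test()`. This is a sketch rather than the course's prescribed method; the name `result` is illustrative. R works with unrounded expected counts, so the statistic it reports is close to, but not exactly, the 23.92 obtained by summing the rounded cell values.

```r
# Observed counts (rows: Conservative, Liberal, NDP, Green;
# columns: Low Income, Middle Class, Wealthy)
observed <- matrix(
  c( 2, 14, 20,
     6, 10,  6,
    11, 12,  2,
     3,  7,  1),
  nrow = 4, byrow = TRUE
)

# chisq.test() warns when some expected counts are below 5;
# the slides argue the approximation is still acceptable here,
# so the warning is suppressed in this sketch.
result <- suppressWarnings(chisq.test(observed))

result$statistic   # chi-square test statistic
result$parameter   # degrees of freedom: (4 - 1) * (3 - 1)
result$p.value     # exact upper-tail p-value instead of a table bracket
```

The exact p-value should land inside the interval read from the table, which is a quick way to confirm the table lookup.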
Since the p-value < α = 0.01, we reject the null hypothesis.

6. We have sufficient evidence to conclude that wealth and political preference are dependent.

Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject $H_0$ if

$$\chi^2 \ge \chi^{2*} = 16.81$$

where $\chi^{2*} = 16.81$ is the upper 0.01 critical value from the chi-square distribution with 6 degrees of freedom. We would still reject the null hypothesis, since

$$\chi^2 = 23.92 > \chi^{2*} = 16.81$$

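The critical value approach above can be sketched in R with `qchisq()`, the same function the slides use later for the optometrist example. The variable names are illustrative, and the test statistic is simply the value 23.92 taken from the hand calculation.

```r
alpha <- 0.01
df <- (4 - 1) * (3 - 1)   # (rows - 1) * (columns - 1) = 6

# Upper-tail critical value: the point with area alpha to its right
critical_value <- qchisq(alpha, df, lower.tail = FALSE)

test_statistic <- 23.92   # from the hand calculation in the slides

# Decision rule: reject H0 when the statistic is at least the critical value
reject_h0 <- test_statistic >= critical_value

critical_value
reject_h0
```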
We see that the four cells that contribute the most to the test statistic are those for low income and wealthy individuals who support the Conservative Party and the NDP. It appears that poorer individuals tend to support the NDP, while wealthier voters are more likely to support the Conservative Party.

Practice Question

In a survey, 400 people were classified with respect to stress level and the occurrence of migraine headaches. The data are shown in the table below:

| Migraine | Low Stress | High Stress | Total |
|---|---|---|---|
| Never | 110 | 43 | 153 |
| Occasional | 114 | 80 | 194 |
| Often | 26 | 27 | 53 |
| Total | 250 | 150 | 400 |

We would like to test whether the two variables are independent at the 5% level of significance. What is the critical value of the test?

(A) 4.61 (B) 5.99 (C) 7.81 (D) 9.49 (E) 11.07

Practice Question

For the same survey of 400 people classified by stress level and occurrence of migraine headaches, what is the expected count of individuals with high stress who occasionally suffer from migraine headaches?

(A) 74.25 (B) 70.75 (C) 72.75 (D) 78.50 (E) 76.25

Practice Question

The survey data are repeated below with the expected counts in parentheses:

| Migraine | Low Stress | High Stress | Total |
|---|---|---|---|
| Never | 110 (95.625) | 43 (57.375) | 153 |
| Occasional | 114 (121.25) | 80 (72.75) | 194 |
| Often | 26 (33.125) | 27 (19.875) | 53 |
| Total | 250 | 150 | 400 |

What is the cell chi-square value for low stress individuals who never experience migraines?

(A) 1.97 (B) 2.04 (C) 2.16 (D) 1.78 (E) 1.89

Practice Question

Each cell below shows the observed count, the expected count in parentheses, and the cell chi-square value in square brackets:

| Migraine | Low Stress | High Stress | Total |
|---|---|---|---|
| Never | 110 (95.625) [2.1609] | 43 (57.375) [3.6016] | 153 |
| Occasional | 114 (121.25) [0.4335] | 80 (72.75) [0.7225] | 194 |
| Often | 26 (33.125) [1.5325] | 27 (19.875) [2.5542] | 53 |
| Total | 250 | 150 | 400 |

What is the p-value for the chi-square test for independence?

(A) between 0.001 and 0.0025
(B) between 0.0025 and 0.005
(C) between 0.005 and 0.01
(D) between 0.01 and 0.02
(E) between 0.02 and 0.025

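One way to check answers to the four migraine practice questions is to run the test in R. The sketch below is an optional aid, not part of the original slides; the names `stress`, `result`, `cell_chisq`, and `critical_value` are illustrative.

```r
# Observed counts (rows: migraine frequency, columns: stress level)
stress <- matrix(
  c(110, 43,
    114, 80,
     26, 27),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    migraine = c("Never", "Occasional", "Often"),
    stress   = c("Low", "High")
  )
)

result <- chisq.test(stress)

# Expected counts under independence
result$expected

# Cell chi-square values: (O - E)^2 / E
cell_chisq <- (result$observed - result$expected)^2 / result$expected
cell_chisq

# Degrees of freedom, critical value at the 5% level, and p-value
result$parameter
critical_value <- qchisq(0.05, result$parameter, lower.tail = FALSE)
critical_value
result$p.value
```

Compare the printed values with the answer choices: the critical value with the first question, one entry of `result$expected` with the second, one entry of `cell_chisq` with the third, and the p-value with the bracketed ranges in the fourth.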
Chi-Square Goodness-of-fit Tests

Throughout most of this course, we have conducted tests of significance concerning some population parameter while assuming the form of the population distribution is known. For example, many of our methods have required the assumption that the variable of interest follows a normal distribution. The most we have been able to do is to look at a histogram of the sample data to assess whether normality was a reasonable assumption.

What if we don't know the distribution and we would like to determine if it has some specific form? Fortunately, we have the option of conducting a formal test of significance to determine whether a random variable X follows some specific distribution. These tests are known as chi-square goodness-of-fit tests.

Example – Western Canada Pick 3

The Western Canada Pick 3 is a lottery in which players choose a three-digit number (between 000 and 999). Draws are held once per day, when a lucky three-digit number is selected. One player has suspicions about whether the draws are truly random. She has noticed that some digits seem to come up more frequently than others.

We will conduct a hypothesis test to investigate the player's suspicion. Let X be a digit selected in a given Pick 3 draw. If the draw was random, we would expect all digits 0 through 9 to be drawn with equal probability, and so over a long period of time, we would expect an equal number of each digit to have been drawn.

A random variable X is said to follow a discrete uniform distribution with parameters a and b (a < b) if all integer values from a to b have equal probability. We are testing whether X has a discrete uniform distribution on the interval from a = 0 to b = 9, which would mean the probability distribution of X is

$$P(X = x) = \frac{1}{10}, \quad \text{for } x = 0, 1, \ldots, 9$$

and we write $X \sim DU(0, 9)$.

The parameters of any discrete uniform distribution are the minimum and maximum values of X (in our case, 0 and 9, respectively), which enable us to calculate probabilities of occurrence for any value of X. We will now conduct the chi-square goodness-of-fit test to investigate whether the player's claim has any merit.

Let α = 0.05. We are testing the hypotheses

$H_0$: All ten digits are equally likely to be drawn, i.e., X follows a uniform distribution.
$H_a$: The ten digits are not equally likely to be drawn, i.e., X does not follow a uniform distribution.

We will reject $H_0$ if the p-value ≤ α = 0.05.

Data were tabulated for all 1000 Pick 3 draws from November 1, 2008 to July 30, 2011. The number of times each digit was drawn is shown in the table below.

| Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Freq. | 314 | 285 | 302 | 307 | 294 | 277 | 284 | 321 | 318 | 298 |

Note that in 1000 draws, a total of 3(1000) = 3000 digits were drawn.

The first thing to do is to look at a histogram of the data to get an idea of what we can expect from the test.

[Figure: histogram of the observed frequency of each digit]

It certainly appears reasonable to assume that these data come from a population that is uniformly distributed. We still need the test, however, to verify this.

The expected frequency of each of these digits under the null hypothesis is

$$E = (\text{total number of observations}) \times P(\text{digit}) = 3000(0.1) = 300$$

Note: The use of the chi-square distribution for goodness-of-fit tests is approximate, and it can safely be used when all expected cell counts are at least five. Clearly this condition is satisfied in this case.

We now calculate the cell chi-square values for each of the digits. For example, the cell chi-square value for the digit 8 is

$$\frac{(O - E)^2}{E} = \frac{(318 - 300)^2}{300} = 1.08$$

Other cell chi-square values are calculated similarly and are displayed under the observed and expected counts in the table that follows.

| Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Freq. | 314 | 285 | 302 | 307 | 294 | 277 | 284 | 321 | 318 | 298 |
| Expected | 300 | 300 | 300 | 300 | 300 | 300 | 300 | 300 | 300 | 300 |
| Cell chi-square | 0.65 | 0.75 | 0.01 | 0.16 | 0.12 | 1.76 | 0.85 | 1.47 | 1.08 | 0.01 |

We calculate the chi-square test statistic by adding all of the cell chi-square values:

$$\chi^2 = 0.65 + 0.75 + \cdots + 0.01 = 6.86$$

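The cell-by-cell arithmetic for the Pick 3 data can be sketched in R as follows. The variable names are illustrative. Because R does not round the cell values before adding them, the sum comes out slightly higher (about 6.88) than the 6.86 obtained from the two-decimal cell values; the difference does not affect the conclusion.

```r
# Observed frequency of each digit in 3000 drawn digits
freq <- c(314, 285, 302, 307, 294, 277, 284, 321, 318, 298)
names(freq) <- 0:9

# Under H0 every digit has probability 1/10
expected <- sum(freq) * rep(1 / 10, 10)

# Cell chi-square values: (O - E)^2 / E
cell_chisq <- (freq - expected)^2 / expected
round(cell_chisq, 2)

# Test statistic: sum of the unrounded cell values
test_statistic <- sum(cell_chisq)
test_statistic
```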
Under the null hypothesis, this test statistic follows a chi-square distribution with degrees of freedom equal to

$$(\text{\# of cells} - 1) = 10 - 1 = 9$$

The degrees of freedom for goodness-of-fit tests are only (# of cells − 1) if we know the values of all necessary parameters. In the above example, we knew the maximum and minimum values, so all parameter values are known. When we don't know the value of a parameter, we must estimate it and deduct one additional degree of freedom.

In general, the degrees of freedom in a chi-square goodness-of-fit test are

$$(\text{\# of cells} - 1) - (\text{\# of estimated parameters})$$

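The degrees-of-freedom rule above is simple enough to express as a one-line helper. `gof_df` is a hypothetical function written for this illustration, not an R built-in, and the second call uses a made-up scenario with one estimated parameter.

```r
# Degrees of freedom for a chi-square goodness-of-fit test:
# (# of cells - 1) - (# of estimated parameters)
gof_df <- function(n_cells, n_estimated = 0) {
  (n_cells - 1) - n_estimated
}

gof_df(10)      # Pick 3 example: 10 digits, no estimated parameters -> 9
gof_df(10, 1)   # hypothetical: same 10 cells, one parameter estimated -> 8
```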
We see from Table 5 that $P(\chi^2(9) \ge 11.39) = 0.25$. Since our test statistic $\chi^2 = 6.86 < 11.39$, the p-value is greater than 0.25.

Since the p-value > α = 0.05, we fail to reject the null hypothesis. We have insufficient evidence to conclude that the selected numbers do not follow a uniform distribution (i.e., it is reasonable to assume that the distribution is uniform).

Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject $H_0$ if

$$\chi^2 \ge \chi^{2*} = 16.92$$

where $\chi^{2*} = 16.92$ is the upper 0.05 critical value from the chi-square distribution with 9 degrees of freedom. We would still fail to reject the null hypothesis, since

$$\chi^2 = 6.86 < \chi^{2*} = 16.92$$

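The full Pick 3 test can also be run in one call with `chisq.test()`, supplying the hypothesized probabilities through its `p` argument. This is a sketch for checking the hand calculation; `result` and `critical_value` are illustrative names. The function assumes all probabilities are fully specified, which matches this example (no estimated parameters, 9 degrees of freedom).

```r
# Observed frequency of each digit 0 through 9
freq <- c(314, 285, 302, 307, 294, 277, 284, 321, 318, 298)

# Goodness-of-fit test against equal probabilities of 1/10
result <- chisq.test(freq, p = rep(1 / 10, 10))

result$statistic   # test statistic from unrounded cell values
result$parameter   # degrees of freedom: 10 - 1
result$p.value     # exact p-value, to compare with the table bracket

# Critical value approach at the 5% level
critical_value <- qchisq(0.05, df = 9, lower.tail = FALSE)
critical_value
unname(result$statistic) >= critical_value   # TRUE would mean reject H0
```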
Example – Optometrist

An optometrist believes that the distribution of eye colours in some population is as follows:

| Eye Colour | Brown | Blue | Green | Other |
|---|---|---|---|---|
| Probability | 0.5 | 0.1 | 0.2 | 0.2 |

In a sample of 150 people from the population, 66 have brown eyes, 23 have blue eyes, 35 have green eyes and 26 have another eye colour. We would like to conduct a chi-square goodness-of-fit test at the 5% level of significance to determine whether the optometrist's proposed distribution is correct.

1. Let α = 0.05.
2. $H_0$: The optometrist's proposed distribution is correct vs. $H_a$: The distribution is something other than the one proposed.
3. Decision Rule: Reject $H_0$ if the p-value is ≤ 0.05.

4. Test Statistic:

| Eye Colour | Brown | Blue | Green | Other |
|---|---|---|---|---|
| Observed | 66 | 23 | 35 | 26 |
| Expected | 75 | 15 | 30 | 30 |
| Cell Chi-Square | 1.08 | 4.27 | 0.833 | 0.533 |

Each expected count is the sample size times the proposed probability; for example, 150 × 0.5 = 75 for brown. Therefore $\chi^2 = 1.08 + 4.27 + 0.833 + 0.533 = 6.716$.

5. p-value: The test statistic has (# of cells − 1) = 4 − 1 = 3 degrees of freedom. We see our test statistic falls between 6.25 and 7.82, which corresponds to a p-value between 0.05 and 0.10. In R:

    # Upper-tail probability of a chi-square distribution with 3 df
    pvalue <- pchisq(6.716, 3, lower.tail = FALSE)
    pvalue
    ## [1] 0.08152236

6. Conclusion: Since our p-value is > 0.05, we fail to reject $H_0$. There is insufficient evidence to support the alternative, hence the optometrist's proposition is plausible.

Suppose we used the critical value method. The decision rule would be: Reject $H_0$ if $\chi^2 \ge \chi^2_{3,\,0.05} = 7.82$. Since our test statistic is less than 7.82, we reach the same conclusion. In R:

    # Upper 0.05 critical value of a chi-square distribution with 3 df
    qchisq(0.05, 3, lower.tail = FALSE)
    ## [1] 7.814728

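The optometrist example can be checked end to end with `chisq.test()`. This sketch is an addition for verification; the names `observed`, `proposed`, and `result` are illustrative. R sums unrounded cell values, so its statistic (about 6.713) is marginally smaller than the 6.716 obtained from the rounded cells, and the p-value changes only in the fourth decimal place.

```r
# Observed eye colour counts in the sample of 150 people
observed <- c(Brown = 66, Blue = 23, Green = 35, Other = 26)

# The optometrist's proposed distribution
proposed <- c(Brown = 0.5, Blue = 0.1, Green = 0.2, Other = 0.2)

result <- chisq.test(observed, p = proposed)

result$expected    # 150 times each proposed probability
result$statistic   # chi-square test statistic
result$parameter   # degrees of freedom: 4 - 1
result$p.value     # compare with alpha = 0.05
```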
Practice Question

A website claims that the distribution of blood types for a certain group of people is as follows:

| Blood Type | A | B | AB | O |
|---|---|---|---|---|
| Probability | 0.4 | 0.2 | 0.1 | 0.3 |

In a random sample of 200 people from this group, 70 had blood type A, 20 had blood type B, 30 had blood type AB and 80 had blood type O. We would like to conduct a chi-square goodness-of-fit test at the 1% level of significance to verify the website's claim. What is the critical value of the test?

(A) 7.82 (B) 9.49 (C) 13.28 (D) 11.35 (E) 6.63

Practice Question

Under the null hypothesis (that the distribution of blood types is the one claimed by the website), the expected number of people in the sample with blood type A is 200(0.4) = 80. Other expected counts are calculated similarly and are shown below:

| Blood Type | A | B | AB | O |
|---|---|---|---|---|
| Count | 70 | 20 | 30 | 80 |
| Expected | 80 | 40 | 20 | 60 |

What is the value of the appropriate test statistic?

(A) 22.92 (B) 18.78 (C) 24.37 (D) 31.84 (E) 16.05

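To check answers to the two blood type questions, the expected counts, test statistic, and critical value can be computed directly. This is an optional sketch with illustrative variable names, following the same steps as the worked examples.

```r
# Observed counts and claimed probabilities for blood types A, B, AB, O
observed <- c(A = 70, B = 20, AB = 30, O = 80)
claimed  <- c(A = 0.4, B = 0.2, AB = 0.1, O = 0.3)

# Expected counts under H0: sample size times claimed probability
expected <- sum(observed) * claimed
expected

# Test statistic: sum over cells of (O - E)^2 / E
test_statistic <- sum((observed - expected)^2 / expected)
test_statistic

# Critical value at the 1% level with (4 - 1) = 3 degrees of freedom
critical_value <- qchisq(0.01, df = length(observed) - 1, lower.tail = FALSE)
critical_value
```

Compare `critical_value` and `test_statistic` with the answer choices in the two questions.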