Section 07

© Pennsylvania State University
Lab 7.2: Testing for an association between two categorical variables

STAT 200: Lab Activity for Section 7.2
Testing for an association between two categorical variables

Learning objectives:
- Formulate correct hypotheses.
- With the theory-based Chi-Square Test of Association approach:
  1. calculate and interpret the expected counts
  2. understand the relationship between the chi-square contributions and the chi-square statistic's final value
  3. use statistical software such as Minitab to perform a hypothesis test
  4. recognize when it is appropriate to use the theory-based approach
- Obtain a p-value using the Randomization (simulation) Chi-Square Test of Association approach when conditions are not met for the theory approach.

Activity 0: Just notice these formulas!

Expected count: [(row total) * (column total)] / sample size
Residual: observed count - expected count
Chi-square contribution: (observed count - expected count)^2 / expected count
Chi-square statistic: the sum of all chi-square contributions
df = (r - 1) * (c - 1), where r = the number of rows and c = the number of columns

Activity 1: Theory-based chi-square test, expected counts and cell contributions

Data from four different Pew Research surveys that looked at Facebook usage are summarized in Table 1.
Source: https://www.pewresearch.org/

The survey question of interest, for each of the four years, is: How often do you visit Facebook?
A. Less often than once a day
B. About once a day
C. Several times a day

Table 1: Summarized "Observed" Data from the Four Pew Research Surveys

Year    Less   Once   Several   Total
2013     221    230       384     835
2018     521    465      1012    1998
2019     170    230       510     910
2021     436    330       736    1502
Total   1348   1255      2642    5245

Exploratory Data Analysis

1. Which type of data is summarized in the 4×3 contingency table?
   A. categorical
   B. quantitative

2. Identify the variables:
   Explanatory: ______________
   Response: ______________
   [Handwritten answers on the page: Explanatory: year; Response: how often people visit Facebook]

Analysis

We are interested in testing the hypotheses:
H0: There is no association between year and frequency of visiting Facebook.
Ha: There is an association between year and frequency of visiting Facebook.
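As a worked illustration of the Activity 0 formulas, the arithmetic for a single cell of Table 1 (year 2018, "Less") can be written out as follows. The cell was chosen only as an example; the same steps apply to every cell, and the result agrees with the value the lab lists for that cell in Table 2.

```latex
\begin{align*}
E_{2018,\text{Less}} &= \frac{(\text{row total})(\text{column total})}{\text{sample size}}
                      = \frac{1998 \times 1348}{5245} \approx 513.5 \\
\text{residual}      &= 521 - 513.5 = 7.5 \\
\text{contribution}  &= \frac{(521 - 513.5)^2}{513.5} \approx 0.11 \\
df                   &= (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6
\end{align*}
```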
The survey data is displayed on the graph found below.

[Graph not reproduced here.]

3. Descriptively, does the graph suggest that there is an association between year and frequency of visiting Facebook? Provide reasoning for your answer.

The expected counts are displayed below in Table 2.

Table 2: Expected Counts for the Four Pew Research Surveys

Year    Less    Once    Several
2013   214.6   199.8      _____
2018   513.5   478.1     1006.4
2019   233.9   217.7      458.4
2021   386.0   359.4      756.6

4. Calculate the expected count for the missing cell in this table. Report to one decimal place. Refer to Table 1.

5. Interpret the expected count for the cell: Visiting Facebook several times a day for the year 2013.
   We would ___________ around _________ participants from the 2013 survey to say they visited Facebook at least several times a day, if there is _________ association between the two variables.
   Blank 1: (observe, expect)
   Blank 2: (fill in calculated count)
   Blank 3: (an, no)

Let's now use Minitab to complete some of the calculations by first typing the "summarized" observed data from Table 1 into a Minitab worksheet.

6. Use Minitab to obtain both the observed and expected counts. Check whether the expected counts from Minitab match those in Table 2, including the expected count that you calculated by hand.

[Handwritten annotations on this page, partly illegible: "association", "yes", "graph ... up"]
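The lab uses Minitab for this check. As an optional cross-check outside Minitab, the expected counts can also be recomputed from Table 1 with a few lines of plain Python. This is only a sketch: the names `OBSERVED` and `expected_counts` are illustrative and not part of the lab materials.

```python
# Observed counts from Table 1.
# Rows: survey year. Columns: Less, Once, Several.
OBSERVED = {
    2013: [221, 230, 384],
    2018: [521, 465, 1012],
    2019: [170, 230, 510],
    2021: [436, 330, 736],
}


def expected_counts(observed):
    """Return expected counts: (row total * column total) / sample size."""
    rows = list(observed.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    return {
        year: [row_total * col_total / n for col_total in col_totals]
        for year, row_total in zip(observed, row_totals)
    }


expected = expected_counts(OBSERVED)

# Print each row rounded to one decimal place for comparison with Table 2.
for year, row in expected.items():
    print(year, [round(value, 1) for value in row])
```

Each printed row can be compared against the corresponding row of Table 2; the third value in the 2013 row corresponds to the cell left blank for question 4.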
7. Have we met the conditions for the theory-based Chi-Square Test approach? Include reasoning. If met, use Minitab to obtain the chi-square statistic.

8. What is the chi-square statistic?

9. What are the degrees of freedom for the relevant distribution? How were they calculated?

10. What is the p-value (Pearson)? Sketch a picture of the p-value with labelling. Look at the lecture notes or verify with StatKey when using theoretical distributions.

11. Write out an interpretation of the p-value.

12. What is the appropriate conclusion for our hypothesis test?

Activity 2: Perform a theory-based chi-square test

Heart failure is a common event caused by cardiovascular disease. The raw data, which includes 11 clinical features that can be used to predict heart disease, is in a file called HeartDisease. The complete codebook can be found at the website provided.
Source: https://www.kaggle.com/fedesoriano/heart-failure-prediction

Consider two variables from this data set.

Variable 1: Resting Electrocardiogram (ECG) result:
- Normal
- ST-T wave abnormality
- LVH - left ventricular hypertrophy

Variable 2: Patient diagnosed with Heart Disease:
- Yes (Heart Disease)
- No (Normal)

Exploratory Data Analysis

1. How many cases are found in this dataset?
   Note: if you look at the data, some variables have missing observations, but not the two under consideration. Real data sets often have missing data.

Analysis

2. We want to perform a hypothesis test to see whether the two variables are associated. Write the null and alternative hypotheses of the test.
   Null Hypothesis:
   Alternative Hypothesis:

Use Minitab to get the necessary information to answer the questions below.

3. Have we met the conditions for the theory-based Chi-Square Test approach? Include reasoning.

[Handwritten annotations on this page, partly illegible: "yes", ">5", "0.000", "evidence", "918"]
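The theory-based steps in both activities (check conditions, sum the chi-square contributions, find the degrees of freedom, obtain a p-value) can be sketched in plain Python as a cross-check on Minitab. Several things here are assumptions rather than part of the lab: the function names are hypothetical, the condition check uses the common rule of thumb that every expected count be at least 5 (use whatever condition your course notes state), and the p-value helper relies on a closed form for the chi-square upper tail that holds only when df is even, which covers a 4×3 table (df = 6) and a 3×2 table (df = 2). The demonstration uses the Table 1 counts, since the HeartDisease cross-tabulation is not reproduced in this document.

```python
import math


def chi_square_test(table):
    """Return (expected, contributions, statistic, df) for a two-way table.

    `table` is a list of rows of observed counts.
    """
    rows = [list(row) for row in table]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)

    # Expected count: (row total * column total) / sample size
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

    # Contribution: (observed - expected)^2 / expected
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(rows, expected)
    ]

    statistic = sum(sum(row) for row in contributions)
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return expected, contributions, statistic, df


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with EVEN df.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


# Observed counts from Table 1 (rows: 2013, 2018, 2019, 2021).
table1 = [
    [221, 230, 384],
    [521, 465, 1012],
    [170, 230, 510],
    [436, 330, 736],
]

expected, contributions, statistic, df = chi_square_test(table1)

# Assumed condition: every expected count is at least 5.
conditions_met = all(e >= 5 for row in expected for e in row)
p_value = chi2_upper_tail_even_df(statistic, df)

print("conditions met:", conditions_met)
print("chi-square statistic:", round(statistic, 2))
print("df:", df)
print("p-value:", p_value)
```

The `contributions` grid is worth inspecting alongside the statistic: cells with the largest contributions are the ones where the observed count departs most from what no association would predict. For odd df, or as a general check, the p-value is better taken from Minitab or StatKey as the lab directs.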