Lab_Wk04_2023

School

University of Wollongong

Course

251

Subject

Statistics

Date

Jan 9, 2024

STAT251 Fundamentals of Biostatistics
LABORATORY NOTES Week 4
Exploring Associations: Categorical Variables and Probability

Aim: The focus of this lab is exploring associations involving two categorical variables. This lab also consolidates the concepts of independence and conditional probability.

Go to the Moodle Week 4 section and download the data file StudentData.csv. Open Jamovi and import the file: select the triple-bar icon, then select Open. Then browse to the location of the data file and open it. To save your analyses in the following sections, save the file as a .omv file at the end of your session.

1. Categorical Response and Categorical Explanatory Variable

Let’s see if we can detect any evidence of a relationship between a person’s birthplace and their sex. This information is in the two categorical variables Sex and Born. We will do this by examining evidence generated in the following three ways:

1. Comparison of proportions in a contingency table of the variables concerned;
2. Comparison of the column heights in bar charts;
3. Mathematical determination of statistical independence.

First check the variable definition and any coding for the variables concerned. It is helpful to look at the frequencies of a categorical variable first, to check that the observations have all been entered as the codes or categories suggest. That is, check that the number of categories is as expected and that names are consistently spelled. For example, for Sex, check that observations have not been entered as both Male and male, or as both Female and f, as that would show as 4 categories.

Select Analyses → Exploration → Descriptives. Put Sex and Born into the Variables box, and tick the Frequency Tables box.

Discussion: What are the categories for the Born variable? What percentage of students were born in each? Are there any missing values for either variable?
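The category check described above can also be sketched outside Jamovi. The following is a minimal Python illustration, not part of the lab procedure: the list `raw_sex`, the helper `frequency_table` and the `recode` mapping are all hypothetical and are not taken from StudentData.csv. It shows how inconsistent labels inflate the number of categories in a frequency table.

```python
from collections import Counter

# Hypothetical entries for a Sex column, including inconsistent labels.
raw_sex = ["Male", "male", "Female", "f", "Male", "Female", "Male"]


def frequency_table(values):
    """Return {category: (count, percentage of all values)}."""
    counts = Counter(values)
    n = len(values)
    return {category: (k, 100 * k / n) for category, k in counts.items()}


raw_table = frequency_table(raw_sex)

# One possible clean-up (hypothetical mapping): send known variants to one label.
recode = {"male": "Male", "f": "Female"}
clean_sex = [recode.get(value, value) for value in raw_sex]
clean_table = frequency_table(clean_sex)

print(len(raw_table), "categories before recoding:", sorted(raw_table))
print(len(clean_table), "categories after recoding:", sorted(clean_table))
```

With these made-up entries the frequency table shows four categories before recoding and two afterwards, which is the symptom to look for in the Jamovi output.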
2.1 Contingency Table of Raw Data

Now we can obtain the two-way table of counts (or frequencies) for the two nominal variables: Sex and Born. Select Analyses → Frequencies → Independent Samples. Put the variable Born into Rows and Sex into Columns. In the Cells menu, select the Row, Column and Total options under Percentages.

STAT251 Laboratory Notes Week 4 1
Contingency Tables

                                        Sex
Born                               Female      Male     Total
Australian   Observed                  23        85       108
             % within row         21.30 %   78.70 %  100.00 %
             % within column      63.89 %   55.56 %   57.14 %
             % of total           12.17 %   44.97 %   57.14 %
Overseas     Observed                  13        68        81
             % within row         16.05 %   83.95 %  100.00 %
             % within column      36.11 %   44.44 %   42.86 %
             % of total            6.88 %   35.98 %   42.86 %
Total        Observed                  36       153       189
             % within row         19.05 %   80.95 %  100.00 %
             % within column     100.00 %  100.00 %  100.00 %
             % of total           19.05 %   80.95 %  100.00 %
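The three percentages reported in each cell of the contingency table above differ only in their denominator: the row total, the column total, or the overall total. As a rough check of the Jamovi output, the following Python sketch recomputes them from the observed counts. The counts are copied from the table; the names `counts` and `cell_percentages` are illustrative only.

```python
# Observed counts copied from the contingency table above.
counts = {
    "Australian": {"Female": 23, "Male": 85},
    "Overseas": {"Female": 13, "Male": 68},
}

# Marginal totals: one per row (Born), one per column (Sex), and the overall n.
row_totals = {born: sum(cells.values()) for born, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for sex, k in cells.items():
        col_totals[sex] = col_totals.get(sex, 0) + k
n = sum(row_totals.values())


def cell_percentages(born, sex):
    """Return the three percentages Jamovi reports for one cell."""
    k = counts[born][sex]
    return {
        "within_row": 100 * k / row_totals[born],
        "within_column": 100 * k / col_totals[sex],
        "of_total": 100 * k / n,
    }


for born in counts:
    for sex in counts[born]:
        pct = cell_percentages(born, sex)
        summary = "  ".join(f"{name}={value:.2f}%" for name, value in pct.items())
        print(f"{born:<10} {sex:<6} {summary}")
```

Each printed line can be compared with the corresponding block of the table; small differences in the last digit would only reflect rounding.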
Note: It is important to read the cell entries of the table in the order specified by the text. For example: the sample contained 85 Australian-born males, representing 78.7% (85/108) of the total number of Australian-born students in the sample, 55.6% (85/153) of the total number of males in the sample and 45% (85/189) of the total number of students in the sample.

Log book questions:

Recall that
∩ = “both … and …” (intersection, overlap)
∪ = “either … or … or both” (union)
| = “… given …” (conditional)

1. Use the above table to compute the following probabilities empirically. Write each answer first as a fraction using the raw counts, then calculate the percentage. Check your answer with the appropriate percentage in the table.
a. P(Australian)
b. P(Female)
c. P(Australian ∩ Female)
d. P(Australian ∪ Female)
e. P(Australian ∪ Male)
f. P(Female | Australian)
g. P(Australian | Female)

Note: If we assume these data are a good sample of students from the wider student population, then these will be estimates of the population probabilities.

2.2 Examining a bar chart

To create a bar plot with one categorical variable split by another categorical variable, go to Analyses → Frequencies → Independent Samples. Put the variable Born into Rows and Sex into Columns. Then, under the Plots menu, select Bar plot, and under Y-Axis, select Percentages within columns. The resulting plot should appear as follows:

[Figure: bar plot of Born split by Sex, with percentages within columns on the y-axis]

Log book questions:
2. Write down each probability that the bar chart has used, using the notation given in Log book question 1, and comment on the results.
3. Given that Jamovi uses the rows as the x-axis, comment on which plot may be better: this one, or the one using the within-row percentages.

2.3 Independence of the two variables

When two events, say A and B, are independent, both of the following conditions hold:

P(A ∩ B) = P(A) P(B)

and, equivalently,

P(A | B) = P(A)

If these statements are not true, then the variables or events are dependent, or associated. For a sample, we will discuss how close to equality is close enough in Week 8. For a population, the equality should hold for the true probabilities.

Log book questions: Set A = Australian and F = Female.

4. Calculate, or obtain from the previous question, the values (approximated by the sample) of P(A ∩ F) and of P(A) P(F), as applied to the variables Born and Sex. Are they similar?
5. Similarly, compare P(F | A) and P(F).

If time permits, determine the expected count for Aust ∩ Female under the model of independence. Try it by hand first for at least one or two cells. Think about what you need to do to get “expected under independence” counts. After trying by hand, check by going to Analyses → Frequencies → Independent Samples. Put the variable Born into Rows and Sex into Columns. In the Cells menu (as in Section 2.1), under Counts, select the Observed and Expected counts. Compare the observed counts with the expected counts. How close are they?

NB: The expected count (under independence) for Aust ∩ Female is P(A) P(F) n, where n is the total number of students in the sample.

3. Applications of conditional probability: measures of the accuracy of a test
Remember from lectures:

Sensitivity of a test: the proportion of people who correctly test positive when they actually have the disease, that is,

P(Positive | Disease) = P(Positive ∩ Disease) / P(Disease)

Specificity of a test: the proportion of people who correctly test negative when the disease is not present, that is,

P(Negative | No Disease) = P(Negative ∩ No Disease) / P(No Disease)

To study the effectiveness of a diagnostic test for a rare disease, researchers selected 100 individuals who actually had the disease and 9,900 who did not. Both groups were tested, producing the results below (Utts and Heckard, p. 203). These two groups have been selected from two different populations.

Of the 100 individuals with the disease, 80 tested positive. Of the 9,900 individuals without the disease, 8,910 tested negative.

NB: As we do not have a random sample from the population of interest, we cannot infer P(D), P(D and T), or P(T) from a table of these results, where D denotes having the disease and T denotes testing positive. To convince yourself of this, consider a different sample that included only diseased individuals; this sample would result in a much larger estimate of P(T).
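The NB above can be illustrated with a small sketch. It assumes two hypothetical study designs with made-up counts (not the counts from this lab) in which the test behaves identically but the researchers choose different group sizes. The function name `summarise` and all numbers are illustrative assumptions.

```python
def summarise(pos_diseased, n_diseased, neg_healthy, n_healthy):
    """Return (sensitivity, specificity, naive P(positive)) for one study."""
    sensitivity = pos_diseased / n_diseased      # P(Positive | Disease)
    specificity = neg_healthy / n_healthy        # P(Negative | No Disease)
    pos_healthy = n_healthy - neg_healthy        # false positives
    # "Naive" estimate: pools both groups as if they were one random sample.
    naive_p_positive = (pos_diseased + pos_healthy) / (n_diseased + n_healthy)
    return sensitivity, specificity, naive_p_positive


# Hypothetical design A: 200 diseased and 800 healthy individuals selected.
design_a = summarise(pos_diseased=180, n_diseased=200, neg_healthy=720, n_healthy=800)
# Hypothetical design B: same test, but 800 diseased and 200 healthy selected.
design_b = summarise(pos_diseased=720, n_diseased=800, neg_healthy=180, n_healthy=200)

for name, (sens, spec, p_pos) in {"A": design_a, "B": design_b}.items():
    print(f"Design {name}: sensitivity={sens:.2f}, specificity={spec:.2f}, "
          f"naive P(positive)={p_pos:.2f}")
```

In this sketch the sensitivity and specificity agree across the two designs, because each is computed within a group whose size the researchers fixed, while the pooled proportion of positive tests changes with the chosen group sizes. That is why the pooled proportion cannot be read as an estimate of P(T) in the population.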
Log book questions:

6. Estimate the sensitivity of the test.
7. Estimate the specificity of the test.
8. Suppose that the prevalence of the disease in the population is actually 1 in 1000.
a. Draw a tree diagram of the outcomes. (See below to fill in the tree)
b. Find the predictive value positive, P(Disease | Positive), and write a sentence interpreting the result in the context of the problem.
c. Find the predictive value negative, P(No Disease | Negative), and write a sentence interpreting the result in the context of the problem.
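The tree-diagram calculation in question 8 can be sketched in code as a way of checking hand working. The function below is a minimal illustration, not a prescribed method: it multiplies along the branches of the tree to get the four joint probabilities, then divides to get the two predictive values. The inputs shown (sensitivity 0.90, specificity 0.95, prevalence 0.01) are hypothetical and deliberately differ from the values in this lab; substitute your own estimates from questions 6 to 8.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (P(Disease | Positive), P(No Disease | Negative))."""
    # Joint probabilities at the four ends of the tree:
    # first branch = disease status, second branch = test result.
    p_disease_pos = prevalence * sensitivity
    p_disease_neg = prevalence * (1 - sensitivity)
    p_healthy_pos = (1 - prevalence) * (1 - specificity)
    p_healthy_neg = (1 - prevalence) * specificity

    # Conditional probability = joint probability / probability of the condition.
    ppv = p_disease_pos / (p_disease_pos + p_healthy_pos)
    npv = p_healthy_neg / (p_healthy_neg + p_disease_neg)
    return ppv, npv


# Hypothetical inputs, chosen only to illustrate the calculation.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(f"P(Disease | Positive)    = {ppv:.4f}")
print(f"P(No Disease | Negative) = {npv:.4f}")
```

The four joint probabilities sum to 1, which is a useful check when filling in the tree by hand.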