Practice Problems Module 13 (Solutions)

P6104 Practice Problems (Solutions)
Module 13: Effect Modification in Multiple Linear Regression and Introduction to Logistic Regression

1. An ecological study examined the effect of water fluoridation on tooth decay in 5-year-old children using data collected at the level of the electoral ward (in Canada). The electoral wards included were in three areas where the water supply was either unfluoridated, artificially fluoridated, or naturally fluoridated. A multiple linear regression model was fitted with mean tooth decay in the ward as the outcome and with predictors Jarman underprivileged area score for each ward and fluoridation status (unfluoridated, artificially fluoridated, or naturally fluoridated). A high Jarman score indicates an area with high deprivation. The authors reported that there was a significant interaction between the effects of Jarman score and water fluoridation on tooth decay. A graph similar to this was given (Jones et al., 1997).

(a) What is meant by interaction?

In this example, interaction means that the mean change in tooth decay score for a one-unit increase in ward Jarman score differs depending on the fluoridation status of the area.

(b) How would you interpret a statistically significant interaction here?

A statistically significant interaction in this example means that the slopes differ across fluoridation groups by more than chance alone would explain. Mean tooth decay score increases much more quickly with ward Jarman score in areas with no fluoridation, whereas it increases more gradually with ward Jarman score in areas with artificial or natural fluoridation.
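One way to write down a model of the kind described in problem 1 is sketched below. The symbols are assumptions introduced for illustration and do not come from the original paper: J is the ward Jarman score, A and N are 0/1 indicators for artificially and naturally fluoridated areas, and unfluoridated areas are taken as the reference group.

```latex
\[
E[\text{decay}] = \beta_0 + \beta_1 J + \beta_2 A + \beta_3 N
                + \beta_4 (J \times A) + \beta_5 (J \times N)
\]
\[
\text{slope for } J =
\begin{cases}
\beta_1             & \text{unfluoridated } (A = 0,\ N = 0) \\
\beta_1 + \beta_4   & \text{artificially fluoridated } (A = 1) \\
\beta_1 + \beta_5   & \text{naturally fluoridated } (N = 1)
\end{cases}
\]
```

Under this parameterization, "interaction" corresponds to at least one of the product coefficients (here labelled beta_4 and beta_5) being nonzero, so that the Jarman slope is not the same in all three groups.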
2. The data set “lowbwt” contains information for a sample of 100 low birth weight infants born in two teaching hospitals in Boston, Massachusetts. Systolic blood pressure measurements are saved under the variable name sbp, gestational ages under gestage, the five-minute Apgar score under apgar5, and the sex of each infant under sex (0 = female; 1 = male).

    # Read the data and pull out the variables used below
    sbpdat <- read.table("lowbwt.txt", header = TRUE)
    sbp <- sbpdat$sbp
    gestage <- sbpdat$gestage
    sex <- sbpdat$sex

(a) Fit a multiple linear regression model with sbp as the response and gestational age, sex, and the product of gestational age and sex as predictors. Write the estimated least squares regression line.

    # gestage*sex expands to both main effects plus the gestage:sex product term
    linreg <- lm(sbp ~ gestage + sex + gestage*sex)
    summary(linreg)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  14.9805    15.2419   0.983   0.3282
    gestage       1.0903     0.5254   2.075   0.0406 *
    sex         -15.1570    27.7433  -0.546   0.5861
    gestage:sex   0.5714     0.9569   0.597   0.5518
    ---

$\widehat{\text{sbp}} = 14.9805 + 1.0903 \times \text{gestage} - 15.1570 \times \text{sex} + 0.5714 \times \text{gestage} \times \text{sex}$

(b) What is the predicted sbp if the gestational age is 29 weeks and the sex is male?

$\widehat{\text{sbp}} = 14.9805 + 1.0903 \times 29 - 15.1570 \times 1 + 0.5714 \times 29 \times 1 = 48.0128$

(c) One male baby had a gestational age of 29 weeks and an sbp of 43. What is this baby’s residual?

Residual = Observed - Predicted = 43 - 48.0128 = -5.0128

(d) On the same set of axes, sketch regression lines for the sbps of male and female babies.

If sex = 1 (male):

$\widehat{\text{sbp}} = 14.9805 + 1.0903 \times \text{gestage} - 15.1570 \times 1 + 0.5714 \times \text{gestage} \times 1 = -0.1765 + 1.6617 \times \text{gestage}$

If sex = 0 (female):

$\widehat{\text{sbp}} = 14.9805 + 1.0903 \times \text{gestage} - 15.1570 \times 0 + 0.5714 \times \text{gestage} \times 0 = 14.9805 + 1.0903 \times \text{gestage}$
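The arithmetic in parts (b) through (d) can be checked without the data file. The following is a minimal sketch that copies the four coefficients from the regression output; the helper name predict_sbp and the vector b are hypothetical names introduced here, not part of the original solution.

```r
# Coefficients copied from the summary(linreg) output (not re-estimated here)
b <- c(intercept = 14.9805, gestage = 1.0903, sex = -15.1570, interaction = 0.5714)

# Hypothetical helper: evaluate the fitted line for given gestage and sex (0/1)
predict_sbp <- function(gestage, sex) {
  unname(b["intercept"] + b["gestage"] * gestage + b["sex"] * sex +
           b["interaction"] * gestage * sex)
}

# (b) predicted sbp for a male infant at 29 weeks
pred_male_29 <- predict_sbp(29, 1)

# (c) residual = observed - predicted for the male infant with sbp 43
resid_male_29 <- 43 - pred_male_29

# (d) sex-specific lines: setting sex = 1 shifts the intercept by the sex
# coefficient and the slope by the interaction coefficient
male_line   <- c(intercept = unname(b["intercept"] + b["sex"]),
                 slope     = unname(b["gestage"] + b["interaction"]))
female_line <- c(intercept = unname(b["intercept"]),
                 slope     = unname(b["gestage"]))

print(pred_male_29)
print(resid_male_29)
print(rbind(male = male_line, female = female_line))
```

With the fitted model object available, predict(linreg, newdata = data.frame(gestage = 29, sex = 1)) should give the same prediction up to rounding of the printed coefficients.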
[Figure: fitted regression lines of sbp (vertical axis sbp.m, ticks from 35 to 65) against gestational age (horizontal axis g.age, ticks from 20 to 40), one line per sex: -0.1765 + 1.6617*gestage for males and 14.9805 + 1.0903*gestage for females.]

(e) Is sex an effect modifier for gestational age when considering sbp?

No, it does not appear that sex is an effect modifier for gestational age, since the interaction term has a p-value of 0.5518, which is large (greater than 0.05). Note that even though the plot above might suggest that there is an interaction, because the two lines are not parallel, the data do not provide evidence that this is a statistically significant interaction.
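The test behind part (e) can be reproduced from the printed estimate and standard error of the gestage:sex term. This is a sketch under one assumption that the output does not state explicitly: all 100 infants were used in the fit, so the residual degrees of freedom are 100 - 4 = 96.

```r
# Estimate and standard error of gestage:sex, copied from the regression output
est <- 0.5714
se  <- 0.9569

# Assumption: n = 100 with no missing values, 4 estimated coefficients
df <- 100 - 4

# t statistic and two-sided p-value for H0: interaction coefficient = 0
t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in slopes (male minus female)
ci <- est + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, p = p_value))
print(ci)
```

The interval contains zero, which is the same conclusion as the large p-value: the observed difference in slopes is well within what sampling variability could produce.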
3. The data set “apache.txt” contains information on 30 day mortality in a sample of septic patients as a function of their baseline APACHE II Score (an integer score from 0 to 71, higher scores correspond to more severe disease). Patients are coded as 1 or 0 depending on whether they are dead or alive in 30 days, respectively. APACHE II score can be thought of as a continuous measure, but since it consists of integer values we can construct a reasonably informative table to better understand the data. Additionally, a plot and two fitted models are provided.

                death
    APACHEII    0    1
           0    1    0
           2    1    0
           3    3    1
           4   11    0
           5    6    3
           6   11    3
           7    8    4
           8   17    5
           9   30    3
          10   15    5
          11   26    5
          12   12    5
          13   19   13
          14   18    7
          15   11    7
          16   16    8
          17   19    8
          18    6   13
          19    8    7
          20    7    6
          21    8    9
          22    2   12
          23    6    7
          24    3    8
          25    4    7
          26    4    2
          27    2    5
          28    2    1
          29    3    4
          30    1    4
          31    0    3
          32    0    3
          33    0    1
          34    0    1
          35    0    1
          36    0    1
          37    0    1
          41    1    0

Model 1:

    lm(formula = death ~ APACHEII)

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.002446   0.050708  -0.048    0.962
    APACHEII     0.025070   0.003009   8.331 9.66e-16 ***

Model 2:

    glm(formula = death ~ APACHEII, family = "binomial")

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -2.32521    0.27790  -8.367  < 2e-16 ***
    APACHEII     0.11673    0.01608   7.260 3.87e-13 ***
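As a rough illustration of how the two fitted models differ, the sketch below evaluates each at a few APACHE II scores using only the printed coefficients. The score values and the object names (lin, logit, scores) are choices made here for illustration and are not part of the original problem.

```r
# Coefficients copied from the Model 1 (lm) and Model 2 (glm) output
lin   <- c(intercept = -0.002446, slope = 0.025070)
logit <- c(intercept = -2.32521,  slope = 0.11673)

# Illustrative APACHE II scores; 41 is the largest score in the table
scores <- c(0, 10, 20, 30, 41)

# Model 1: fitted values are read directly as probabilities of death
p_linear <- unname(lin["intercept"] + lin["slope"] * scores)

# Model 2: the linear predictor is a log odds, so apply the inverse logit
p_logistic <- plogis(unname(logit["intercept"] + logit["slope"] * scores))

# Odds ratio for death per one-point increase in APACHE II score
odds_ratio <- unname(exp(logit["slope"]))

print(data.frame(APACHEII = scores,
                 linear   = round(p_linear, 3),
                 logistic = round(p_logistic, 3)))
print(odds_ratio)
```

At a score of 41 the straight-line model gives a fitted value above 1, which cannot be a probability, while the logistic model always stays between 0 and 1. This is one common reason for preferring logistic regression with a binary outcome.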