HW2 LR assignment

School

New York University

Course

MISC

Subject

Statistics

Date

Apr 3, 2024

Uploaded by DeaconBoulderAlpaca37

HW2
2024-02-10

Part 1: The cheddar cheese study

In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Overall taste scores were obtained by combining the scores from several tasters.

The cheddar dataset has 30 observations on the following 4 variables.

taste: a subjective taste score
Acetic: concentration of acetic acid (log scale)
H2S: concentration of hydrogen sulfide (log scale)
Lactic: concentration of lactic acid

Use the following statement to access the data: data("cheddar", package = "faraway")

Question 1.1: Show descriptive statistics for each of the variables

rm(list = ls())
# read data here
data("cheddar", package = "faraway")
# Note: using summary() for descriptive statistics is sufficient for this part
# Enter your code below
summary(cheddar)

##      taste           Acetic           H2S             Lactic
##  Min.   : 0.70   Min.   :4.477   Min.   : 2.996   Min.   :0.860
##  1st Qu.:13.55   1st Qu.:5.237   1st Qu.: 3.978   1st Qu.:1.250
##  Median :20.95   Median :5.425   Median : 5.329   Median :1.450
##  Mean   :24.53   Mean   :5.498   Mean   : 5.942   Mean   :1.442
##  3rd Qu.:36.70   3rd Qu.:5.883   3rd Qu.: 7.575   3rd Qu.:1.667
##  Max.   :57.20   Max.   :6.458   Max.   :10.199   Max.   :2.010

Question 1.2: Fit a regression model with taste as the response and the three chemical contents as predictors. Identify the predictors that are statistically significant at the 5% level.

# Enter your code below
lmod <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)
summary(lmod)

##
## Call:
## lm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -17.390  -6.612  -1.009   4.908  25.449
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -28.8768    19.7354  -1.463  0.15540
## Acetic        0.3277     4.4598   0.073  0.94198
## H2S           3.9118     1.2484   3.133  0.00425 **
## Lactic       19.6705     8.6291   2.280  0.03108 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.13 on 26 degrees of freedom
## Multiple R-squared:  0.6518, Adjusted R-squared:  0.6116
## F-statistic: 16.22 on 3 and 26 DF,  p-value: 3.81e-06

At the 5% level, H2S (p = 0.00425) and Lactic (p = 0.03108) are statistically significant predictors of taste in the cheddar data, whereas Acetic (p = 0.94198) is not.

Question 1.3: Calculate the p-values of the three predictors using the anova function.

# Enter your code below
# Each comparison drops one predictor and tests it against the full model
model_NoAce <- lm(taste ~ H2S + Lactic, data = cheddar)
anova(model_NoAce, lmod)

## Analysis of Variance Table
##
## Model 1: taste ~ H2S + Lactic
## Model 2: taste ~ Acetic + H2S + Lactic
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     27 2669.0
## 2     26 2668.4  1   0.55427 0.0054  0.942

model_NoLac <- lm(taste ~ Acetic + H2S, data = cheddar)
anova(model_NoLac, lmod)

## Analysis of Variance Table
##
## Model 1: taste ~ Acetic + H2S
## Model 2: taste ~ Acetic + H2S + Lactic
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1     27 3201.7
## 2     26 2668.4  1    533.32 5.1964 0.03108 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_NoH2S <- lm(taste ~ Acetic + Lactic, data = cheddar)
anova(model_NoH2S, lmod)

## Analysis of Variance Table
##
## Model 1: taste ~ Acetic + Lactic
## Model 2: taste ~ Acetic + H2S + Lactic
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)
## 1     27 3676.1
## 2     26 2668.4  1    1007.7 9.8182 0.004247 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question 1.4: Use the anova function to calculate the significance of the full model.

# Enter your code below
# Compare the intercept-only model with the full model
nullmod <- lm(taste ~ 1, data = cheddar)
anova(nullmod, lmod)

## Analysis of Variance Table
##
## Model 1: taste ~ 1
## Model 2: taste ~ Acetic + H2S + Lactic
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)
## 1     29 7662.9
## 2     26 2668.4  3    4994.5 16.221 3.81e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Part 2: Study of teenage gambling in Britain

The teengamb dataset contains a survey that was conducted to study teenage gambling in Britain. The dataset has 47 observations and 5 variables:

sex: 0 = male, 1 = female
status: socioeconomic status score based on parents' occupation
income: income in pounds per week
verbal: verbal score in words out of 12 correctly defined
gamble: expenditure on gambling in pounds per year

Use the following statement to access the data: data("teengamb", package = "faraway")
Question 2.1: Fit a model with gamble as the response and the other variables as predictors. Which variables are statistically significant at the 5% level?

# read data here
data("teengamb", package = "faraway")
# Enter your code below
model <- lm(gamble ~ sex + status + income + verbal, data = teengamb)
summary(model)

##
## Call:
## lm(formula = gamble ~ sex + status + income + verbal, data = teengamb)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -51.082 -11.320  -1.451   9.452  94.252
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  22.55565   17.19680   1.312   0.1968
## sex         -22.11833    8.21111  -2.694   0.0101 *
## status        0.05223    0.28111   0.186   0.8535
## income        4.96198    1.02539   4.839 1.79e-05 ***
## verbal       -2.95949    2.17215  -1.362   0.1803
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.69 on 42 degrees of freedom
## Multiple R-squared:  0.5267, Adjusted R-squared:  0.4816
## F-statistic: 11.69 on 4 and 42 DF,  p-value: 1.815e-06

At the 5% level, sex (p = 0.0101) and income (p = 1.79e-05) are statistically significant predictors of gamble in the teengamb data, whereas status (p = 0.8535) and verbal (p = 0.1803) are not.

Question 2.2: Provide interpretation for the significant coefficients

sex is a categorical variable coded 0 = male and 1 = female, and its coefficient is -22.12. Holding status, income, and verbal constant, a female is predicted to spend about 22.12 pounds per year less on gambling than a male.

The coefficient of income is 4.96. Holding the other variables constant, each additional pound per week of income is associated with an increase of about 4.96 pounds per year in gambling expenditure.
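The interpretation in Question 2.2 can be checked numerically. The sketch below is only an illustration: it copies the coefficient estimates from the summary(model) output rather than refitting the model, and the helper `fitted_gamble` and the covariate values (status = 50, income = 5, verbal = 7) are hypothetical choices, not part of the assignment.

```r
# Coefficient estimates copied from the summary(model) output above
b <- c(intercept = 22.55565, sex = -22.11833, status = 0.05223,
       income = 4.96198, verbal = -2.95949)

# Hypothetical helper: fitted gambling expenditure (pounds per year)
fitted_gamble <- function(sex, status, income, verbal) {
  unname(b["intercept"] + b["sex"] * sex + b["status"] * status +
           b["income"] * income + b["verbal"] * verbal)
}

# Hypothetical covariate values, chosen only for illustration
male   <- fitted_gamble(sex = 0, status = 50, income = 5, verbal = 7)
female <- fitted_gamble(sex = 1, status = 50, income = 5, verbal = 7)
richer <- fitted_gamble(sex = 0, status = 50, income = 6, verbal = 7)

# Changing only sex (0 to 1) shifts the fitted value by the sex coefficient
sex_effect <- female - male
# Raising only income by one pound per week shifts it by the income coefficient
income_effect <- richer - male

print(c(sex_effect = sex_effect, income_effect = income_effect))
```

Whatever values are chosen for the covariates held fixed, the two differences equal the sex and income coefficients, which is what "holding the other variables constant" means in a linear model without interactions.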
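Question 2.3: Use the confint function to produce 95% confidence intervals for the coefficients based on the same model. Can you deduce which coefficients are significant at the level of 5% based on the intervals?

# Enter your code below
# Note: level = 0.9 gives 90% intervals; for 95% use confint(model), whose default level is 0.95
ci <- confint(model, level = 0.9)
print(ci)

##                     5 %       95 %
## (Intercept)  -6.3685535 51.4798547
## sex         -35.9290335 -8.3076267
## status       -0.4205824  0.5250501
## income        3.2373182  6.6866402
## verbal       -6.6129469  0.6939599

The call above uses level = 0.9, so the printed intervals are 90% intervals (bounds at 5% and 95%) and correspond to tests at the 10% level, not the 5% level the question asks about. A coefficient is significant at the 5% level exactly when its 95% interval excludes 0. Using estimate ± t × standard error with t ≈ 2.018 (the 0.975 quantile of the t distribution with 42 degrees of freedom) and the coefficient table from Question 2.1, the 95% intervals are approximately (-38.69, -5.55) for sex and (2.89, 7.03) for income, both of which exclude 0, and approximately (-0.52, 0.62) for status and (-7.34, 1.42) for verbal, both of which contain 0. We therefore deduce that the sex and income coefficients are significant at the 5% level, while the status and verbal coefficients are not.

Question 2.4: Use the anova function to test the null hypothesis that the coefficients of sex and income are both 0 using the same model

# Enter your code below
# Reduced model drops sex and income; compare it with the full model
reduced_model <- lm(gamble ~ status + verbal, data = teengamb)
hyp_nul <- anova(reduced_model, model)
hyp_nul

## Analysis of Variance Table
##
## Model 1: gamble ~ status + verbal
## Model 2: gamble ~ sex + status + income + verbal
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)
## 1     44 43195
## 2     42 21624  2     21571 20.949 4.892e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-test gives F = 20.949 on 2 and 42 degrees of freedom with p = 4.892e-07, so we have strong evidence to reject the null hypothesis that the coefficients of sex and income are both 0. The RSS of the larger model can never exceed that of the reduced model, so the drop in RSS from 43195 to 21624 counts as evidence only because it is large relative to the residual variance, which is what the F statistic measures.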
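The interval and F-test calculations in Questions 2.3 and 2.4 can be reproduced by hand from the printed tables. The sketch below is a hedged illustration in base R: it assumes only the estimates, standard errors, RSS values, and degrees of freedom copied from the output above, so its results agree with a refit of the model only up to the rounding of those printed numbers. The variable names are illustrative.

```r
# Estimates and standard errors copied from the summary(model) output
est <- c(sex = -22.11833, status = 0.05223, income = 4.96198, verbal = -2.95949)
se  <- c(sex = 8.21111, status = 0.28111, income = 1.02539, verbal = 2.17215)
df_resid <- 42  # residual degrees of freedom of the full model

# 95% intervals: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df = df_resid)
ci95 <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci95, 2))

# A coefficient is significant at 5% exactly when its 95% interval excludes 0
excludes_zero <- ci95[, "lower"] > 0 | ci95[, "upper"] < 0
print(excludes_zero)

# F-test for H0: sex = income = 0, from the RSS column of the ANOVA table
rss_reduced <- 43195; df_reduced <- 44
rss_full    <- 21624; df_full    <- 42
df_diff <- df_reduced - df_full
f_stat  <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

With the fitted model available, `confint(model)` (default level 0.95) and `anova(reduced_model, model)` give the same quantities directly; the hand calculation is useful mainly for seeing where the numbers in the tables come from.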