asg4_fall2023.pdf

School: University of Alberta
Course: STAT 252
Subject: Statistics
Date: Jan 9, 2024

ASSIGNMENT 4
MULTIPLE LINEAR REGRESSION

In this assignment you will use multiple regression tools to examine the relationship between the extinction time of land-bird species and the average number of nesting pairs. You will consider a few different models for the mean of the response variable as a function of the explanatory variables and use them to answer questions of interest. You will evaluate the evidence of an association between species size or migratory status and extinction time after accounting for the number of nesting pairs. In addition, you will check the regression model assumptions and apply diagnostic tools in SPSS.

Extinction Time of Land-Bird Species

A group of scientists expects that species with larger numbers of nesting pairs will tend to remain longer before becoming extinct. To assess this claim, they obtained measurements on breeding pairs of land-bird species collected from 16 islands around Britain over the course of several decades (S.L. Pimm, H.L. Jones, and J. Diamond, "On the Risk of Extinction," American Naturalist 132 (1988): 757–85). The response variable is the average time of extinction on the islands where the species appeared, and the explanatory variables are the number of nesting pairs, the size of the species, and the migratory status of the species. The data are available in the SPSS file located in the STAT 252 Laboratories section on eClass. To download the data for the lab, click the link Data for Lab 4 and follow the instructions. You do not need to copy the data into your lab report.
The following is a description of the variables in the data file:

Column | Variable | Description
-------|----------|------------
1 | Species | Name of species
2 | Time | Average time of extinction (years) on those islands where the species appeared
3 | Pairs | Average number of nesting pairs (the average, over all islands where the birds appeared, of the number of nesting pairs per year)
4 | Size | Size of the species (L = large, S = small)
5 | Status | Migratory status of the species (R = resident, M = migrant)

Use the data set to answer the following questions.

1. First, use the scatterplot tool in SPSS to see whether there is any association between time of extinction and number of nesting pairs.
a) Obtain a scatterplot of time of extinction versus number of pairs, using a different marker for each species size. Paste the plot into your report. Comment briefly on the pattern of association between time of extinction and number of pairs. Do the data support the notion that species with larger numbers of nesting pairs tend to remain longer before becoming extinct? Explain briefly. Are the pattern and strength of the association the same for each species size? Are there any observable outliers?
b) Regardless of species size and migratory status, do the data suggest that the variability of time of extinction is roughly constant across values of pairs? What would you conclude about the assumption of constant variability?
2. Use the Correlate tool in the SPSS Analyze menu to obtain the correlation coefficient between time of extinction and number of pairs. Find the correlation coefficient for all observations combined. Then use the Split File feature to find the correlation coefficient for each of the size groups. Paste the output into your report. Comment on the sign and magnitude of the coefficients. Are the signs and magnitudes of the associations consistent with your observations from the scatterplot in Question 1(a)?

3. Create a new variable called lntime, computed as the natural logarithm of time of extinction, ln(time). Use it as the response variable for the rest of the questions.
a) Obtain a scatterplot of lntime versus number of pairs (there is no need to use different markers for different groups here, and make sure to turn off the split file feature you used in Question 2). Paste the plot into your report. Comment on the pattern of association between lntime and number of pairs, and compare the association on the log-transformed scale with the association on the original scale of measurement. Does the log transformation appear to have helped with the assumption of constant variability?
b) Find the correlation coefficient between lntime and number of pairs for all observations together. Paste the output into your report. Comment on the magnitude of this correlation, comparing it specifically with the correlation on the untransformed scale.

4. Now use the regression tool in SPSS to study the effects of number of pairs, species size and migratory status on lntime. To use Size and Status as explanatory variables, you will need to create indicator variables as follows:
sizeN = 0 if small, 1 if large
statusN = 0 if migrant, 1 if resident
Paste relevant SPSS output into your report as needed.
a) Define a multiple linear regression model with the log-transformed time of extinction as the response and number of pairs, size of species and migratory status of species as explanatory variables.
b) What is the estimated regression model of lntime on number of pairs, size of species and migratory status of species? Paste the output needed for this part.
c) Is the regression model significant in any way? State this question as null and alternative hypotheses in terms of the regression coefficients, and report the value of the test statistic and its P-value from the SPSS output. State the distribution of the test statistic under the null hypothesis. What is your conclusion at a significance level of 0.01? Paste the output needed for this part.
d) What percentage of the variation in ln(time) is explained by the explanatory variables in the model? Paste the output needed for this part.
e) Is the number of nesting pairs significant in estimating mean lntime after accounting for species size and migratory status? State this question as null and alternative hypotheses in terms of the regression coefficients, and report the value of the test statistic and its P-value from the SPSS output. State the distribution of the test statistic under the null hypothesis. What is your conclusion at a significance level of 0.01? No additional output needs to be pasted here; just refer to the appropriate output you have already pasted.
5. Now investigate the adequacy of the model you fit in Question 4.
a) Obtain a plot of standardized residuals (ZRESID) versus standardized predicted values (ZPRED). Paste the plot into your report. Is there evidence that the constant variance assumption is violated? Are there any outliers? Explain briefly.
b) Obtain a normal probability plot of the standardized residuals. Paste the plot into your report. Is there any evidence that the normality assumption is grossly violated?
c) Are there any outliers, influential cases, and/or high-leverage cases? Obtain case statistics using the Save option in the Linear Regression dialog: choose studentized residuals, Cook's distances, and leverage values for the 61 species. Examine these statistics carefully for each species and identify any outliers and potentially influential cases in this data set. Base your comments on the following guidelines:
- A case with a studentized residual greater than 2 in absolute value may be considered an outlier.
- A case with a Cook's distance close to or larger than 1 may be considered influential.
- Leverage values greater than 2(p+1)/n are considered high, where p is the number of explanatory variable terms in the regression model (so p+1 is the number of regression coefficients in the model).

6. Refer to the model defined in Question 4. After accounting for number of nesting pairs, does either species size or migratory status have a significant effect on lntime? State this question as a single test, with null and alternative hypotheses in terms of the regression coefficients. Paste the additional ANOVA table required to answer this question. State the residual sums of squares and degrees of freedom for both models (the model under the null hypothesis and the model under the alternative hypothesis). Show the calculation of the test statistic. State the distribution of the test statistic under the null hypothesis and determine the corresponding p-value range using the tables. State your conclusion at a significance level of 0.05.

7. Refer to the model defined in Question 4. Suppose a migrant species has a large size and an average of 2.83 nesting pairs. Answer the following questions on the original scale.
a) What is the predicted time of extinction for this species? Use the model to estimate the response, then back-transform it to the original scale (in years).
b) Obtain the 95% confidence interval for the mean time of extinction (in years on the original scale) of this species. In addition, obtain the 95% prediction interval for a species with these characteristics (in years on the original scale). You can obtain both intervals on the log scale through the Save option in the regression function (select the prediction interval boxes) and then back-transform them. Which of the two intervals is wider?

8. Does the effect of the number of nesting pairs depend on species size, after accounting for migratory status? Investigate this by creating the appropriate interaction variable and adding it to the variables in the model from Question 4. Write out the new model. Paste any new output needed to answer this question. State this question as null and alternative hypotheses in terms of the regression coefficients, and report the value of the test statistic and its P-value from the SPSS output. State the distribution of the test statistic under the null hypothesis. What is your conclusion at a significance level of 0.05?
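The diagnostics and tests in Questions 5 to 8 rest on a few standard formulas, sketched below in Python with NumPy as a cross-check on SPSS. This is a minimal illustration under stated assumptions: the data are hypothetical toy values, not the assignment data, and the helper names (`residual_ss`, `extra_ss_f`, `case_statistics`, `flag_cases`, `back_transform`) are hypothetical. The sketch computes leverages h_i (the diagonal of the hat matrix), internally studentized residuals r_i = e_i / (s·sqrt(1 − h_i)), Cook's distances D_i = (r_i² / (p+1))·h_i / (1 − h_i), and the extra-sum-of-squares F statistic [(SSres,reduced − SSres,full) / (df_reduced − df_full)] / (SSres,full / df_full). There are two caveats to verify for yourself. First, as we read the SPSS documentation, the leverage values it saves are centered (h_i − 1/n), so check which version the 2(p+1)/n cutoff is meant for. Second, exponentiating an interval for the mean of ln(time) is usually interpreted as an interval for the median time on the original scale, assuming ln(time) is roughly symmetric.

```python
import numpy as np


def residual_ss(X, y):
    """Residual sum of squares and residual degrees of freedom of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), X.shape[0] - X.shape[1]


def extra_ss_f(X_reduced, X_full, y):
    """Extra-sum-of-squares F statistic for a reduced model nested in a full model."""
    ss_red, df_red = residual_ss(X_reduced, y)
    ss_full, df_full = residual_ss(X_full, y)
    f = ((ss_red - ss_full) / (df_red - df_full)) / (ss_full / df_full)
    return f, (df_red - df_full, df_full)   # refer f to F(df_red - df_full, df_full)


def case_statistics(X, y):
    """Leverages h_i, internally studentized residuals and Cook's distances."""
    n, k = X.shape                                   # k = p + 1 coefficients
    hat = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix H
    h = np.diag(hat)                                 # uncentered leverages
    resid = y - hat @ y
    s2 = float(resid @ resid) / (n - k)              # mean squared error
    stud = resid / np.sqrt(s2 * (1.0 - h))
    cooks = (stud ** 2 / k) * (h / (1.0 - h))
    return h, stud, cooks


def flag_cases(h, stud, cooks, k):
    """0-based indices of cases that exceed the Question 5(c) rules of thumb."""
    n = len(h)
    return {
        "outlier": np.flatnonzero(np.abs(stud) > 2).tolist(),
        "influential": np.flatnonzero(cooks >= 1).tolist(),   # values near 1 also merit a look
        "high_leverage": np.flatnonzero(h > 2 * k / n).tolist(),
    }


def back_transform(interval):
    """Exponentiate the endpoints of an interval computed on the ln(time) scale."""
    lo, hi = interval
    return float(np.exp(lo)), float(np.exp(hi))


# Hypothetical toy data (NOT the assignment data).
time = np.array([2.0, 5.5, 3.1, 12.0, 1.4, 7.9, 4.6, 5.2])
pairs = np.array([1.0, 3.0, 2.0, 6.5, 0.8, 4.2, 2.5, 5.0])
size_n = np.array([0, 1, 0, 1, 0, 1, 1, 0])      # 1 = large
status_n = np.array([1, 0, 0, 1, 1, 0, 1, 0])    # 1 = resident
lntime = np.log(time)
ones = np.ones_like(pairs)

X_pairs = np.column_stack([ones, pairs])                     # reduced model for Question 6
X_full = np.column_stack([ones, pairs, size_n, status_n])    # Question 4 model
X_inter = np.column_stack([X_full, pairs * size_n])          # adds pairs x sizeN (Question 8)

f6, df6 = extra_ss_f(X_pairs, X_full, lntime)
f8, df8 = extra_ss_f(X_full, X_inter, lntime)   # one added term, so f8 equals that term's t^2
h, stud, cooks = case_statistics(X_full, lntime)
flags = flag_cases(h, stud, cooks, X_full.shape[1])
```

The flags only shortlist cases to examine; with the real data, each flagged species still needs to be looked at individually.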