Assignment 5 F23


School

University of Waterloo

Course

231

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Pages

12

STAT 231 Fall 2023 Assignment 5

Assignment 5 is due on Tuesday November 30 at 11:00am Eastern Time.

Your assignment must be typed. You may create your document in Word, Google Docs, LaTeX or any other word processor. The requirement to type your assignment is to facilitate the grading so that the marked assignments can be returned to you in a timely fashion. It is also useful for you to gain some experience in creating a document containing mathematical expressions. Two documents have been posted in the Assignment 1 folder in LEARN on how to use the equation editor in Word. If you wish to use LaTeX then you may find Overleaf particularly useful for this. See https://www.overleaf.com/edu/uwaterloo

Upload your assignment to Crowdmark as a pdf file. You can upload your assignment as one document or individually for each problem. If you upload one document then you must drag and drop the pages for each problem to the appropriate question as indicated in Crowdmark. This is extremely important since dealing with assignments which are left as one document requires extra time and effort by the markers. Be sure to upload your assignment well in advance of the due time since uploading an assignment of many pages to Crowdmark requires time.

In addition to submitting your assignment component to Crowdmark, you must submit your assignment as a single pdf document to the Assignment 5 Dropbox in LEARN to facilitate the running of your assignment through plagiarism detection software. Your submissions to Crowdmark and the LEARN Dropbox must be identical.

Please do not include these two pages of information or any instructions given for each problem in your assignment submission to Crowdmark and the LEARN Dropbox. Doing so means that your assignment is flagged by the Turnitin software used for checking plagiarism.

Many problems on this assignment indicate that your answers must be given in sentences.
This course emphasizes learning to communicate statistical concepts in sentences.

In some of the problems on this assignment you are asked to use R. Only the answers/results you obtain using R must be included in your Crowdmark pdf submission. Your R code must be uploaded as an R file to the Assignment 5 R Code Dropbox in LEARN. Effectively commenting your code is an important skill to develop. Markers will review your file and run it to verify the answers match those in your Crowdmark submission and that the code runs without error. Your code must correctly find the answers needed to get the marks associated with the problems. Good commenting will allow the marker to more easily assign you a full score when reviewing your file. Please ensure your code submitted in the R file is well commented.

Penalties:
(1) Answers which are not typed will not be marked and will receive a mark of zero.
(2) An assignment which is uploaded late to Crowdmark will be assigned a penalty of 5% per hour.
(3) An assignment which is left as a single document and not uploaded to the appropriate places in Crowdmark will be assigned a 10% overall penalty.
(4) An assignment which is submitted late to the Assignment 5 Dropbox in LEARN will be assigned a 5% overall penalty.
(5) If the file of R code is submitted late to the Assignment 5 R Code Dropbox in LEARN, then the assignment will be assigned a 5% overall penalty.
(6) Answers which are required to be written in sentences but are not in sentences will be assigned a 5% overall penalty.
(7) Assignments which include R code in the Crowdmark submission will be assigned a 5% overall penalty.

Checklist to complete for this assignment:
Upload the pdf of your assignment to Crowdmark by the deadline.
Upload the pdf file of your assignment to the Assignment 5 Dropbox in LEARN by the deadline.
Upload the R file of your R code to the Assignment 5 R Code Dropbox in LEARN by the deadline.

This assignment is based on the material in Chapters 1-4, Sections 5.1-5.3, and Sections 6.1-6.4 of the STAT 231 Course Notes.

Coursework 5 Assignment Component Learning Outcomes

Here are the intended learning outcomes for this assignment component. Try to identify the learning outcomes which are achieved by each of the given problems. Enjoy 😊
Fit and analyse a simple linear regression model.
Interpret diagnostic plots and make recommendations on how model issues can be fixed.
Perform two-sample testing for independent samples and make conclusions based on the observed data.
Problem 1: Regression models

The purpose of this problem is to look at fitting regression models using the shiny app https://shiny.math.uwaterloo.ca/sas/stat231/regressionmodels/

The app explores regression models by generating two variates, x and y, according to an underlying relationship. In the first setting ('Random') there is no relationship between x and y. In the second setting ('Linear') there is a linear relationship between x and y. In the third setting ('Quadratic') there is a quadratic relationship between x and y. If you select Linear or Quadratic you can specify the true parameters that relate x and y in the models.

The data are generated as follows. First, a sample of the specified size is generated for x from a Uniform distribution. The y values are then generated depending on the true relationship and the specified parameters. For example, if you choose 'Linear' for the true relationship, and set the y intercept to 1 and the coefficient of x to 2, then the y values are observations from a Gaussian distribution with mean equal to 1 + 2x and standard deviation equal to σ. The variance of the error term can also be controlled with a slider, and you can specify whether the error is homoscedastic (constant variance) or heteroscedastic (non-constant variance).

Once the true relationship is specified, you can then choose whether to fit a linear or quadratic model to the data, and view the resulting model statistics as well as look at various residual plots. Try experimenting with the different inputs and see what happens in the results!

Several questions below ask you to include the plots generated by the Shiny app. You can do this by either right-clicking the graph and selecting 'save as', or by taking a screenshot. Include only the plot in your screenshot. All written answers must be in full sentences. Please do not include any instructions in your assignment submission to Crowdmark or the LEARN Dropbox.
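The data-generating mechanism described above, together with a least squares fit of a linear model, can be sketched as follows. This is a minimal illustration and not the Shiny app's actual code: the Uniform(0, 1) range for x, the seeds, and the names `simulate_linear` and `fit_line` are assumptions made here. It is written in Python for illustration only; the assignment itself requires R. The sketch uses the 'Linear' setting with y intercept 1 and coefficient of x equal to 2, and reports the t statistic |β̂ - 0| / (s_e / sqrt(S_xx)), which has n - 2 degrees of freedom.

```python
import math
import random


def simulate_linear(n, alpha, beta, sigma, seed=0):
    """Generate x from Uniform(0, 1) and y from G(alpha + beta*x, sigma).

    The (0, 1) range is an assumption; the app's actual range is not stated.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(alpha + beta * x, sigma) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Least squares fit of y = alpha + beta*x and the t statistic for H0: beta = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]
    # Two parameters are estimated, so s_e uses n - 2 degrees of freedom.
    s_e = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    t_stat = abs(beta_hat - 0) / (s_e / math.sqrt(s_xx))
    return {
        "alpha_hat": alpha_hat,
        "beta_hat": beta_hat,
        "s_e": s_e,
        "t": t_stat,
        "df": n - 2,
    }


# 'Linear' setting from the text: intercept 1, slope 2; sigma = 0.6 is illustrative.
xs, ys = simulate_linear(n=125, alpha=1.0, beta=2.0, sigma=0.6)
fit = fit_line(xs, ys)
print(fit)
```

The p-value is then the probability that a t random variable with `df` degrees of freedom exceeds the observed `t` in absolute value; the Python standard library has no t distribution function, so that step is left to t tables or to R.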
(a) On the Shiny app select quadratic as the true relationship, 125 as the sample size, -2 as the y intercept, 0 as the coefficient of x, -1 as the coefficient of $x^2$, 0.6 as the standard deviation, homoscedastic as the variability behaviour, and linear as the model to fit. Move the sample size up and down from 125 and see what happens. (Note: you do not need to write anything for this, just observe what changes in the plots.) Now, set the sample size back to 125 and answer the following questions:

(i) Hit the resample! button several times until you view a scatterplot which you think clearly displays a quadratic relationship between the explanatory variate x and the response variate y. Insert the following 3 plots in your assignment: (1) the scatterplot, (2) the plot of the standardized residuals versus x, and (3) the qqplot of the standardized residuals.
(ii) The numerical output provided by the Shiny app just below the plots under the title “Coefficients” assumes the model $Y_i \sim G(\alpha + \beta x_i, \sigma)$, $i = 1, 2, \ldots, n$ independently, where the $x_i$'s are assumed to be known constants. Use this output to test the hypothesis of no relationship between the response variate and the explanatory variate ($H_0: \beta = 0$). Be sure to include the value of the t test statistic $\frac{|\hat{\beta} - 0|}{s_e / \sqrt{S_{xx}}}$, the degrees of freedom associated with the test statistic, the p-value, and the conclusion. Use Table 5.1 in the Course Notes to make your conclusion.

(iii) Consider the statement: "There is no evidence of a relationship between the explanatory variate x and the response variate y for these data." Discuss whether there is evidence to support this statement. Your answer should only discuss the evidence presented in your answers to parts (i) and (ii).

(iv) Now select quadratic as the model to fit. (Note: you should only change the model to fit to quadratic. You should not change the true relationship.) Do not change any other inputs. Insert the following 3 plots in your assignment: (1) the scatterplot, (2) the plot of the standardized residuals versus x, and (3) the qqplot of the standardized residuals.

(v) By comparing the plots you included as your answers to parts (i) and (iv), discuss whether the linear or quadratic model is a more appropriate fit for your data.

(b) On the Shiny app select linear as the true relationship, 160 as the sample size, -2 as the y intercept, 2 as the coefficient of x, 0.5 as the standard deviation, heteroscedastic as the variability behaviour, linear as the model to fit, and standardized residuals versus fitted as the residual plot. Move the standard deviation slider up and down and see what happens. (Note: you do not need to write anything for this, just observe what changes in the plots.) Now, set the standard deviation back to 0.5 and answer the following questions:

(i) Hit the resample! button several times until you view a plot which you think clearly displays the heteroscedastic behaviour of the variability. In your assignment insert this scatterplot and the corresponding residual plot which you think most clearly illustrates heteroscedasticity.

(ii) Using words a non-statistician would understand, describe how the plots can be used as evidence that there is heteroscedasticity in your data.
(iii) Leaving all other selections on the app at the same values as in (b), select qqplot of standardized residuals as the residual plot. Hit the resample! button several times. Choose one qqplot which you think illustrates heteroscedasticity and insert it in your assignment.

(iv) Explain how heteroscedasticity in the data is illustrated in the qqplot of standardized residuals which you chose in (b)(iii).
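To make the idea of non-constant variance in part (b) concrete, here is a minimal Python sketch under stated assumptions. How the Shiny app actually generates heteroscedastic errors is not described, so the rule used here (error standard deviation equal to 2σx, which grows with x) is purely an assumption, as are the Uniform(0, 1) range for x, the seed, and the function names. The sketch fits a line by least squares and compares the typical size of the raw residuals for small x and for large x; the app plots standardized residuals instead.

```python
import math
import random


def simulate_heteroscedastic(n, alpha, beta, sigma, seed=1):
    """Linear mean alpha + beta*x with an error sd that grows with x.

    Assumption: sd = 2 * sigma * x, so the spread is near 0 at x = 0
    and equals 2 * sigma at x = 1. The app's actual rule is not stated.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(alpha + beta * x, 2.0 * sigma * x) for x in xs]
    return xs, ys


def linear_residuals(xs, ys):
    """Raw residuals from the least squares line y = alpha + beta*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]


def rms(values):
    """Root mean square, used here as a simple measure of spread."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Inputs from part (b): n = 160, y intercept -2, coefficient of x 2, sd 0.5.
xs, ys = simulate_heteroscedastic(n=160, alpha=-2.0, beta=2.0, sigma=0.5)
residuals = linear_residuals(xs, ys)

# Sort by x and compare residual spread in the lower and upper halves of x.
pairs = sorted(zip(xs, residuals))
half = len(pairs) // 2
low_spread = rms([r for _, r in pairs[:half]])
high_spread = rms([r for _, r in pairs[half:]])
print(f"spread for small x: {low_spread:.3f}, spread for large x: {high_spread:.3f}")
```

Under this assumed rule the residuals for large x are noticeably more spread out than those for small x, which is the kind of pattern a residual plot shows as a fan or funnel shape.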