Midterm 2 Practice Test Solutions

School

Brooklyn College, CUNY

Course

200

Subject

Statistics

Date

Apr 3, 2024

STAT207 Midterm 2

Lab Section Start Time: ________________________________
Last Name: ________________________________
First Name: _____________________________

Academic Integrity

I hereby state that I have not communicated with or gained information in any way from my classmates during this exam, and that all work is my own. I am aware of the course academic dishonesty policies written in the syllabus, which indicate that evidence of cheating on the exam results in an automatic F in the course.

Signature: ___________________________________________

Test Instructions

1. You have 80 minutes to complete this exam.
2. Show all your work on the open-ended exam questions in order to get full credit. No credit will be given for open-ended questions where no work is shown, even if the answer is correct.
3. On this exam you are allowed:
   a. A calculator
   b. A cheatsheet with notes on one side of an 8.5” by 11” sheet of paper.
      i. Must be handwritten
      ii. In your own handwriting
4. You are not allowed to use a cellphone, even if you intend to use it as a calculator or to check the time.
5. If you are completely stumped, write as much as you can about what you do know about what the problem might involve.
Part 1 – Linear Regression Application

Basic Dataset Information

In the first part of this exam, we will explore and conduct analyses on the following dataset, which is a random sample of 400 U.S. counties. This dataset contains the following information about each county.

  • Poverty rate
  • Homeownership rate
  • Percent of housing units in multi-unit structures
  • Unemployment_rate
  • Metro: whether the county contains a metropolitan area (yes, no)
  • Median_edu: median education level (hs_diploma, some_college, bachelors, below_hs)
  • Per_capita_income
  • Median_hh_income: median household income

More Dataset Information

This dataset has no missing values.

Main Research Goal

The main research goal that we will pursue in this exam will be to build a predictive model that effectively predicts one of our selected numerical variables for other U.S. counties not in this dataset given some combination of the remaining variables in the dataset (not name or state).

Secondary Research Goal

Ideally, the model that we select would also be interpretable. Specifically, we would ideally like for this model to also accurately reflect the relationship that exists between the chosen explanatory variables and the response variable.

Train-Test-Split

We take this dataset and randomly split it into a training dataset and a test dataset. The test dataset is comprised of 20% of the observations.
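A random 80/20 split like the one described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `train_test_split_indices` and the seed value are made up here, and the exam does not say which tool or seed was actually used.

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=207):
    """Randomly partition row indices 0..n_rows-1 into (train, test) lists.

    Hypothetical helper for illustration; the seed is an arbitrary choice.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_rows * test_fraction)
    # The first n_test shuffled indices form the test set, the rest the training set
    return indices[n_test:], indices[:n_test]

# 400 counties with a 20% test set
train_idx, test_idx = train_test_split_indices(400)
print(len(train_idx), len(test_idx))  # 320 80
```

With 400 counties, this leaves 320 observations for fitting models and 80 held out for evaluating predictions on counties the model has not seen.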
1. Variable Transformations

First, we’d like to build a simple linear regression that predicts poverty with median_hh_income. The plot below shows the relationship between these two variables.

1.1. Linearity Assumption

We then fit the following three linear regression models that involve these two variables.

A. \( \text{poverty} = 35.25 - 0.0004 \cdot \text{median\_hh\_income} \)
B. \( \log(\text{poverty}) = 3.95 - 0.00002498 \cdot \text{median\_hh\_income} \)
C. \( \text{poverty}^2 = 997.13 - 0.0136 \cdot \text{median\_hh\_income} \)

Match the model to the most likely corresponding fitted values vs. residuals plot. Explanation not needed, but may help with partial credit if you are wrong.
A-3, B-2, C-1

Explanation: The scatterplot to the right indicates that there is not a linear relationship between median_hh_income and poverty in this dataset. Thus, we would not expect the non-transformed linear regression model A to meet the linearity condition.

If, however, we transform poverty with \( \ln(\text{poverty}) \), then this has the effect of “squashing” higher poverty values more than lower poverty values. This can have the effect of “straightening out” this nonlinear relationship to the one that we see on the right. Thus, we would expect the log-transformed model B to have a linearity assumption that is closer to being met (like fitted values vs. residuals plot #2).

On the other hand, if we transform poverty with \( \text{poverty}^2 \), then the higher poverty values will be increased more than the lower poverty values. This can have the effect of magnifying the nonlinear relationship to the one that we would see on the right. Thus, we might expect the poverty^2 model C to have a linearity assumption that is the least close to being met (like fitted values vs. residuals plot #1).
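The "straightening" and "magnifying" effects described above can be checked numerically on made-up data. The sketch below is not the county dataset: it assumes a noise-free exponential curve built from Model B's coefficients, and it uses R^2 of a straight-line fit as a rough stand-in for the fitted values vs. residuals plots (which are the diagnostic the exam actually uses).

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of the simple linear regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Synthetic, noise-free data (an assumption for illustration, NOT the county dataset):
# poverty follows an exact exponential curve in income.
incomes = [20_000 + 2_000 * i for i in range(31)]  # 20,000 to 80,000
poverty = [math.exp(3.95 - 0.00002498 * x) for x in incomes]

r2_raw = r_squared(incomes, poverty)                         # response as in Model A
r2_log = r_squared(incomes, [math.log(p) for p in poverty])  # response as in Model B
r2_sq = r_squared(incomes, [p ** 2 for p in poverty])        # response as in Model C

print(f"poverty:      R^2 = {r2_raw:.4f}")
print(f"log(poverty): R^2 = {r2_log:.4f}")
print(f"poverty^2:    R^2 = {r2_sq:.4f}")
```

On this synthetic curve the log-transformed response is exactly linear in income, the untransformed response is less linear, and the squared response is the least linear, which is the same ordering as in the answer. Real data contain noise, so the differences would be less clean.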
1.2. Suitable Models: Which of these linear regression models is the most suitable type of model for modeling the relationship between poverty and median_hh_income?

Model B: \( \log(\text{poverty}) = 3.95 - 0.00002498 \cdot \text{median\_hh\_income} \)

This is because its linearity assumption was the closest to being met out of the three models that we tried.

1.3. Model Prediction: Use the model below to predict the poverty rate of a county with a median household income of $70,000.

\( \log(\text{poverty}) = 3.95 - 0.00002498 \cdot \text{median\_hh\_income} \)

\( \log(\text{poverty}) = 3.95 - 0.00002498(70{,}000) = 2.2014 \)

Don’t forget to exponentiate both sides to get the predicted poverty as opposed to the predicted log(poverty)!

\( e^{\log(\text{poverty})} = e^{2.2014} \)

\( \text{poverty} = 9.038 \) percent

1.4. Model Slope Interpretation: Put the slope of this model into words. Be sure to not use misleading language!

\( \log(\text{poverty}) = 3.95 - 0.00002498 \cdot \text{median\_hh\_income} \)

If we were to increase the median household income by $1, we would expect, on average, the log(poverty) to decrease by 0.00002498.
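The prediction in 1.3 can be reproduced with a few lines of code. The function name `predict_poverty` is made up for this sketch, and the coefficients are the ones given for Model B. The last lines also show an equivalent way to read the slope from 1.4 on the original poverty scale: adding a fixed amount to income multiplies the predicted poverty by a fixed factor. That reading is an addition here and not part of the exam's expected answer.

```python
import math

# Coefficients of Model B as given in the solution
INTERCEPT = 3.95
SLOPE = -0.00002498

def predict_poverty(median_hh_income):
    """Predicted poverty rate (percent) from the log-scale model (hypothetical helper)."""
    log_poverty = INTERCEPT + SLOPE * median_hh_income
    return math.exp(log_poverty)  # exponentiate to undo the natural log

prediction = predict_poverty(70_000)
print(round(prediction, 3))  # 9.038

# Equivalent multiplicative reading of the slope: a $10,000 increase in median
# household income multiplies the predicted poverty rate by exp(10,000 * SLOPE).
factor_per_10k = math.exp(10_000 * SLOPE)
print(f"factor per $10,000: {factor_per_10k:.3f}")
```

The factor per $10,000 comes out to roughly 0.78, so under this model the predicted poverty rate drops by about 22% for each additional $10,000 of median household income, regardless of the starting income.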
2. Slope Interpretations

Next, suppose we fit the following linear regression model that predicts unemployment_rate given:

  • Per_capita_income
  • Median_hh_income
  • Homeownership rate

Here is some information about these four numerical variables in the training dataset.

Correlations

Summary Statistics