Midterm 2 Practice Test Solutions

School

Brooklyn College, CUNY

Course

200

Subject

Statistics

Date

Apr 3, 2024

Uploaded by DoctorKnowledge11847

STAT207 Midterm 2

Lab Section Start Time: ________________________________
Last Name: ________________________________
First Name: _____________________________

Academic Integrity

I hereby state that I have not communicated with or gained information in any way from my classmates during this exam, and that all work is my own. I am aware of the course academic dishonesty policies written in the syllabus, which indicate that evidence of cheating on the exam results in an automatic F in the course.

Signature: ___________________________________________

Test Instructions

1. You have 80 minutes to complete this exam.
2. Show all your work on the open-ended exam questions in order to get full credit. No credit will be given for open-ended questions where no work is shown, even if the answer is correct.
3. On this exam you are allowed:
   a. A calculator
   b. A cheatsheet with notes on one side of an 8.5” by 11” sheet of paper.
      i. Must be handwritten
      ii. In your own handwriting
4. You are not allowed to use a cellphone, even if you intend to use it as a calculator or to check the time.
5. If you are completely stumped, write as much as you can about what you do know about what the problem might involve.

Part 1 – Linear Regression Application

Basic Dataset Information

In the first part of this exam, we will explore and conduct analyses on the following dataset, which is a random sample of 400 U.S. counties. This dataset contains the following information about each county.
• Poverty rate
• Homeownership rate
• Percent of housing units in multi-unit structures
• Unemployment_rate
• Metro: whether the county contains a metropolitan area (yes, no)
• Median_edu: median education level (hs_diploma, some_college, bachelors, below_hs)
• Per_capita_income
• Median_hh_income: median household income

More Dataset Information

This dataset has no missing values.

Main Research Goal

The main research goal that we will pursue in this exam will be to build a predictive model that effectively predicts one of our selected numerical variables for other U.S. counties not in this dataset given some combination of the remaining variables in the dataset (not name or state).

Secondary Research Goal

Ideally, the model that we select would also be interpretable. Specifically, we would ideally like for this model to also accurately reflect the relationship that exists between the chosen explanatory variables and the response variable.

Train-Test-Split

We take this dataset and randomly split it into a training dataset and a test dataset. The test dataset is comprised of 20% of the observations.
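As a rough illustration of the train-test split described above, the sketch below randomly partitions 400 row indices into an 80% training set and a 20% test set. The function name, the seed value, and the use of plain row indices in place of the actual county data are assumptions made for illustration; the exam does not say which tool produced the split.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.20, seed=207):
    """Randomly partition row indices 0..n_rows-1 into (train, test) lists."""
    rng = random.Random(seed)        # fixed seed (hypothetical) so the split is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)             # random order, so membership is random
    n_test = round(n_rows * test_fraction)
    test = sorted(indices[:n_test])  # first 20% of the shuffled indices
    train = sorted(indices[n_test:]) # remaining 80%
    return train, test


# 400 counties, 20% held out for testing
train_idx, test_idx = train_test_split_indices(400)
print(len(train_idx), len(test_idx))  # 320 80
```

With 400 counties, this leaves 320 observations for fitting models and 80 for evaluating how well they predict counties that were not used in fitting.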

1. Variable Transformations

First, we’d like to build a simple linear regression that predicts poverty with median_hh_income. The plot below shows the relationship between these two variables.

1.1. Linearity Assumption

We then fit the following three linear regression models that involve these two variables.

A. \( \widehat{\text{poverty}} = 35.25 - 0.0004\,\text{median\_hh\_income} \)
B. \( \widehat{\log(\text{poverty})} = 3.95 - 0.00002498\,\text{median\_hh\_income} \)
C. \( \widehat{\text{poverty}^2} = 997.13 - 0.0136\,\text{median\_hh\_income} \)

Match the model to the most likely corresponding fitted values vs. residuals plot. Explanation not needed, but may help with partial credit if you are wrong.

• A-3
• B-2
• C-1

Explanation: The scatterplot to the right indicates that there is not a linear relationship between median_hh_income and poverty in this dataset. Thus, we would not expect the non-transformed linear regression model A to meet the linearity condition.

Alternatively, if we transform poverty with \( \log(\text{poverty}) \), then this has the effect of “squashing” higher poverty values more than lower poverty values. This can have the effect of “straightening out” this nonlinear relationship to the one that we see on the right. Thus, we would expect the log-transformed model B to have a linearity assumption that is closer to being met (like fitted values vs. residuals plot #2).

On the other hand, if we transform poverty with \( \text{poverty}^2 \), then the higher poverty values will be increased more than the lower poverty values. This can have the effect of magnifying the nonlinear relationship to the one that we would see on the right. Thus, we might expect the poverty^2 model C to have a linearity assumption that is the least close to being met (like fitted values vs. residuals plot #1).
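The “squashing” and “magnifying” effects described in the explanation can be seen with a few numbers. The sketch below uses hypothetical poverty rates (not values from the county dataset), each double the previous one, and compares the gaps between consecutive values on the raw, log, and squared scales. The helper name `gaps` is also just an illustration.

```python
import math

# Hypothetical poverty rates (percent), each double the previous one.
poverty = [5.0, 10.0, 20.0, 40.0]


def gaps(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


raw_gaps = gaps(poverty)                         # gaps grow: 5, 10, 20
log_gaps = gaps([math.log(p) for p in poverty])  # gaps are all about ln(2)
sq_gaps = gaps([p ** 2 for p in poverty])        # gaps grow faster: 75, 300, 1200

print(raw_gaps)
print([round(g, 4) for g in log_gaps])
print(sq_gaps)
```

On the log scale the widening gaps among the high values are pulled back to equal spacing, while on the squared scale they widen even faster than in the raw data. This is why the log transformation can straighten a curved relationship and squaring can exaggerate it.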

1.2. Suitable Models: Which of these linear regression models is the most suitable type of model for modeling the relationship between poverty and median_hh_income?

Model B: \( \widehat{\log(\text{poverty})} = 3.95 - 0.00002498\,\text{median\_hh\_income} \)

This is because its linearity assumption was the closest to being met out of the three models that we tried.

1.3. Model Prediction: Use the model below to predict the poverty rate of a county with a median household income of $70,000.

\[ \widehat{\log(\text{poverty})} = 3.95 - 0.00002498\,\text{median\_hh\_income} \]

\[ \widehat{\log(\text{poverty})} = 3.95 - 0.00002498(70{,}000) = 2.2014 \]

Don’t forget to exponentiate both sides to get the predicted poverty as opposed to the predicted log(poverty)!

\[ e^{\widehat{\log(\text{poverty})}} = e^{2.2014} \]

\[ \widehat{\text{poverty}} = 9.038 \text{ percent} \]

1.4. Model Slope Interpretation: Put the slope of this model into words. Be sure to not use misleading language!

\[ \widehat{\log(\text{poverty})} = 3.95 - 0.00002498\,\text{median\_hh\_income} \]

If we were to increase the median household income by $1, we would expect, on average, the log(poverty) to decrease by 0.00002498.
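The arithmetic in 1.3 can be checked with a short script. The sketch below hard-codes the intercept and slope of model B and treats log as the natural logarithm, which matches exponentiating with base e in the solution. The function and constant names are assumptions made for illustration. The last two lines go slightly beyond the written answer to 1.4: because the model is linear in log(poverty), a fixed increase in income multiplies the back-transformed prediction by a constant factor.

```python
import math

# Coefficients of model B, as given in the exam.
INTERCEPT = 3.95
SLOPE = -0.00002498


def predict_log_poverty(median_hh_income):
    """Predicted log(poverty) for a given median household income in dollars."""
    return INTERCEPT + SLOPE * median_hh_income


def predict_poverty(median_hh_income):
    """Back-transform the prediction to the poverty-rate scale (percent)."""
    return math.exp(predict_log_poverty(median_hh_income))


log_pred = predict_log_poverty(70_000)  # about 2.2014
pov_pred = predict_poverty(70_000)      # about 9.04 percent

# Multiplicative reading of the slope on the original scale.
factor_per_dollar = math.exp(SLOPE)         # just under 1
factor_per_10k = math.exp(SLOPE * 10_000)   # about 0.779

print(round(log_pred, 4), round(pov_pred, 3))
print(round(factor_per_10k, 3))
```

Under this reading, each additional $10,000 of median household income multiplies the back-transformed poverty prediction by roughly 0.779. This describes the fitted model only and is not a causal claim about counties.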

2. Slope Interpretations

Next, suppose we fit the following linear regression model that predicts unemployment_rate given:
• Per_capita_income
• Median_hh_income
• Homeownership rate

Here is some information about these four numerical variables in the training dataset.

[Tables: Correlations; Summary Statistics]
