MIE1626 Midterm Practice Problems with Soln

School: University of Toronto
Course: 1626
Subject: Statistics
Date: Feb 20, 2024
Pages: 6

Practice Problems MIE1626

QUESTION 1 [2 marks]: In the expression Sales ≈ f(TV, Radio, Newspaper), "Sales" is the:
A) Response
B) Training Data
C) Independent Variable
D) Feature

QUESTION 2 [2 marks]: In a predictive modeling project using regression, you fit a linear model to your data set. Which of the following is most likely true if you fit a quadratic model to the data set instead?
A) Using the Quadratic Model will decrease your Irreducible Error.
B) Using the Quadratic Model will decrease the Bias of your model.
C) Using the Quadratic Model will decrease the Variance of your model.
D) Using the Quadratic Model will decrease your Reducible Error.

QUESTION 3 [2 marks]: One way of carrying out the bootstrap is to average equally over all possible bootstrap samples from the original data set (where two bootstrap data sets are different if they have the same observations but in a different order). Unlike the usual implementation of the bootstrap, this method has the advantage of not introducing extra noise due to random resampling. To carry out this implementation on a data set with n data points, how many bootstrap data sets would we need to average over?
A) $2^n$
B) $n^2$
C) $n^n$
D) $n!$

QUESTION 4 [2 marks]: Which of the following statements is most accurate about the classification methods Logistic Regression, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and naive Bayes?
A) Logistic regression is not a suitable method when the classes are well-separated.
B) LDA is useful when n is large and for problems with more than 2 classes.
C) Assuming Gaussian distributions in each class, QDA is less flexible than naive Bayes.
D) Naive Bayes is most useful when the numbers of features and samples are roughly the same.
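The counting argument behind Question 3 can be checked by brute force for small n. The sketch below is not part of the original problem set: it enumerates every ordered sample of size n drawn with replacement from n observations and prints the count next to the four candidate formulas. The helper name `count_ordered_bootstrap_samples` is hypothetical, chosen only for illustration.

```python
from itertools import product
from math import factorial


def count_ordered_bootstrap_samples(data):
    """Count ordered samples of size n drawn with replacement from n points."""
    n = len(data)
    # Each of the n positions in the sample can hold any of the n observations,
    # and samples that differ only in order are counted separately.
    return sum(1 for _ in product(data, repeat=n))


if __name__ == "__main__":
    print("n  count  2^n  n^2  n^n  n!")
    for n in range(1, 6):
        count = count_ordered_bootstrap_samples(list(range(n)))
        print(n, count, 2 ** n, n ** 2, n ** n, factorial(n))
```

Full enumeration grows very quickly with n, so this check is only practical for very small data sets.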
Answers for Q1-4:

Question   1   2   3   4
Answer     A   B   C   A

QUESTION 5 [18 marks]: For predicting p, the probability of credit default, you have used a logistic regression model with variables $X_1$ = credit score and $X_2$ = credit card balance. Using historical data with class labels, you have fitted the model

$\operatorname{logit}(p) \approx \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$

and obtained the estimated coefficients $\hat{\beta}_0 = -50$, $\hat{\beta}_1 = -1$, and $\hat{\beta}_2 = 0.2$.

Part (a). [1 mark]: Explain in plain English what the estimated intercept means and provide a numerical example for its role in the model.

Answer: The intercept determines the prediction for a sample where $X_1$ and $X_2$ are both 0; that is, it is the estimated log-odds of default, $-50$, for such a sample. Accordingly, $p = \frac{e^{-50}}{1 + e^{-50}} \approx 1.9 \times 10^{-22}$, a very small estimated probability of default for a person with a credit score of 0 and a balance of 0. In plain English, our model predicts that such a person is very unlikely to have a credit default.

Part (b). [2 marks]: Explain what $\hat{\beta}_2$ means and provide a numerical example for how it impacts logit(p).

Answer: $\hat{\beta}_2$ is the estimated coefficient for credit card balance in the logistic regression model. A one-unit increase in balance increases logit(p) by $\hat{\beta}_2 = 0.2$ if the other variable (credit score) remains unchanged. For example, at a credit score of 70, raising the balance from 600 to 601 raises logit(p) from 0 to 0.2. Logit(p) is the logarithm of the odds p/(1-p), where p is the probability of default:

$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$

Part (c). [1 mark]: Provide a numerical example for how $\hat{\beta}_2$ impacts the odds of default.

Answer:

$\text{Odds} = \frac{p}{1-p} = e^{\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2}$
Following the example in the previous part, a one-unit increase in balance (while the other variable remains unchanged) multiplies the odds p/(1-p) by a factor of $e^{\hat{\beta}_2} = e^{0.2} \approx 1.22$, which is a 22% increase in the odds of credit default. Note that the odds of credit default, p/(1-p), are different from the probability of credit default, p.

Part (d). [1 mark]: Estimate the probability of credit default for Bob, who has a credit score of 70 and a credit card balance of 610.

$p = \frac{e^{-50 - 1 \cdot 70 + 0.2 \cdot 610}}{1 + e^{-50 - 1 \cdot 70 + 0.2 \cdot 610}} = \frac{e^{2}}{1 + e^{2}} \approx 0.88$

Part (e). [1 mark]: For a credit default risk of 50%, what should Bob's credit card balance be?

$\frac{e^{-50 - 1 \cdot 70 + 0.2x}}{1 + e^{-50 - 1 \cdot 70 + 0.2x}} = 0.5$
$\Rightarrow 2e^{-50 - 1 \cdot 70 + 0.2x} = 1 + e^{-50 - 1 \cdot 70 + 0.2x}$
$\Rightarrow e^{-50 - 1 \cdot 70 + 0.2x} = 1$
$\Rightarrow -50 - 1 \cdot 70 + 0.2x = 0$
$\Rightarrow x = 600$

Part (f). [4 marks]: To use the logistic model as a classifier for detecting defaulters, we use the classification threshold of $\hat{p} = 0.5$. Calculate the confusion matrix for the following test dataset and specify the values for TP, TN, FP, and FN.

X1 (credit score)   X2 (credit card balance)   Y (class label)
70                  610                        1
70                  700                        1
70                  800                        1
70                  500                        0
70                  400                        1
60                  600                        0
50                  600                        0
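The calculations in parts (c) to (f) can be reproduced with a short script. This is a minimal sketch, not the course's solution: the function names are hypothetical, the positive class is assumed to be default (Y = 1), and an observation is assumed to be classified as a defaulter when the estimated probability is at least 0.5. The source does not say how a probability of exactly 0.5 is handled, and no row of the test data falls exactly on the threshold.

```python
import math

# Fitted coefficients as stated in Question 5.
B0, B1, B2 = -50.0, -1.0, 0.2

# Test data from part (f): (credit score, credit card balance, class label).
TEST_ROWS = [
    (70, 610, 1), (70, 700, 1), (70, 800, 1), (70, 500, 0),
    (70, 400, 1), (60, 600, 0), (50, 600, 0),
]


def logit(score, balance):
    """Log-odds of default under the fitted model."""
    return B0 + B1 * score + B2 * balance


def default_probability(score, balance):
    """Estimated probability of default: the logistic function of the logit."""
    return 1.0 / (1.0 + math.exp(-logit(score, balance)))


def balance_for_probability(score, p):
    """Balance x solving B0 + B1*score + B2*x = ln(p / (1 - p))."""
    return (math.log(p / (1.0 - p)) - B0 - B1 * score) / B2


def confusion_counts(rows, threshold=0.5):
    """Count TP, TN, FP, FN, assuming class 1 is predicted when p >= threshold."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, balance, label in rows:
        predicted = 1 if default_probability(score, balance) >= threshold else 0
        if predicted == 1:
            counts["TP" if label == 1 else "FP"] += 1
        else:
            counts["FN" if label == 1 else "TN"] += 1
    return counts


if __name__ == "__main__":
    print("odds factor per unit of balance:", math.exp(B2))        # part (c)
    print("Bob's default probability:", default_probability(70, 610))  # part (d)
    print("balance for 50% risk:", balance_for_probability(70, 0.5))   # part (e)
    print("confusion counts:", confusion_counts(TEST_ROWS))        # part (f)
```

Passing a different `threshold` to `confusion_counts` shows how the trade-off between false positives and false negatives shifts as the cut-off changes.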