CIS 5200: Machine Learning
Fall 2023 Final Exam Sample Questions
University of Pennsylvania

This is a brief document containing sample questions from the second half of the course. For questions on topics from the first half of the course, see the practice midterm.

SAMPLE T/F QUESTIONS [?? points]

For each question below, identify whether the statement made is True or False. If you believe the statement is false, justify your answer. Correct "True" answers are worth 1 point, while correct "False" answers are worth two points: one for the correct answer, and one for a correct justification.

(a) (T/F) It's okay to evaluate test error multiple times on the same test set, so long as the model was never trained on that test set.

(b) (T/F) Boosting reduces bias.

(c) (T/F) When boosting the squared loss, examples that the current ensemble gets correct are removed from the training set.

(d) (T/F) True positive rate, recall, and specificity all measure the same thing.

(e) (T/F) A model that achieves 99% test accuracy is always a good model.

(f) (T/F) In AdaBoost, the weight of any given training example is strictly decreasing as you boost more.

(g) (T/F) When bagging, it is possible for a training example to appear more than once in the same bootstrap sample or "bag".
(h) (T/F) In general, boosted decision stumps (decision trees with a single split) are less likely to overfit than single, full-depth decision trees.
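To build intuition for (g), here is a minimal sketch, not part of the original exam, of drawing a single bootstrap "bag" with NumPy. Because the draw is with replacement, repeated examples routinely appear.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    indices = np.arange(n)  # a hypothetical training set of 10 examples

    # A bootstrap "bag" draws n examples with replacement from the
    # n-example training set, so repeats are expected within a bag.
    bag = rng.choice(indices, size=n, replace=True)
    print("bag:", np.sort(bag))
    print("distinct examples in this bag:", len(np.unique(bag)))
    # On average a bag contains about (1 - 1/e) ~ 63% of the distinct examples.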
SAMPLE SHORT ANSWER QUESTIONS

(a) [2 points] Suppose you are given the following dataset, along with an ensemble H(x) that makes the following predictions:

    x    y    H(x)
    1    5    5.5
    2    6    5.5
    3    7    7.5
    4    8    7.5

You are training a gradient boosted regression tree, boosting the squared loss. What dataset would you train your next weak learner on? List both the feature values and labels you would call the weak learner algorithm with.

(b) [2 points] Suppose you determine that 0.99n of your n training labels are +1. Would you expect accuracy to be a good measure of performance in this scenario? If yes, why? If not, what are some alternative performance metrics that you might use instead?

(c) [2 points] Suppose you run PCA on a training dataset {x_1, ..., x_n}. After computing all eigenvalues of the sample covariance matrix, λ_1, ..., λ_n, you observe that:

    (λ_1 + λ_2) / (λ_1 + ... + λ_n) = 0.98

What does this mean about your data?

(d) [2 points] Suppose you are training a decision tree on a binary classification dataset with 10 training examples. You are considering a feature split that splits your dataset into the following two subsets:

    Subset 1: 6 instances (4 positive labels, 2 negative labels)
    Subset 2: 4 instances (1 positive label, 3 negative labels)
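A quick way to sanity-check your answer to (a), not part of the exam: for squared loss, gradient boosting fits the next weak learner to the residuals y - H(x), which are the negative functional gradient of the loss. A minimal NumPy sketch using the table above:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([5.0, 6.0, 7.0, 8.0])
    H = np.array([5.5, 5.5, 7.5, 7.5])  # current ensemble predictions from (a)

    # For squared loss, the negative functional gradient at each training
    # point is the residual y - H(x), so the next weak learner is fit to
    # the pairs (x_i, y_i - H(x_i)).
    residuals = y - H
    for xi, ri in zip(x, residuals):
        print(f"x = {xi:g}, new label = {ri:+g}")
    # Prints targets -0.5, +0.5, -0.5, +0.5 for x = 1, 2, 3, 4.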
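Similarly for (c): the ratio in question is the fraction of total variance captured by the first two principal components. A small sketch, on hypothetical data rather than anything from the exam, that computes it:

    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical data: variance concentrated in the first two coordinates.
    X = rng.normal(size=(200, 5)) * np.array([10.0, 8.0, 0.5, 0.3, 0.2])

    # Eigenvalues of the sample covariance matrix, largest first.
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

    # The ratio from (c): variance explained by the top two principal components.
    ratio = eigvals[:2].sum() / eigvals.sum()
    print(f"(lambda_1 + lambda_2) / (sum of all eigenvalues) = {ratio:.2f}")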
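The preview cuts (d) off before the actual question. A natural completion is to evaluate the quality of the split, for example by information gain; the sketch below assumes that reading. Note the parent counts (5 positive, 5 negative) follow from pooling the two subsets.

    import numpy as np

    def entropy(pos: int, neg: int) -> float:
        """Binary entropy in bits for a subset with the given label counts."""
        p = np.array([pos, neg]) / (pos + neg)
        p = p[p > 0]  # treat 0 * log2(0) as 0
        return float(-(p * np.log2(p)).sum())

    # Label counts from (d); the parent pools both subsets: 5 pos, 5 neg.
    h_parent = entropy(5, 5)  # = 1.0 bit
    h_s1, h_s2 = entropy(4, 2), entropy(1, 3)

    # Information gain = parent entropy minus size-weighted child entropies.
    gain = h_parent - (6 / 10) * h_s1 - (4 / 10) * h_s2
    print(f"information gain = {gain:.3f} bits")  # about 0.125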