CIS 5200: Machine Learning
Fall 2023 Final Exam Sample Questions
University of Pennsylvania

This is a brief document containing sample questions from the second half of the course. For questions on topics from the first half of the course, see the practice midterm.

SAMPLE T/F QUESTIONS [?? points]

For each question below, identify whether the statement made is True or False. If you believe the statement is false, justify your answer. Correct "True" answers are worth one point, while correct "False" answers are worth two points: one for the correct answer and one for a correct justification.

(a) (T/F) It's okay to evaluate test error multiple times on the same test set, so long as the model was never trained on that test set.
(b) (T/F) Boosting reduces bias.
(c) (T/F) When boosting the squared loss, examples that the current ensemble gets correct are removed from the training set.
(d) (T/F) True positive rate, recall, and specificity all measure the same thing.
(e) (T/F) A model that achieves 99% test accuracy is always a good model.
(f) (T/F) In AdaBoost, the weight of any given training example is strictly decreasing as you boost more.
(g) (T/F) When bagging, it is possible for a training example to appear more than once in the same bootstrap sample, or "bag".
(h) (T/F) In general, boosted decision stumps (decision trees with a single split) are less likely to overfit than single, full-depth decision trees.
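For intuition on (g), here is a minimal sketch (not part of the original exam; it assumes NumPy) of how a single bootstrap sample is drawn. Because indices are sampled with replacement, duplicates can and routinely do occur:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10  # size of the training set

    # A bootstrap "bag" draws n indices uniformly, with replacement.
    bag = rng.choice(n, size=n, replace=True)
    print(bag)                   # duplicate indices are expected
    print(len(np.unique(bag)))   # typically < n: some examples never appear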
SAMPLE SHORT ANSWER QUESTIONS

(a) [2 points] Suppose you are given the following dataset, along with an ensemble H(x) that makes the following predictions:

    x   y   H(x)
    1   5   5.5
    2   6   5.5
    3   7   7.5
    4   8   7.5

You are training a gradient boosted regression tree, boosting the squared loss. What dataset would you train your next weak learner on? List both the feature values and the labels you would call the weak learner algorithm with.

(b) [2 points] Suppose you determine that 0.99n of your n training labels are +1. Would you expect accuracy to be a good measure of performance in this scenario? If yes, why? If not, what are some alternative performance metrics that you might use instead?

(c) [2 points] Suppose you run PCA on a training dataset {x_1, ..., x_n}. After computing all eigenvalues λ_1, ..., λ_n of the sample covariance matrix, you observe that:

    \frac{\lambda_1 + \lambda_2}{\lambda_1 + \cdots + \lambda_n} = 0.98

What does this mean about your data?

(d) [2 points] Suppose you are training a decision tree on a binary classification dataset with 10 training examples. You are considering a feature split that splits your dataset into the following two subsets:

    Subset 1: 6 instances (4 positive labels, 2 negative labels)
    Subset 2: 4 instances (1 positive label, 3 negative labels)
Compute the Gini index of this split. You do not need to simplify or add fractions. Simply give an expression that evaluates to the correct Gini index.

(e) [2 points] You are considering training two models on a dataset: ridge regression and kernel ridge regression. You plan on using an RBF kernel with a very small lengthscale σ. Which model would you expect to have higher variance? Why?
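For reference on (d), here is a minimal sketch (not part of the original exam) of the size-weighted Gini computation under one common convention; the course's exact definition may differ slightly:

    def gini(pos, neg):
        """Gini impurity 1 - p_pos^2 - p_neg^2 of a node with the given counts."""
        total = pos + neg
        return 1 - (pos / total) ** 2 - (neg / total) ** 2

    def split_gini(subsets):
        """Size-weighted Gini index of a split; subsets is a list of (pos, neg)."""
        n = sum(p + q for p, q in subsets)
        return sum((p + q) / n * gini(p, q) for p, q in subsets)

    # The split from (d): Subset 1 has counts (4, 2), Subset 2 has (1, 3).
    print(split_gini([(4, 2), (1, 3)]))  # (6/10)*(4/9) + (4/10)*(3/8)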
SAMPLE LONG PROBLEM [8 points]

In this problem, we consider several questions around the design and debugging of machine learning algorithms in practice. For each question, assume you are given a labeled training dataset D = {(x_1, y_1), ..., (x_n, y_n)}, with y_i ∈ {−1, +1}.

(a) [6 points] Throughout this class, we have seen numerous ways to control the performance of machine learning algorithms through hyperparameters. Discuss whether you would most expect the training error to increase or decrease when you:

(1) Increase the number of trees boosted in a boosting model.
(2) Increase the number of trees bagged in a bagging model.
(3) Increase L2 regularization in a linear SVM.
(4) Add more hidden units to the hidden layer of a one-hidden-layer neural network.
(5) Reduce the dimensionality of your data with PCA from D dimensions down to k ≪ D dimensions.
(6) Increase the value of γ in the RBF kernel k(x, x') = \exp(-\gamma \|x - x'\|_2^2) to be very large.

(b) [2 points] As the size of your training set grows, what general trends do you expect to see for your training error? What about your validation error?
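As a hedged illustration of (a)(6), the sketch below (not part of the original exam; it assumes the kernel form reconstructed above) shows how a very large γ drives the kernel value between any two distinct points toward zero, which lets a kernel method effectively memorize the training set and typically pushes training error down. The same mechanism bears on short answer (e): a very small lengthscale σ corresponds to a very large γ.

    import numpy as np

    def rbf(x, z, gamma):
        """RBF kernel k(x, z) = exp(-gamma * ||x - z||_2^2)."""
        return np.exp(-gamma * np.sum((x - z) ** 2))

    x = np.array([1.0, 0.0])
    z = np.array([1.1, 0.0])  # a nearby but distinct point
    for gamma in (0.1, 10.0, 1000.0):
        # As gamma grows, k(x, z) -> 0 for any z != x, while k(x, x) = 1 always.
        print(gamma, rbf(x, z, gamma))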