CIS 5200: Machine Learning
Fall 2023 Final Exam Sample Questions
University of Pennsylvania

This is a brief document containing sample questions from the second half of the course. For questions on topics from the first half of the course, see the practice midterm.

SAMPLE T/F QUESTIONS [?? points]

For each question below, identify whether the statement made is True or False. If you believe the statement is false, justify your answer. Correct "True" answers are worth one point, while correct "False" answers are worth two points: one for the correct answer and one for a correct justification.

(a) (T/F) It's okay to evaluate test error multiple times on the same test set, so long as the model was never trained on that test set.
(b) (T/F) Boosting reduces bias.
(c) (T/F) When boosting the squared loss, examples that the current ensemble gets correct are removed from the training set.
(d) (T/F) True positive rate, recall, and specificity all measure the same thing.
(e) (T/F) A model that achieves 99% test accuracy is always a good model.
(f) (T/F) In AdaBoost, the weight of any given training example is strictly decreasing as you boost more.
(g) (T/F) When bagging, it is possible for a training example to appear more than once in the same bootstrap sample, or "bag".
(h) (T/F) In general, boosted decision stumps (decision trees with a single split) are less likely to overfit than single, full-depth decision trees.
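For intuition on (g), here is a minimal sketch (not part of the original exam; it assumes NumPy) of how a single bootstrap sample is drawn. Because indices are sampled with replacement, duplicates can and routinely do occur:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10  # size of the training set

    # A bootstrap "bag" draws n indices uniformly, with replacement.
    bag = rng.choice(n, size=n, replace=True)
    print(bag)                   # duplicate indices are expected
    print(len(np.unique(bag)))   # typically < n: some examples never appear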
SAMPLE SHORT ANSWER QUESTIONS

(a) [2 points] Suppose you are given the following dataset, along with an ensemble H(x) that makes the following predictions:

    x   y   H(x)
    1   5   5.5
    2   6   5.5
    3   7   7.5
    4   8   7.5

You are training a gradient boosted regression tree, boosting the squared loss. What dataset would you train your next weak learner on? List both the feature values and the labels you would call the weak learner algorithm with.

(b) [2 points] Suppose you determine that 0.99n of your n training labels are +1. Would you expect accuracy to be a good measure of performance in this scenario? If yes, why? If not, what are some alternative performance metrics that you might use instead?

(c) [2 points] Suppose you run PCA on a training dataset {x_1, ..., x_n}. After computing all eigenvalues λ_1, ..., λ_n of the sample covariance matrix, you observe that:

    \frac{\lambda_1 + \lambda_2}{\lambda_1 + \cdots + \lambda_n} = 0.98

What does this mean about your data?

(d) [2 points] Suppose you are training a decision tree on a binary classification dataset with 10 training examples. You are considering a feature split that splits your dataset into the following two subsets:

    Subset 1: 6 instances (4 positive labels, 2 negative labels)
    Subset 2: 4 instances (1 positive label, 3 negative labels)
Compute the Gini index of this split. You do not need to simplify or add fractions. Simply give an expression that evaluates to the correct Gini index.

(e) [2 points] You are considering training two models on a dataset: ridge regression and kernel ridge regression. You plan on using an RBF kernel with a very small lengthscale σ. Which model would you expect to have higher variance? Why?
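For reference on (d), here is a minimal sketch (not part of the original exam) of the size-weighted Gini computation under one common convention; the course's exact definition may differ slightly:

    def gini(pos, neg):
        """Gini impurity 1 - p_pos^2 - p_neg^2 of a node with the given counts."""
        total = pos + neg
        return 1 - (pos / total) ** 2 - (neg / total) ** 2

    def split_gini(subsets):
        """Size-weighted Gini index of a split; subsets is a list of (pos, neg)."""
        n = sum(p + q for p, q in subsets)
        return sum((p + q) / n * gini(p, q) for p, q in subsets)

    # The split from (d): Subset 1 has counts (4, 2), Subset 2 has (1, 3).
    print(split_gini([(4, 2), (1, 3)]))  # (6/10)*(4/9) + (4/10)*(3/8)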
SAMPLE LONG PROBLEM [8 points]

In this problem, we consider several questions around the design and debugging of machine learning algorithms in practice. For each question, assume you are given a labeled training dataset D = {(x_1, y_1), ..., (x_n, y_n)}, with y_i ∈ {−1, +1}.

(a) [6 points] Throughout this class, we have seen numerous ways to control the performance of machine learning algorithms through hyperparameters. Discuss whether you would most expect the training error to increase or decrease when you:

(1) Increase the number of trees boosted in a boosting model.
(2) Increase the number of trees bagged in a bagging model.
(3) Increase L2 regularization in a linear SVM.
(4) Add more hidden units to the hidden layer of a one-hidden-layer neural network.
(5) Reduce the dimensionality of your data with PCA from D dimensions down to k ≪ D dimensions.
(6) Increase the value of γ in the RBF kernel k(x, x') = \exp(-\gamma \|x - x'\|_2^2) to be very large.

(b) [2 points] As the size of your training set grows, what general trends do you expect to see for your training error? What about your validation error?
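As a hedged illustration of (a)(6), the sketch below (not part of the original exam; it assumes the kernel form reconstructed above) shows how a very large γ drives the kernel value between any two distinct points toward zero, which lets a kernel method effectively memorize the training set and typically pushes training error down. The same mechanism bears on short answer (e): a very small lengthscale σ corresponds to a very large γ.

    import numpy as np

    def rbf(x, z, gamma):
        """RBF kernel k(x, z) = exp(-gamma * ||x - z||_2^2)."""
        return np.exp(-gamma * np.sum((x - z) ** 2))

    x = np.array([1.0, 0.0])
    z = np.array([1.1, 0.0])  # a nearby but distinct point
    for gamma in (0.1, 10.0, 1000.0):
        # As gamma grows, k(x, z) -> 0 for any z != x, while k(x, x) = 1 always.
        print(gamma, rbf(x, z, gamma))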