14-model-selection-validation-4up.pdf

School: Georgia Institute of Technology
Course: 6254
Subject: Industrial Engineering
Date: Oct 30, 2023
Type: pdf
Pages: 10
Uploaded by AmbassadorClover12145

Project organization
• Project proposals due March 14 (~1.5 weeks)
• I would like to make sure everyone has a team, so I want to add a new deadline…
• By TODAY please go to the link posted on Piazza (https://goo.gl/p5nTxb) and add your team’s details to the spreadsheet:
  – team members
  – tentative project title
  – campus(es) where team members are located
  – number of team members
  – whether you are potentially open to adding more members

Exam Details – Wed 3/7/18
• Coverage: HW #1-3
• Also lectures through the lecture on the VC bound (from Feb 19). The midterm will not cover lecture material after Feb 19.
• The following are not on the exam:
  – Regression, Tikhonov Regularization, Bias and Variance of Regression Function Sets, LASSO, etc.
• A single sheet of notes (front and back) allowed
• 75 minute time limit (3:00 PM - 4:15 PM)
• No calculators allowed
• Sample questions are posted

Approximation-generalization tradeoff
Given a hypothesis set, find a function in it that minimizes the error.
• More complex: better chance of approximating the ideal classifier/function
• Less complex: better chance of generalizing to new data (out of sample)
We must carefully limit “complexity” to avoid overfitting.

Approximation-generalization tradeoff
[Figure: error versus “complexity” of hypothesis set, with curves for out-of-sample error and in-sample error and the generalization error marked]
Approximation-generalization tradeoff
[Figure: error versus “complexity” of hypothesis set, with curves for out-of-sample error and in-sample error and with bias and variance marked]

Learning curve – A simple model
[Figure: expected error versus number of data points, with curves for out-of-sample error and in-sample error and with bias marked]

Learning curve – A complex model
[Figure: expected error versus number of data points, with curves for out-of-sample error and in-sample error and with bias marked]

Bias-variance decomposition
What is it good for?
• Practically, impossible to compute bias/variance exactly…
• Can estimate empirically
  – split data into training and test sets
  – split training data into many different subsets and estimate a classifier/regressor on each
  – compute bias/variance using the results and test set
• In reality, just like with the VC bound, more useful as a conceptual tool than as a practical technique
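The empirical recipe above (hold out a test set, fit on many training subsets, then measure bias and variance on the test set) can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the target sin(πx), the noise-free labels, the disjoint two-point subsets, and the helper names `fit_constant`, `fit_line`, `estimate_bias_variance`, and `make_data` are all hypothetical choices.

```python
import math
import random


def fit_constant(points):
    """Simple model: predict the mean label. Returned as (intercept, slope)."""
    return sum(y for _, y in points) / len(points), 0.0


def fit_line(points):
    """More complex model: least-squares line y = a + b*x, returned as (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def estimate_bias_variance(fit, train, test, subset_size):
    """Fit one model per disjoint training subset, then score on the test set."""
    models = [
        fit(train[i:i + subset_size])
        for i in range(0, len(train) - subset_size + 1, subset_size)
    ]
    bias_sq = 0.0
    variance = 0.0
    for x, y in test:
        preds = [a + b * x for a, b in models]
        avg = sum(preds) / len(preds)  # average prediction at x
        bias_sq += (avg - y) ** 2
        variance += sum((p - avg) ** 2 for p in preds) / len(preds)
    return bias_sq / len(test), variance / len(test)


def make_data(n, rng):
    """Assumed toy problem: x uniform on [-1, 1], y = sin(pi * x), no noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, math.sin(math.pi * x)) for x in xs]


rng = random.Random(0)
train = make_data(2000, rng)
test = make_data(500, rng)
results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line)]:
    results[name] = estimate_bias_variance(fit, train, test, subset_size=2)
    bias_sq, variance = results[name]
    print(f"{name:8s} bias^2 = {bias_sq:.3f}  variance = {variance:.3f}")
```

With only two points per subset, the line should show lower bias but much higher variance than the constant, which is the same tension the learning curves describe. The numbers are estimates, so they shift with the seed and the subset size.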
Developing a good learning model
The bias-variance decomposition gives us a useful way to think about how to develop improved learning models.

Reduce variance (without significantly increasing the bias)
– limiting model complexity (e.g., polynomial order in regression)
– regularization
– can be counterintuitive (e.g., Stein’s paradox)
– typically can be done through general techniques

Reduce bias (without significantly increasing the variance)
– exploit prior information to steer the model in the correct direction
– typically application specific

Example
• Least-squares is an unbiased estimator, but can have high variance
• Tikhonov regularization deliberately introduces bias into the estimator (shrinking it towards the origin)
• The slight increase in bias can buy us a huge decrease in the variance, especially when some variables are highly correlated
• The trick is figuring out just how much bias to introduce…

Model selection
In statistical learning, a model is a mathematical representation of a function such as a
– classifier
– regression function
– density
– …
In many cases, we have one (or more) “free parameters” that are not automatically determined by the learning algorithm.
Often, the value chosen for these free parameters has a significant impact on the algorithm’s output.
The problem of selecting values for these free parameters is called model selection.

Examples
Method                      Parameter
polynomial regression       polynomial degree
ridge regression/LASSO      regularization parameter
robust regression           loss function parameter, regularization parameter
SVMs                        margin violation cost
kernel methods              kernel choice/parameters
regularized LR              regularization parameter
k-nearest neighbors         number of neighbors
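The least-squares versus Tikhonov example can be checked with a small simulation. The sketch below is a toy setup, not taken from the slides: two nearly identical features, true weights (1.5, 0.5), unit-variance Gaussian noise, and a regularization parameter of 1.0 are all assumed values, and `ridge_2d` and `simulate` are hypothetical helper names. Setting `lam=0.0` gives ordinary least squares.

```python
import random


def ridge_2d(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features via the 2x2 inverse."""
    a = sum(x1 * x1 for x1, _ in X) + lam
    b = sum(x1 * x2 for x1, x2 in X)
    d = sum(x2 * x2 for _, x2 in X) + lam
    r1 = sum(x1 * t for (x1, _), t in zip(X, y))
    r2 = sum(x2 * t for (_, x2), t in zip(X, y))
    det = a * d - b * b
    return (d * r1 - b * r2) / det, (a * r2 - b * r1) / det


def simulate(lam, w_true=(1.5, 0.5), n=20, trials=2000, seed=0):
    """Return (squared bias, variance) of the weight estimate over many datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        X, y = [], []
        for _ in range(n):
            x1 = rng.gauss(0.0, 1.0)
            x2 = x1 + 0.05 * rng.gauss(0.0, 1.0)  # nearly a copy of x1
            X.append((x1, x2))
            y.append(w_true[0] * x1 + w_true[1] * x2 + rng.gauss(0.0, 1.0))
        estimates.append(ridge_2d(X, y, lam))
    mean = [sum(w[j] for w in estimates) / trials for j in range(2)]
    bias_sq = sum((mean[j] - w_true[j]) ** 2 for j in range(2))
    variance = sum(
        (w[j] - mean[j]) ** 2 for w in estimates for j in range(2)
    ) / trials
    return bias_sq, variance


# The same seed means both estimators see exactly the same datasets.
ls_bias_sq, ls_var = simulate(lam=0.0)        # ordinary least squares
ridge_bias_sq, ridge_var = simulate(lam=1.0)  # Tikhonov / ridge
print(f"least squares: bias^2 = {ls_bias_sq:.3f}, variance = {ls_var:.3f}")
print(f"ridge (lam=1): bias^2 = {ridge_bias_sq:.3f}, variance = {ridge_var:.3f}")
```

In this setup the ridge estimate should pay a visible bias cost while its variance drops by a much larger amount. The measured least-squares bias will not be exactly zero, because it is a Monte Carlo average over a finite number of trials. How much bias is worth introducing depends on `lam`, which is the model selection question raised above.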
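Model selection dilemma
• We need to select appropriate values for the free parameters
• All we have is the training data
• We must use the training data to select the parameters
• However, these free parameters usually control the balance between underfitting and overfitting
• They were left “free” precisely because we don’t want to let the training data influence their selection, as this almost always leads to overfitting
  – e.g., if we let the training data determine the degree in polynomial regression, we will just end up choosing the maximum and doing interpolation

Big picture
For much of this class, we have focused on trying to understand learning via decompositions of the form
  $E_{\text{out}}(h) = E_{\text{in}}(h) + \big(E_{\text{out}}(h) - E_{\text{in}}(h)\big)$
where $E_{\text{out}}$ is the out-of-sample error and $E_{\text{in}}$ is the in-sample error. The VC dimension and regularization are ways of handling the second term.
Validation takes another approach: After we have selected $h$, why not just try (a little harder) to estimate $E_{\text{out}}(h)$ directly?

Validation
Suppose that in addition to our training data, we also have a validation set $(x_1, y_1), \dots, (x_K, y_K)$.
Use the validation set to form an estimate
  $E_{\text{val}}(h) = \frac{1}{K} \sum_{k=1}^{K} e(h(x_k), y_k)$
Examples
• Classification: $e(h(x), y) = \mathbb{1}\{h(x) \neq y\}$
• Regression: $e(h(x), y) = (h(x) - y)^2$

Accuracy of validation
What can we say about the accuracy of $E_{\text{val}}(h)$?
In the case of classification, $e(h(x), y) = \mathbb{1}\{h(x) \neq y\}$, which is just a Bernoulli random variable.
Hoeffding: $\mathbb{P}\big[\,|E_{\text{val}}(h) - E_{\text{out}}(h)| > \epsilon\,\big] \le 2 e^{-2 \epsilon^2 K}$
More generally, we always have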
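To make the validation idea concrete, here is a minimal sketch that selects the number of neighbors for a nearest-neighbor classifier, once using training error and once using a held-out validation set. The 1-D toy data with 20% label noise, the candidate list, and the helper names `knn_predict`, `error_rate`, `hoeffding_radius`, and `make_data` are all assumptions made for illustration. The radius comes from solving Hoeffding's inequality for the deviation: with K validation examples, the validation error of one fixed classifier is within sqrt(ln(2/δ) / (2K)) of its out-of-sample error with probability at least 1 − δ. When M candidates are compared on the same validation set, a union bound replaces 2/δ with 2M/δ.

```python
import math
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D inputs)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0


def error_rate(train, data, k):
    """Fraction of points in `data` misclassified by k-NN built on `train`."""
    mistakes = sum(knn_predict(train, x, k) != y for x, y in data)
    return mistakes / len(data)


def hoeffding_radius(n_val, delta, n_models=1):
    """Deviation that holds with probability >= 1 - delta (union bound over models)."""
    return math.sqrt(math.log(2.0 * n_models / delta) / (2.0 * n_val))


def make_data(n, rng, flip=0.2):
    """Assumed toy problem: label is 1 when x > 0, flipped with probability `flip`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        label = 1 if x > 0 else 0
        if rng.random() < flip:
            label = 1 - label
        data.append((x, label))
    return data


rng = random.Random(0)
train = make_data(200, rng)
val = make_data(200, rng)
candidates = [1, 3, 5, 9, 15, 25]  # odd values avoid tied votes

train_err = {k: error_rate(train, train, k) for k in candidates}
val_err = {k: error_rate(train, val, k) for k in candidates}

k_by_train = min(candidates, key=lambda k: train_err[k])
best_k = min(candidates, key=lambda k: val_err[k])
radius = hoeffding_radius(len(val), delta=0.05, n_models=len(candidates))

print("selected by training error:  ", k_by_train)
print("selected by validation error:", best_k)
print(f"validation error {val_err[best_k]:.3f} +/- {radius:.3f} (95% confidence)")
```

Training error picks k = 1 here because every training point is its own nearest neighbor, which mirrors the polynomial-degree example: the training data simply favors the most flexible model. The validation error gives a separate estimate, and its Hoeffding radius shrinks as the validation set grows.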