PS#5 (IDS575)
Q1 Model Selection 30 Points

Q1.1 4 Points
You will perform cross-validation on a dataset with 100 examples. Your goal is to robustly measure the validation error to approximate the out-of-sample performance of your model. Using 10-fold cross-validation, you need to compute the validation error N1 times. To compute each individual error, you train your model on training data of size N2 and test the model on validation data of size N3. Your final validation error is the average of the N1 individual validation errors. What are the appropriate numbers for N1, N2, N3?

N1 = 10, N2 = 90, N3 = 10
N1 = 1, N2 = 90, N3 = 10
N1 = 10, N2 = 10, N3 = 90
N1 = 1, N2 = 10, N3 = 90
N1 = 10, N2 = 100, N3 = 100
N1 = 1, N2 = 100, N3 = 100

Q1.2 4 Points
Which of the following cross-validation methods may not be suitable for a very large dataset with hundreds of thousands of examples?

k-fold cross-validation
Leave-one-out cross-validation
Holdout validation
All of the above
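For reference, the counts in Q1.1 follow directly from the definition of K-fold cross-validation, and the concern in Q1.2 comes down to how many models each scheme must train:

$$K = 10,\ n = 100:\qquad N_1 = K = 10,\qquad N_3 = \frac{n}{K} = 10,\qquad N_2 = n - N_3 = 90.$$

Leave-one-out cross-validation, in contrast, fits one model per example, so a dataset with hundreds of thousands of examples would require hundreds of thousands of trainings, versus 10 for 10-fold cross-validation.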
Q1.3 4 Points
To use holdout validation for your classification problem, you are going to randomly split your supervised dataset into training, validation, and test partitions. Assume your dataset is sufficiently large. Select all correct statements.

Some partitions may consist of substantially more difficult- or easier-to-predict cases.
Some partitions may contain a larger or smaller proportion among different label classes.
Training performance could be decreased due to holding out subsets of data for validation and test.
Measuring out-of-sample performance could become less accurate due to the random split.
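A quick way to see the label-proportion statement above in action is to compare a plain random split with a stratified one. This is only an illustrative sketch; the toy 90/10 labels, the 80/20 split, and the use of scikit-learn are assumptions, not part of the problem set.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 negatives and 10 positives.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

# Plain random split: the positive fraction of each partition can drift away from 10%.
_, _, _, y_val_random = train_test_split(X, y, test_size=0.2, random_state=7)

# Stratified split: each partition keeps roughly the original 10% positive rate.
_, _, _, y_val_strat = train_test_split(X, y, test_size=0.2, random_state=7, stratify=y)

print("random:    ", Counter(y_val_random))
print("stratified:", Counter(y_val_strat))
```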
Q1.4 4 Points
Now you will run 10-fold cross-validation for training k-NN. For each candidate k value, you train k-NN on all of the dataset but one of the 10 folds, then measure an approximate validation error on the examples in that held-out fold. When you have 5 different candidate k values to decide the best model, you will train k-NN a total of N1 times. The performance of individual models (with a specific k value) will be evaluated by the mean of N2 validation errors.

N1 = 10, N2 = 5
N1 = 5, N2 = 10
N1 = 45, N2 = 10
N2 = 45, N1 = 50
N1 = 50, N2 = 10
N1 = 10, N2 = 50

Q1.5 4 Points
To report and launch your prediction system, you now choose the final model (with the best-performing k) given the results from the 10-fold cross-validation in Q1.4. Choose the best convention for coming up with the final validation error and the final decision boundary.

Pick the k with the lowest mean validation error as the best model; report that lowest mean validation error as the final validation error; launch the decision boundary trained for k-NN as it is.
Pick the k as the closest candidate to the weighted average of the 5 candidate k values (where the weights are given by each k-NN's accuracy); report the weighted average of mean validation errors as the final validation error; launch the decision boundary trained for k-NN as it is.
Pick the k with the lowest mean validation error as the best model; report that lowest mean validation error as the final validation error; launch the decision boundary by retraining k-NN on the entire dataset (including all 10 folds).
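The counting in Q1.4 and the retraining convention described in Q1.5 can be made concrete with a short sketch. This is a minimal illustration, assuming a scikit-learn style workflow and hypothetical arrays X and y; it is not the graded solution.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

def select_k_by_cv(X, y, candidate_ks=(1, 3, 5, 7, 9), n_folds=10):
    """10-fold CV over 5 candidate k values: 5 x 10 = 50 k-NN fits in total."""
    X, y = np.asarray(X), np.asarray(y)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    mean_errors = {}
    for k in candidate_ks:
        fold_errors = []
        for train_idx, val_idx in kf.split(X):
            model = KNeighborsClassifier(n_neighbors=k)
            model.fit(X[train_idx], y[train_idx])  # train on the other 9 folds
            fold_errors.append(1.0 - model.score(X[val_idx], y[val_idx]))  # error on the held-out fold
        mean_errors[k] = float(np.mean(fold_errors))  # mean of the 10 per-fold errors

    best_k = min(mean_errors, key=mean_errors.get)  # lowest mean validation error wins
    final_model = KNeighborsClassifier(n_neighbors=best_k)
    final_model.fit(X, y)  # refit on the entire dataset before launch
    return best_k, mean_errors[best_k], final_model
```

With 5 candidate values and 10 folds, the loop above performs 50 cross-validation fits; each candidate is scored by the mean of its 10 fold errors, and the launched model is the one refit on all of the data.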
Q1.6 5 Points
For spam classification, you have 100 emails in the validation data. When your trained hypothesis $\hat{h}$ classifies 90 emails with their correct labels, what interval does the true error $\mathrm{Err}_p(\hat{h})$ lie in with 95% confidence?

(0.0412, 0.1588)

Q1.7 5 Points
Assume you evenly split the data D into 3 disjoint subsets $D_1$, $D_2$, and $D_3$. You run cross-validation to determine the better model between $M_1$ and $M_2$. You achieve the following test errors: $\mathrm{Err}_{D_1}(\hat{h}_{M_1}) = 0.32$, $\mathrm{Err}_{D_2}(\hat{h}_{M_1}) = 0.41$, $\mathrm{Err}_{D_3}(\hat{h}_{M_1}) = 0.15$, $\mathrm{Err}_{D_1}(\hat{h}_{M_2}) = 0.24$, $\mathrm{Err}_{D_2}(\hat{h}_{M_2}) = 0.38$, $\mathrm{Err}_{D_3}(\hat{h}_{M_2}) = 0.17$. Which model should we pick?

M1
M2
Indifferent
M2 could be better but may not be significantly
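Both numeric results above can be checked directly. For Q1.6, the sample error is $\hat{p} = 10/100 = 0.1$, and the usual normal approximation to the binomial gives the 95% interval

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.1 \pm 1.96\sqrt{\frac{0.1 \times 0.9}{100}} = 0.1 \pm 0.0588 \approx (0.0412,\ 0.1588).$$

For Q1.7, averaging each model's three fold errors gives $\overline{\mathrm{Err}}(M_1) = (0.32 + 0.41 + 0.15)/3 \approx 0.293$ and $\overline{\mathrm{Err}}(M_2) = (0.24 + 0.38 + 0.17)/3 \approx 0.263$, so $M_2$ has the lower mean error, although with only three folds the difference may not be statistically significant.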
Q2 Model Assessment 70 Points

Q2.1 5 Points
Choose all correct ones:

The best threshold to flip the prediction label for binary classification is the probability of 0.5.
Mean of the Average Precision is useful to compare the prediction performance across multiple classes.
The Precision-Recall curve may be more useful in practice if the "positive" class is much more interesting but rarer than the "negative" class.
Even under the same ROC curve, the best classifier could be different based on the nature of the problem.

Q2.2 5 Points
The AUC for a classifier that randomly guesses the prediction label is (just type in the value of AUC)

0.5
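A one-line justification for the value above: a classifier that guesses at random ranks a randomly chosen positive above a randomly chosen negative with probability 1/2, so its expected ROC curve is the diagonal $TPR = FPR$, and the area under that diagonal is $\int_0^1 x\,dx = 0.5$.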
Q2.3 5 Points
When fraudulent transactions happen at some merchant, credit card companies often reimburse the amount paid without an authorization back to their customers, if the transactions are card-present ones. If the frauds are made without a physical card, the merchant could be liable for the cost. Assume you are running an online clothing merchant where most products have high margins but are not priced too high. Choose the right model and threshold suitable for your merchant given the following ROC curve. [ROC curve figure not reproduced in this extract.]

In general, the Red model is better than the Blue model; the threshold at point A is more suitable for fraud detection than point B
In general, the Red model is better than the Blue model; the threshold at point B is more suitable for fraud detection than point A
In general, the Blue model is better than the Red model; the threshold at point A is more suitable for fraud detection than point B
In general, the Blue model is better than the Red model; the threshold at point B is more suitable for fraud detection than point A.
In general, the Blue model is better than the Red model; but we cannot determine which threshold is better between A and B.

Q2.4 6 Points
Suppose you have 100 positive and 100 negative examples in your dataset. If gaining 1 more false positive costs comparably to losing 10 more false negatives, what slope should you look at in an ROC curve?

10
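The slope above can be reproduced with the standard iso-performance-line argument for ROC space (as in Provost and Fawcett's formulation; the cost symbols $c_{FP}$ and $c_{FN}$ are introduced here for clarity):

$$\text{slope} = \frac{N_{\text{neg}}}{N_{\text{pos}}}\cdot\frac{c_{FP}}{c_{FN}} = \frac{100}{100}\cdot\frac{10}{1} = 10,$$

since one additional false positive is stated to cost as much as ten additional false negatives.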
Q2.5 6 Points
Which of the following statements are true? Select all correct statements.

Precision-Recall (PR) curves must be monotonically decreasing.
If two classifiers' P-R curves intersect, at the point of intersection the two classifiers have identical confusion matrices.
At any precision value, the best performing classifier is the classifier with the highest recall value.
ROC is insensitive to the label imbalance.

Q2.6 10 Points
Suppose you have a confusion table as below:

                    Predicted Negative    Predicted Positive
Negative Cases      9760                  140
Positive Cases      40                    60

Evaluate the F1-score.

0.4
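The F1-score above follows directly from the confusion table:

$$\text{Precision} = \frac{60}{60 + 140} = 0.3,\qquad \text{Recall} = \frac{60}{60 + 40} = 0.6,\qquad F_1 = \frac{2 \cdot 0.3 \cdot 0.6}{0.3 + 0.6} = 0.4.$$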
Q2.7 10 Points
You are to evaluate the performance of two classification models, M1 and M2. The test set consists of 10 binary features, labeled as A1, ..., A10. The following table shows the predicted probabilities obtained by applying each model to the test set. [Predicted-probability table not reproduced in this extract.] Assume that the positive class is the class of interest as usual. Suppose you decide to use the model M1 with the threshold 0.5. That is, any test instance whose predicted probability is greater than or equal to 0.5 will be classified as a positive example. Compute the recall:

0.6

Q2.8 10 Points
Using the table in Q2.7, plot the ROC curve and P-R curve for both M1 and M2 (two ROCs in one figure, and two PR curves in another figure).

ROC Curve and PR Curve.pdf
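The predicted-probability table from Q2.7 is not reproduced in this extract, so the following is only a schematic of how the recall in Q2.7 and the curves in Q2.8 could be computed. The label and score arrays below are hypothetical placeholders, not the actual values from the problem set.

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, average_precision_score

# Hypothetical stand-ins for the 10 test labels and each model's predicted probabilities.
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
p_m1 = np.array([0.95, 0.90, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10])
p_m2 = np.array([0.60, 0.55, 0.70, 0.40, 0.65, 0.35, 0.50, 0.45, 0.30, 0.25])

# Q2.7-style recall for M1 at threshold 0.5: TP / (TP + FN).
pred_m1 = (p_m1 >= 0.5).astype(int)
tp = int(np.sum((pred_m1 == 1) & (y_true == 1)))
fn = int(np.sum((pred_m1 == 0) & (y_true == 1)))
print("recall at 0.5:", tp / (tp + fn))

# Q2.8/Q2.9-style curves and Average Precision for both models.
for name, p in [("M1", p_m1), ("M2", p_m2)]:
    fpr, tpr, _ = roc_curve(y_true, p)                # plot tpr vs. fpr for the ROC figure
    prec, rec, _ = precision_recall_curve(y_true, p)  # plot prec vs. rec for the P-R figure
    print(name, "average precision:", average_precision_score(y_true, p))
```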
Q2.9 10 Points
Based on the P-R curves that you draw, compute the Average Precision of M1 and M2.

Avg Precision for M1 = 0.92666
Avg Precision for M2 = 0.52222
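For reference, Average Precision is the step-wise area under the P-R curve: precision at each threshold weighted by the corresponding increase in recall (this is also the convention used by scikit-learn's average_precision_score),

$$\mathrm{AP} = \sum_{k} (R_k - R_{k-1})\, P_k.$$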
Q2.10 3 Points
Based on the sub-questions above, which model do you think is better?

M1
M2
Indifferent
M2 could be better but may not be significantly

GRADED: Problem Set (PS) #05
STUDENT: Urvashiben Patel
TOTAL POINTS: 100 / 100 pts

QUESTION 1: Model Selection 30 / 30 pts
1.1: 4 / 4 pts
1.2: 4 / 4 pts
1.3: 4 / 4 pts
1.4: 4 / 4 pts
1.5: 4 / 4 pts
1.6: 5 / 5 pts
1.7: 5 / 5 pts

QUESTION 2: Model Assessment 70 / 70 pts
2.1: 5 / 5 pts
2.2: 5 / 5 pts
2.3: 5 / 5 pts
2.4: 6 / 6 pts
2.5: 6 / 6 pts
2.6: 10 / 10 pts
2.7: 10 / 10 pts
2.8: 10 / 10 pts
2.9: 10 / 10 pts
2.10: 3 / 3 pts