COMP6321_ML_Assignment2.pdf

School: Concordia University
Course: 6321
Subject: Computer Science
Date: Jan 9, 2024


COMP6321 Machine Learning (Fall 2023)
Major Assignment #2
Due: 11:59 PM, November 30th, 2023

Note

You will be submitting two separate files for this assignment, as follows:

(a) One (1) .pdf file containing answers to the questions as well as reported results from the code you develop. Include snapshots of the pieces of code you developed in the appendix. Failure to do so will incur a 5% penalty.
(b) One (1) .zip folder containing all developed Python code, including a README.txt file explaining how to run your code.

Theoretical Questions

Question 1

Consider the following architecture of a simple Recurrent Neural Network (RNN):

In this network, all the inputs, hidden states, and weights are assumed to be 1D. This network is to be used for a many-to-one regression task. The network produces an output $y$ at the last time step $L$, which is given as $y = f(w_{hy} h_L)$. For the following questions, assume that no nonlinearities are used (i.e., no activation functions).

(a) Write a general expression for $h_t$ in terms of $w_{in}$, $w_{hh}$, $x_t$, and $h_{t-1}$, where $t$ is the timestep.
(b) Given the sequential input $x = [0.9, 0.8, 0.7]$, the initial hidden state $h_0 = 0$, and all weights initialized to $0.5$, compute $h_1$, $h_2$, $h_3$, and $y$.
(c) You want to perform Backpropagation Through Time (BPTT), which is used to update the weights of RNNs. Given a target output $y_r$, assume that the loss is calculated with the function $l = \frac{1}{2}(y - y_r)^2$. The goal is to compute $\frac{\delta l}{\delta w_{in}}$, $\frac{\delta l}{\delta w_{hh}}$, and $\frac{\delta l}{\delta w_{hy}}$, and to use them in the typical update rule $w_i \leftarrow w_i - \eta \frac{\delta l}{\delta w_i}$. Find the expressions for $\frac{\delta l}{\delta w_{in}}$, $\frac{\delta l}{\delta w_{hh}}$, and $\frac{\delta l}{\delta w_{hy}}$. Your expressions can include only the following: $h_0$, $w_{in}$, $w_{hh}$, $w_{hy}$, and $x_i$ for $i \in \{1, 2, 3\}$. Show your work. (Hint: for $\frac{\delta l}{\delta w_{in}}$ and $\frac{\delta l}{\delta w_{hh}}$, you need to consider the sum across all timesteps.)
(d) Assume that the target for the previously given data sequence is $y_r = 0.8$, with a learning rate of $\eta = 0.1$. Calculate the updated value of each weight.

Question 2

Consider a Support Vector Machine (SVM) problem in a 1D setting (with one weight and one bias). For the following questions, assume we have the dataset $D = \{(x_1, t_1), (x_2, t_2), (x_3, t_3)\} = \{(3, 1), (5, -1), (2, 1)\}$.

(a) Define the optimization problem (objective function and constraints) of the SVM based on margin maximization.
(b) Solve the optimization problem using the graphical method. Explain your work.
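
As a way to sanity-check a graphical solution of the kind asked for in Question 2, the sketch below tests whether a candidate pair $(w, b)$ satisfies the margin constraints and compares feasible candidates by objective value. It assumes the standard hard-margin formulation (minimise $\frac{1}{2}w^2$ subject to $t_i(w x_i + b) \geq 1$). The function names, the toy dataset, and the candidate list are all hypothetical and are deliberately not the assignment's data.

```python
def is_feasible(data, w, b, tol=1e-9):
    """Return True if every point satisfies t * (w * x + b) >= 1 (within tol)."""
    return all(t * (w * x + b) >= 1 - tol for x, t in data)


def objective(w):
    """Assumed hard-margin objective to minimise: (1/2) * w^2."""
    return 0.5 * w * w


def margin_width(w):
    """Distance between the two margin boundaries in 1D: 2 / |w|."""
    return 2.0 / abs(w)


# Hypothetical toy data as (x, t) pairs; NOT the dataset from Question 2.
toy_data = [(-2.0, -1), (2.0, 1), (5.0, 1)]

# Hypothetical candidate (w, b) pairs, e.g. read off a hand-drawn feasible region.
candidates = [(0.25, 0.0), (0.5, 0.0), (1.0, 0.0)]

# Keep only candidates that satisfy all constraints, then pick the smallest objective.
feasible = [(w, b) for w, b in candidates if is_feasible(toy_data, w, b)]
best_w, best_b = min(feasible, key=lambda wb: objective(wb[0]))

print("feasible candidates:", feasible)
print("best candidate:", (best_w, best_b), "margin width:", margin_width(best_w))
```

This only ranks the candidates it is given, so it confirms a hand-derived answer rather than replacing the graphical argument.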

Question 3

Consider the following dataset used to train a decision tree to predict if the weather is good for playing outside. Use the dataset to answer the following questions. Show your work.

Outlook    Temperature  Windy  Play / Don’t Play
sunny      85           false  Play
sunny      80           true   Don’t Play
overcast   83           false  Play
rain       70           false  Play
rain       68           false  Don’t Play
rain       65           true   Don’t Play
overcast   64           true   Play
sunny      72           false  Don’t Play
sunny      69           false  Play
rain       75           false  Play
sunny      75           true   Don’t Play
overcast   72           true   Play
overcast   81           false  Play
rain       71           true   Don’t Play

(a) Using Gini impurity, determine the best splitting threshold for Temperature, out of the following values: [65, 70, 75, 80].
(b) Analyze the impurity of the three features and determine the best feature for splitting.
(c) Finalize your decision tree. At each node, repeat the two previous steps to determine the best splitting feature and, if applicable, its threshold. Use a maximum depth of 4 (the first split occurs at depth = 1).

Question 4

Consider the following dataset of unlabeled points:

Point  X-coordinate  Y-coordinate
P1     9             1
P2     1             1
P3     9             2
P4     8             1
P5     9             20
P6     2             2
P7     8             2
P8     1             2
P9     2             1

Your goal is to use K-means and K-medoids to cluster these data into two clusters. Use the Euclidean distance as the distance measure.

(a) Using P1 and P2 as the starting centroids, perform three iterations of K-means to cluster all the data points.

(b) K-medoids is a variation of K-means where the cluster heads (medoids) are always chosen from the available data points. A cluster medoid, in each iteration, is given as the point with the lowest sum of distances to all other points in the cluster (this is one of several methods to compute the medoids). Using P1 and P2 as the starting medoids, perform three iterations of K-medoids to cluster all the data points.
(c) Compare the performance of K-means and K-medoids. Reflect on the distribution of the data points and their effect on the clustering and cluster heads in K-means and K-medoids.

Question 5

Consider the following dataset for the prediction of a customer’s likelihood to purchase a certain product based on historical activities.

Online Activity  Product Views  Past Purchases  Purchase Likelihood
Low              Low            None            Unlikely
Medium           Moderate       Few             Moderate
High             High           Many            Likely
Low              Low            Many            Moderate
High             Moderate       None            Likely
Medium           Low            None            Unlikely
Low              High           Few             Moderate
High             Moderate       Few             Likely
Medium           Moderate       Many            Likely
Low              Low            Few             Unlikely
High             High           None            Moderate
Medium           Low            Many            Moderate
Medium           High           Few             Moderate
High             Low            Many            Likely
Low              Moderate       None            Unlikely
Medium           High           None            ?????
High             High           Few             ?????
Low              Moderate       Many            ?????

Your goal is to build a Naive Bayes classifier based on the labeled data, and use it to classify the new unlabeled data points.

(a) For each feature, build the frequency and likelihood tables.
(b) Compute the prior probabilities for each class.
(c) Classify each of the three new data points to the appropriate class. Show your work and the steps used to identify the suitable class.

Question 6

Consider the following dataset of 5 data points. Each data point has two features $(a, b)$ and a class label $\in \{-1, 1\}$.
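
Looking back at the Naive Bayes steps listed in Question 5 (frequency and likelihood tables, priors, then classification), the following is a minimal sketch of how those steps might be organised for categorical features. It assumes no smoothing is applied, so an unseen feature value contributes a likelihood of zero. The function names and the toy dataset are hypothetical and are not the assignment's data.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Build class priors and per-feature likelihood tables from categorical data."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: k / n for c, k in class_counts.items()}

    # Frequency tables: counts[feature_index][class][value] -> occurrences.
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for j, value in enumerate(row):
            counts[j][c][value] += 1

    # Likelihood tables: P(value | class) = count / class size (no smoothing assumed).
    likelihoods = {
        j: {c: {v: k / class_counts[c] for v, k in vc.items()}
            for c, vc in by_class.items()}
        for j, by_class in counts.items()
    }
    return priors, likelihoods


def predict(priors, likelihoods, row):
    """Return (best_class, scores), where each score is prior * product of likelihoods."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(row):
            score *= likelihoods[j][c].get(value, 0.0)  # unseen value -> 0
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two categorical features, two classes.
toy_rows = [("red", "small"), ("red", "large"), ("blue", "large"),
            ("blue", "small"), ("red", "large")]
toy_labels = ["A", "A", "B", "B", "B"]

priors, likelihoods = fit_naive_bayes(toy_rows, toy_labels)
label, scores = predict(priors, likelihoods, ("red", "small"))
print("priors:", priors)
print("predicted:", label, "scores:", scores)
```

The scores are unnormalised; dividing each by their sum would give posterior probabilities, but the arg-max class is the same either way.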
