ECS171 Winter 2024 Midterm Study Guide

Cheat sheet

Sigmoid and derivative of sigmoid:
σ(x) = 1 / (1 + e^(−x))
σ′(x) = d/dx σ(x) = σ(x)(1 − σ(x))

ReLU and derivative of ReLU:
f(x) = max(0, x) = x if x > 0, 0 otherwise
f′(x) = 1 if x > 0, 0 otherwise

Gradient descent weight update:
w_(t+1) = w_t − α · ∇L(w_t)

Errors:
MSE = (1/N) · Σ (yᵢ − ŷᵢ)²   (sum over i = 1, …, N)
SSE = Σ (yᵢ − ŷᵢ)²           (sum over i = 1, …, N)

Variance:
Var = σ² = Σ (xᵢ − x̄)² / N   (sum over i = 1, …, N)

Chain rule:
If h(x) = f(g(x)), then h′(x) = f′(g(x)) · g′(x)

Performance metrics:
Precision = TP / (TP + FP)
Recall (True Positive Rate) = TP / (TP + FN)
F1 score = 2 · Precision · Recall / (Precision + Recall)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
True Negative Rate: TNR = TN / (TN + FP)
False Negative Rate: FNR = FN / (FN + TP)
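These formulas translate directly into code. Below is a minimal NumPy sketch of the cheat-sheet functions; the function names are ours, chosen for illustration:

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_prime(x):
    # ReLU'(x) = 1 where x > 0, else 0
    return (x > 0).astype(float)

def sse(y, y_hat):
    # sum of squared errors
    return np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)

def mse(y, y_hat):
    # mean squared error = SSE / N
    return np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)

def gd_step(w, grad, lr):
    # gradient descent update: w_(t+1) = w_t - lr * gradient
    return w - lr * grad
```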
Q1: In machine learning, what is the dropout technique, and how does it help prevent overfitting in neural networks?

Answer: Dropout is a regularization technique for neural networks in which, during training, a random subset of neurons is "dropped out" (turned off) with a certain probability. This prevents overfitting by making the network more robust and less reliant on any specific neuron. For example, consider a neural network with dropout applied to a hidden layer. With a dropout rate of 0.5, on each forward and backward pass half of the neurons in that layer are randomly deactivated, so the network must learn to be effective even when only a fraction of its neurons is active. This keeps the network from becoming overly specialized to patterns in the training data and helps it generalize better to unseen data. A code sketch of the idea follows.
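The sketch below applies "inverted" dropout (the common implementation, assumed here) to one layer's activations; the function and array names are ours:

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    # During training, zero out each neuron with probability `rate`
    # and rescale the survivors by 1/(1 - rate) ("inverted" dropout),
    # so the expected activation matches what is seen at test time.
    if not training or rate == 0.0:
        return activations  # no dropout at inference time
    mask = (np.random.rand(*activations.shape) >= rate).astype(float)
    return activations * mask / (1.0 - rate)

# Example: with rate=0.5, about half the hidden units are deactivated.
hidden = np.array([0.7, 1.2, 0.3, 2.0])
print(dropout(hidden, rate=0.5))
```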
Q2: Given the hours of exercise per week (x1, measured in hours) and the time since the last Covid-19 infection (x2, measured in weeks), the model predicts the probability that the person will be re-infected with Covid-19 in the next 5 months. The model is the linear regression ŷ = 0.4 − 0.05·x1 + 0.07·x2. For each of the data pairs in the training and testing sets, do the following:
(a) Compute the predicted output of the given regression model for each input.
(b) Compute the bias (SSE) for each set of inputs.
(c) Compute the variance.
(d) Is the model overfit, underfit, or a good fit? Justify your answer using the following metrics for the base case: SSE_train = 0.050, SSE_test = 0.049, Variance = 0.01.
Where needed, round your answer to 4 d.p.

Training set:
x1     x2     y
0.0    4.0    0.70
3.0    10.0   0.95
2.0    3.0    0.60
5.0    1.0    0.15
8.0    5.5    0.25
12.0   7.5    0.23
10.0   4.0    0.20
3.0    2.0    0.30

Testing set:
x1     x2     y
1.0    9.0    0.95
9.0    6.0    0.20
7.0    3.0    0.25
5.0    5.0    0.45

Answer:

Training set:
x1     x2     y      ŷ       (y − ŷ)²
0.0    4.0    0.70   0.68    0.0004
3.0    10.0   0.95   0.95    0
2.0    3.0    0.60   0.51    0.0081
5.0    1.0    0.15   0.22    0.0049
8.0    5.5    0.25   0.385   0.0182
12.0   7.5    0.23   0.325   0.0090
10.0   4.0    0.20   0.18    0.0004
3.0    2.0    0.30   0.39    0.0081

Testing set:
x1     x2     y      ŷ      (y − ŷ)²
1.0    9.0    0.95   0.98   0.0009
9.0    6.0    0.20   0.37   0.0289
7.0    3.0    0.25   0.26   0.0001
5.0    5.0    0.45   0.50   0.0025

SSE_train = sum of all training (y − ŷ)² = 0.0491
SSE_test = sum of all testing (y − ŷ)² = 0.0324
Variance = SSE_train − SSE_test = 0.0491 − 0.0324 = 0.0167

Both SSE values are lower than the base case (0.050 and 0.049). The train–test gap is larger than the base case's 0.01, but since the test error is below the training error, this gap does not indicate overfitting. Therefore the model is a good fit, and better fit compared to the base case.
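These tables can be reproduced with a few lines of Python (a sketch; the array and function names are ours):

```python
import numpy as np

# Model from Q2: y_hat = 0.4 - 0.05*x1 + 0.07*x2
def predict(x1, x2):
    return 0.4 - 0.05 * x1 + 0.07 * x2

# Columns: x1, x2, y
train = np.array([[0.0, 4.0, 0.70], [3.0, 10.0, 0.95], [2.0, 3.0, 0.60],
                  [5.0, 1.0, 0.15], [8.0, 5.5, 0.25], [12.0, 7.5, 0.23],
                  [10.0, 4.0, 0.20], [3.0, 2.0, 0.30]])
test = np.array([[1.0, 9.0, 0.95], [9.0, 6.0, 0.20],
                 [7.0, 3.0, 0.25], [5.0, 5.0, 0.45]])

for name, data in [("train", train), ("test", test)]:
    x1, x2, y = data[:, 0], data[:, 1], data[:, 2]
    y_hat = predict(x1, x2)
    sse = np.sum((y - y_hat) ** 2)
    print(f"SSE_{name} = {sse:.4f}")
# prints SSE_train ≈ 0.0491, SSE_test ≈ 0.0324
```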
Q3: Find the coefficients of a polynomial of degree 2 which gives the lowest mean squared error.
y_predicted = a*x^2 + b*x + c
x = [1, 2, 3]
y_actual = [4, 13, 20]
Coefficient set 1: a=1, b=2, c=3
Coefficient set 2: a=3, b=5, c=2
Coefficient set 3: a=1, b=1, c=5

Answer: Calculate y_predicted for all 3 values of x for each coefficient set.

For coefficient set 1 (a=1, b=2, c=3):
y_predicted when x=1: 1*(1*1) + 2*1 + 3 = 6
y_predicted when x=2: 1*(2*2) + 2*2 + 3 = 11
y_predicted when x=3: 1*(3*3) + 2*3 + 3 = 18
MSE for coefficient set 1 = ((4−6)² + (13−11)² + (20−18)²)/3 = (4 + 4 + 4)/3 = 4

For coefficient set 2 (a=3, b=5, c=2):
y_predicted when x=1: 3*(1*1) + 5*1 + 2 = 10
y_predicted when x=2: 3*(2*2) + 5*2 + 2 = 24
y_predicted when x=3: 3*(3*3) + 5*3 + 2 = 44
MSE for coefficient set 2 = ((4−10)² + (13−24)² + (20−44)²)/3 = (36 + 121 + 576)/3 ≈ 244.33

For coefficient set 3 (a=1, b=1, c=5):
y_predicted when x=1: 1*(1*1) + 1*1 + 5 = 7
y_predicted when x=2: 1*(2*2) + 1*2 + 5 = 11
y_predicted when x=3: 1*(3*3) + 1*3 + 5 = 17
MSE for coefficient set 3 = ((4−7)² + (13−11)² + (20−17)²)/3 = (9 + 4 + 9)/3 ≈ 7.33

The lowest MSE is achieved by coefficient set 1, so that set gives the best prediction.
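A few lines of Python verify these MSEs (a sketch; the variable names are ours):

```python
# Evaluate the MSE of y = a*x^2 + b*x + c for each candidate coefficient set.
x = [1, 2, 3]
y_actual = [4, 13, 20]
coefficient_sets = {1: (1, 2, 3), 2: (3, 5, 2), 3: (1, 1, 5)}

for label, (a, b, c) in coefficient_sets.items():
    y_pred = [a * xi ** 2 + b * xi + c for xi in x]
    mse = sum((ya - yp) ** 2 for ya, yp in zip(y_actual, y_pred)) / len(x)
    print(f"set {label}: predictions {y_pred}, MSE = {mse:.2f}")
# set 1: MSE = 4.00, set 2: MSE = 244.33, set 3: MSE = 7.33
```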
Use the graph below to answer Q4–6:

Q4: If we design this neural network with a ReLU added after each hidden neuron as the activation function, and a logistic (sigmoid) added at the end before we get the predicted value (ŷ), use the table below to determine which class the point x1 = 7, x2 = 10, x3 = 9 belongs to. (Assume that if an output has a value over the threshold τ = 0.9, the point is classified into that class.)

W_1a = 0.4    W_a1 = 0.2    b_a = 0.7
W_1b = −0.7   W_a2 = 0.5    b_b = 0.4
W_2a = 0.6    W_b1 = −0.1   b_1 = 0.9
W_2b = −0.5   W_b2 = 0.6    b_2 = −0.8
W_3a = 0.3
W_3b = 0.7

Answer:
H_a = 7*0.4 + 10*0.6 + 9*0.3 + 0.7 = 12.2
H_b = 7*(−0.7) + 10*(−0.5) + 9*0.7 + 0.4 = −3.2
O_1 = max(0, 12.2)*0.2 + max(0, −3.2)*(−0.1) + 0.9 = 3.34
ŷ_a = σ(O_1) = 1/(1 + e^(−3.34)) = 0.9658
O_2 = max(0, 12.2)*0.5 + max(0, −3.2)*0.6 + (−0.8) = 5.3
ŷ_b = σ(O_2) = 1/(1 + e^(−5.3)) = 0.9950
Both outputs exceed τ = 0.9, so this data point belongs to both class a and class b.

Q5: If the data point from Q4 has y_a = 1 and y_b = 0, update the weights w_a1, w_a2, w_1a, and w_3a with learning rate α = 0.2. Assume we use SSE as the loss.

Answer:
∂error/∂w_a1 = ∂error/∂ŷ_a · ∂ŷ_a/∂O_1 · ∂O_1/∂w_a1 = −(y_a − ŷ_a) · ŷ_a(1 − ŷ_a) · max(0, H_a)
= −(1 − 0.9658) · 0.9658 · (1 − 0.9658) · max(0, 12.2) = −0.0138
w_a1_new = 0.2 − 0.2·(−0.0138) = 0.2028

∂error/∂w_a2 = ∂error/∂ŷ_b · ∂ŷ_b/∂O_2 · ∂O_2/∂w_a2 = −(y_b − ŷ_b) · ŷ_b(1 − ŷ_b) · max(0, H_a)
= −(0 − 0.9950) · 0.9950 · (1 − 0.9950) · max(0, 12.2) = 0.0604
w_a2_new = 0.5 − 0.2·0.0604 = 0.4879

For the input-to-hidden weights, the error reaches H_a through both outputs. The per-output error signals are δ_1 = −(y_a − ŷ_a)·ŷ_a(1 − ŷ_a) = −0.00113 and δ_2 = −(y_b − ŷ_b)·ŷ_b(1 − ŷ_b) = 0.00495, and ReLU′(H_a) = 1 since H_a = 12.2 > 0:
∂error/∂H_a = δ_1·w_a1 + δ_2·w_a2 = (−0.00113)·0.2 + 0.00495·0.5 = 0.00225
∂error/∂w_1a = ∂error/∂H_a · ReLU′(H_a) · x_1 = 0.00225·7 = 0.0157
w_1a_new = 0.4 − 0.2·0.0157 = 0.3969
∂error/∂w_3a = ∂error/∂H_a · ReLU′(H_a) · x_3 = 0.00225·9 = 0.0202
w_3a_new = 0.3 − 0.2·0.0202 = 0.2960
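The forward pass and the four weight updates can be checked with a short script (a minimal sketch of this specific 3-2-2 network; the variable names are ours):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Inputs and weights from the Q4 table.
x1, x2, x3 = 7.0, 10.0, 9.0
w1a, w2a, w3a, ba = 0.4, 0.6, 0.3, 0.7
w1b, w2b, w3b, bb = -0.7, -0.5, 0.7, 0.4
wa1, wb1, b1 = 0.2, -0.1, 0.9
wa2, wb2, b2 = 0.5, 0.6, -0.8

# Forward pass: hidden pre-activations, ReLU, then sigmoid outputs.
Ha = x1 * w1a + x2 * w2a + x3 * w3a + ba           # 12.2
Hb = x1 * w1b + x2 * w2b + x3 * w3b + bb           # -3.2
O1 = relu(Ha) * wa1 + relu(Hb) * wb1 + b1          # 3.34
O2 = relu(Ha) * wa2 + relu(Hb) * wb2 + b2          # 5.3
ya_hat, yb_hat = sigmoid(O1), sigmoid(O2)          # ~0.9658, ~0.9950

# Backward pass for Q5: SSE loss, targets y_a = 1, y_b = 0, lr = 0.2.
ya, yb, lr = 1.0, 0.0, 0.2
d1 = -(ya - ya_hat) * ya_hat * (1 - ya_hat)        # output-a error signal
d2 = -(yb - yb_hat) * yb_hat * (1 - yb_hat)        # output-b error signal
wa1_new = wa1 - lr * d1 * relu(Ha)                 # ~0.2028
wa2_new = wa2 - lr * d2 * relu(Ha)                 # ~0.4880 (0.4879 with hand-rounded intermediates)
dHa = (d1 * wa1 + d2 * wa2) * (1.0 if Ha > 0 else 0.0)  # ReLU' gate on H_a
w1a_new = w1a - lr * dHa * x1                      # ~0.3969
w3a_new = w3a - lr * dHa * x3                      # ~0.2960
print(ya_hat, yb_hat, wa1_new, wa2_new, w1a_new, w3a_new)
```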