University of Toronto
Department of Electrical and Computer Engineering
ECE 1508S2: Applied Deep Learning
A. Bereyhi - Winter 2024

Assignment 2
Feedforward Neural Networks
DATE: Feb 2, 2024    DUE: Feb 16, 2024

PREFACE

This is the second series of assignments for the course Special Topics in Communications: Applied Deep Learning. The exercises review the topics of Chapter 2, i.e., Feedforward Neural Networks. Below, you can find information regarding the contents of these exercises, as well as instructions on how to submit them.

GENERAL INFORMATION

The assignments are given in two sections. In the first section, you have written questions that you can answer in words or by derivation. The questions are consistent with the material of Chapter 2, and you do not need any further resources to answer them. The second section includes the programming assignments. For these assignments, you need to use the package torch in Python. For those who are beginners, an introduction has been given and some useful online resources have been cited.

If a question is unclear or contains any flaw, please contact us over Piazza. Also, if any particular assumption is required to solve a problem, feel free to make that assumption and state it in your solution.

The total mark of the assignments is 100 points, with the written questions having the following mark distribution:

• Question 1: 10 points
• Question 2: 5 points
• Question 3: 10 points
• Question 4: 5 points

The mark distribution of the programming assignments is as follows: 50 points for the first assignment and 20 points for the second assignment. Therefore, the total mark of the written questions adds up to 30 points and the total mark of the programming assignments adds up to 70 points.

HOW TO SUBMIT

Please submit the answers to the written exercises as a PDF or image file. They do not need to be machine-typed; you can submit a photo of your handwritten solutions. For the programming tasks, it is strongly suggested to use the Python notebook Assgn_2.ipynb that is available on Quercus, and you can use it for your submission. Note that most of the code for the programming assignments is already given in the Python notebook, and you are only asked to complete the code in the indicated lines. Nevertheless, it is not mandatory to use this file, and you can use any other file format for your submission. Regardless of what format or template you choose, your submission for the programming assignments should be included in a single file (a zip file of multiple executable files is also accepted).

The deadline for your submission is February 16, 2024 at 11:59 PM.

• You can delay up to three days, i.e., until February 19, 2024 at 11:59 PM. After this extended deadline, no submission is accepted.
• In case of a delay, you lose one of your two penalty-free delays. After two penalty-free delays, each day of delay deducts 10% of the assignment mark.

Please submit your assignment only through Quercus, and not by email.

1 WRITTEN EXERCISES

QUESTION 1: FORWARD AND BACKWARD PASS

In this exercise, we carry out forward and backward propagation for the simple feedforward neural network (FNN) we had in the first series of assignments. This FNN is shown in Figure 1.1. In this FNN, we use the soft-ReLU function for activation in the hidden layer. This means that $f(\cdot)$ in Figure 1.1 is $f(z) = \log(1 + e^z)$ with the logarithm taken in natural base. The output layer is further activated via the sigmoid function, i.e., $\sigma(z) = \frac{1}{1 + e^{-z}}$. For training of this FNN, we use the cross-entropy function as the loss function. We are given the data-point
$$x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
whose true label is $v_0 = 1$.
Figure 1.1: Fully-connected FNN with two-dimensional input $x = [x_1, x_2]^\mathsf{T}$.

We intend to perform one forward and backward pass by hand. To this end, assume that all weights and biases are initialized to the value 0.1, i.e., all entries of $W_1^{(0)}$ and $w_2^{(0)}$ are 0.1, where $W_1$ is the matrix containing all weights and biases of the hidden layer and $w_2$ is the vector containing all weights and biases of the output layer.

1. Determine all variables calculated in the forward pass. You have to explain the order of your calculations using the forward-propagation algorithm.
2. Determine the gradient of the loss with respect to all the weights and biases at the given initial values via backpropagation. Note: You must use the backpropagation algorithm.
3. Assume we are doing sample-level training. Calculate the updated weights and biases for the next iteration of gradient descent, i.e., $W_1^{(1)}$ and $w_2^{(1)}$.

QUESTION 2: FORWARD-PROPAGATION REVISITED

Consider a fully-connected feedforward neural network (FNN) with $L$ hidden layers. The input data-point $x$ to this FNN has $N$ entries, i.e., $x \in \mathbb{R}^N$. Hidden layer $\ell$ for $\ell \in \{1, \ldots, L\}$ has $W_\ell$ neurons, all activated with activation function $f_\ell(\cdot): \mathbb{R} \mapsto \mathbb{R}$, and the output layer contains $W_{L+1}$ neurons with activation function $f_{L+1}(\cdot): \mathbb{R} \mapsto \mathbb{R}$. For this network, we derived the forward-propagation algorithm in the lecture as given in Algorithm 1.

Algorithm 1 ForwardProp(): Standard Form Derived in Lecture
1: Initiate with $y_0 = x$
2: for $\ell = 0, \ldots, L$ do
3:     Add $y_\ell[0] = 1$ and determine $z_{\ell+1} = W_{\ell+1} y_\ell$    # forward affine
4:     Determine $y_{\ell+1} = f_{\ell+1}(z_{\ell+1})$    # forward activation
5: end for
6: for $\ell = 1, \ldots, L+1$ do
7:     Return $y_\ell$ and $z_\ell$
8: end for

In this algorithm, the matrix $W_{\ell+1} \in \mathbb{R}^{W_{\ell+1} \times (W_\ell + 1)}$ contains all the weights and biases of the neurons in layer $\ell+1$, where we define the input layer to be layer 0 with $W_0 = N$ nodes, i.e., the input entries, and the output layer to be layer $L+1$.
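For concreteness, the following is a minimal Python sketch of the standard form in Algorithm 1 above, using torch. It only illustrates the role of the dummy entry and of the combined weight-and-bias matrix $W_{\ell+1}$; the function name, the list-based layer representation, and the layer widths in the example call are illustrative assumptions, not part of the assignment.

    import torch

    def forward_prop(x, W, f):
        # Sketch of Algorithm 1 (standard form): W[l] has shape (W_{l+1}, W_l + 1)
        # and holds the weights AND biases of layer l+1; its first column multiplies
        # the dummy entry y_l[0] = 1, so it plays the role of the bias.
        y, z = [x], []
        for l in range(len(W)):
            y_pad = torch.cat([torch.ones(1), y[l]])   # add the dummy entry y_l[0] = 1
            z.append(W[l] @ y_pad)                     # forward affine: z_{l+1} = W_{l+1} y_l
            y.append(f[l](z[-1]))                      # forward activation: y_{l+1} = f_{l+1}(z_{l+1})
        return y[1:], z                                # y_l and z_l for l = 1, ..., L+1

    # Illustrative call: a 4-neuron hidden layer and a 2-neuron output layer.
    x = torch.randn(3)
    W1 = torch.randn(4, 3 + 1)       # hidden layer: 4 neurons, 3 inputs + bias column
    W2 = torch.randn(2, 4 + 1)       # output layer: 2 neurons, 4 inputs + bias column
    ys, zs = forward_prop(x, [W1, W2], [torch.relu, torch.sigmoid])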
In this exercise, we intend to present an alternative form of forward-propagation in which we represent the weights and biases as separate components, i.e., without using the dummy node 1 in each layer as we did in the lecture. For $\ell \in \{0, \ldots, L\}$, let $\tilde{W}_{\ell+1} \in \mathbb{R}^{W_{\ell+1} \times W_\ell}$ be a matrix whose entry in row $j$ and column $i$ denotes the weight of neuron $j$ in layer $\ell+1$ for its $i$-th input. Moreover, let $b_{\ell+1} \in \mathbb{R}^{W_{\ell+1}}$ be the vector of biases in layer $\ell+1$, whose entry $j$ denotes the bias of neuron $j$ in layer $\ell+1$.

1. Write the affine transform of layer $\ell+1$ in terms of the weight matrix $\tilde{W}_{\ell+1}$ and the bias vector $b_{\ell+1}$.
2. Re-write the forward-propagation algorithm in terms of $\tilde{W}_{\ell+1}$ and $b_{\ell+1}$. For the sake of simplicity, an incomplete version of the algorithm is given below in Algorithm 2: you should only complete the blank lines. Hint: Note that this alternative form should not contain $W_{\ell+1}$ anymore.

Algorithm 2 ForwardProp(): Alternative Form
1: ----------    # complete
2: for $\ell = 0, \ldots, L$ do
3:     ----------    # complete
4:     Determine $y_{\ell+1} = f_{\ell+1}(z_{\ell+1})$    # forward activation
5: end for
6: for $\ell = 1, \ldots, L+1$ do
7:     Return $y_\ell$ and $z_\ell$
8: end for

3. Explain the relation between the matrix $W_\ell$ in Algorithm 1 and $\tilde{W}_\ell$ and $b_\ell$ in Algorithm 2.

QUESTION 3: CHAIN-RULE FOR AFFINE OPERATION

Assume that the scalar $\hat{R} \in \mathbb{R}$ is a function of the vector $y \in \mathbb{R}^K$, i.e., $\hat{R} = \mathcal{L}(y)$ for some $\mathcal{L}(\cdot): \mathbb{R}^K \mapsto \mathbb{R}$. We have already calculated the gradient of $\hat{R}$ with respect to $y$, i.e., we have the vector
$$\nabla_y \hat{R} = \begin{bmatrix} \dfrac{\partial \hat{R}}{\partial y_1} \\ \vdots \\ \dfrac{\partial \hat{R}}{\partial y_K} \end{bmatrix}.$$
We further know that $y$ is an affine function of an input vector $z \in \mathbb{R}^N$, i.e., $y = Az + b$ for some matrix $A \in \mathbb{R}^{K \times N}$ and $b \in \mathbb{R}^K$. We want to calculate the gradient of $\hat{R}$ with respect to any of these three components, i.e., $z$, $A$ and $b$, from $\nabla_y \hat{R}$.
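Before deriving the closed forms asked for in the following parts, note that any candidate expression can be checked numerically: torch's autograd differentiates through the affine map $y = Az + b$ directly. The minimal sketch below assumes illustrative sizes and a toy squared-norm choice of $\mathcal{L}$; neither is prescribed by the assignment.

    import torch

    # Illustrative sizes and a toy differentiable loss; both are assumptions.
    K, N = 3, 4
    A = torch.randn(K, N, requires_grad=True)
    z = torch.randn(N, requires_grad=True)
    b = torch.randn(K, requires_grad=True)

    y = A @ z + b                    # the affine map y = Az + b
    R_hat = torch.sum(y ** 2)        # toy scalar loss L(y); any differentiable choice works
    R_hat.backward()                 # autograd applies the chain rule through the affine map

    # Numerical gradients to compare your closed-form answers against:
    grad_z, grad_A, grad_b = z.grad, A.grad, b.grad
    grad_y = (2 * y).detach()        # for this toy loss, the upstream gradient of R_hat w.r.t. y is 2y

Comparing grad_z, grad_A, and grad_b against your expressions written in terms of grad_y (together with $A$ or $z$) confirms the derivations in the three cases below.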
1. First, assume that $A$ and $b$ are given. We intend to calculate the gradient of $\hat{R}$ with respect to $z$, i.e., $\nabla_z \hat{R}$. The computation graph for this problem is shown in Figure 1.2. Determine $\nabla_z \hat{R}$ in terms of $\nabla_y \hat{R}$. Hint: You need to present the result compactly as a matrix-vector multiplication.

Figure 1.2: Computation graph for Case 1, where we aim to calculate $\nabla_z \hat{R}$.

2. Now, assume another case in which $z$ and $b$ are given, and we intend to calculate the gradient of $\hat{R}$ with respect to $A$, i.e., $\nabla_A \hat{R}$. The computation graph for this problem is shown in Figure 1.3. Determine $\nabla_A \hat{R}$ in terms of $\nabla_y \hat{R}$. Hint: You need to present the result compactly as a vector-vector multiplication.

Figure 1.3: Computation graph for Case 2, where we aim to calculate $\nabla_A \hat{R}$.

3. As the last case, assume that $A$ and $z$ are given. We now intend to calculate the gradient of $\hat{R}$ with respect to $b$, i.e., $\nabla_b \hat{R}$. The computation graph for this problem is shown in Figure 1.4. Determine $\nabla_b \hat{R}$ in terms of $\nabla_y \hat{R}$. Hint: You need to present the result compactly as a vector.

Figure 1.4: Computation graph for Case 3, where we aim to calculate $\nabla_b \hat{R}$.

QUESTION 4: BACKPROPAGATION REVISITED

We now extend the alternative representation of Question 2 to the backward pass, using the results of Question 3. Recall the backpropagation algorithm derived in the lecture: this is given in Algorithm 3. In this algorithm, $\nabla y_\ell$ denotes the gradient of the loss $\hat{R}$ with respect to $y_\ell$, where $\hat{R}$ is determined for the data-point $(x, v)$; i.e., if $y_{L+1}$ denotes the output of the FNN for input point $x$, then $\hat{R} = \mathcal{L}(y_{L+1}, v)$. Furthermore, $\dot{f}_\ell(\cdot): \mathbb{R} \mapsto \mathbb{R}$ denotes the derivative of activation $f_\ell(\cdot)$ with respect to its argument, and $\odot$ is the entry-wise product.

Algorithm 3 BackProp(): Standard Form Derived in Lecture
1: Initiate with $\nabla y_{L+1} = \nabla \mathcal{L}(y_{L+1}, v)$ and $\nabla z_{L+1} = \nabla y_{L+1} \odot \dot{f}_{L+1}(z_{L+1})$
2: for $\ell = L, \ldots, 1$ do
3:     Determine $\nabla y_\ell = W_{\ell+1}^\mathsf{T} \nabla z_{\ell+1}$ and drop $\nabla y_\ell[0]$    # backward affine
4:     Determine $\nabla z_\ell = \dot{f}_\ell(z_\ell) \odot \nabla y_\ell$    # backward activation
5: end for
6: for $\ell = 1, \ldots, L+1$ do
7:     Return $\nabla_{W_\ell} \hat{R} = \nabla z_\ell \, y_{\ell-1}^\mathsf{T}$
8: end for

1. Using the results of Question 3, complete the alternative form of the backpropagation algorithm given below in Algorithm 4. This alternative form should only contain the matrices $\tilde{W}_{\ell+1}$ and vectors $b_{\ell+1}$, as defined in Question 2.
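Finally, for the written exercises a quick numerical sanity check can catch arithmetic slips in the hand calculations. The sketch below uses torch autograd on the network of Question 1, assuming the two-input, two-neuron hidden layer of Figure 1.1, softplus (soft-ReLU) hidden activation, a sigmoid output, the binary cross-entropy form of the loss, and bias entries stored in the first column of each weight matrix (matching the dummy entry of Algorithm 1); all variable names are illustrative.

    import torch

    # Assumed setup from Question 1: x = [1, 1]^T, true label v_0 = 1, and all
    # weights and biases initialized to 0.1.
    x = torch.tensor([1.0, 1.0])
    v = torch.tensor(1.0)
    W1 = torch.full((2, 3), 0.1, requires_grad=True)   # hidden layer: [bias | weights]
    w2 = torch.full((1, 3), 0.1, requires_grad=True)   # output layer: [bias | weights]

    # Forward pass: soft-ReLU (softplus) in the hidden layer, sigmoid at the output.
    z1 = W1 @ torch.cat([torch.ones(1), x])
    y1 = torch.nn.functional.softplus(z1)              # f(z) = log(1 + e^z)
    z2 = w2 @ torch.cat([torch.ones(1), y1])
    y2 = torch.sigmoid(z2).squeeze()

    # Binary cross-entropy loss for a single label, then the backward pass.
    R_hat = -(v * torch.log(y2) + (1 - v) * torch.log(1 - y2))
    R_hat.backward()

    print(z1, y1, z2, y2, R_hat)    # forward-pass variables to compare with hand results
    print(W1.grad, w2.grad)         # gradients to compare with backpropagation by hand

The printed forward-pass variables and gradients can be compared entry-by-entry with the hand computations; the updated parameters of Question 1, part 3 then follow from one gradient-descent step with whatever learning rate is specified.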