CIS 6261: Trustworthy Machine Learning (Spring 2023)
Homework 1 — Adversarial Examples
Name: Ankireddypalli Sravani
March 2, 2023

This is an individual assignment. Academic integrity violations (i.e., cheating, plagiarism) will be reported to SCCR! The official CISE policy recommended for such offenses is a course grade of E. Additional sanctions may be imposed by SCCR, such as marks on your permanent educational transcripts, dismissal or expulsion.

Reminder of the Honor Pledge: On all work submitted for credit by Students at the University of Florida, the following pledge is either required or implied: "On my honor, I have neither given nor received unauthorized aid in doing this assignment."

Instructions

Please read the instructions and questions carefully. Write your answers directly in the space provided. Compile the tex document and hand in the resulting PDF.

In this assignment you will explore adversarial examples in Python. Use the code skeleton provided and submit the completed source file(s) alongside the PDF. [1]

Note: bonus points you get on this homework *do* carry across assignments/homeworks.

Assignment Files

The assignment archive contains the following Python source files:
- hw.py: the main assignment source file.
- nets.py: defines the neural network architectures and some useful related functions.
- attacks.py: contains the attack code used in the assignment.

Note: You are encouraged to carefully study the provided files. This may help you successfully complete the assignment.

[1] You should use Python 3 and TensorFlow 2. You may use HiPerGator or your own system. This assignment can be completed with or without GPUs.
Problem 0: Training a Neural Net for MNIST Classification (10 pts)

In this problem, you will train a neural network to do MNIST classification. The code for this problem uses the following command format.

python3 hw.py problem0 <nn_desc> <num_epoch>

Here <nn_desc> is a neural network description string (no whitespace). It can take two forms: simple,<num_hidden>,<l2_reg_const> or deep. The latter specifies the deep neural network architecture (see get_deeper_classifier() in nets.py for details), whereas the former specifies a simple neural network architecture (see get_simple_classifier() in nets.py for details) with one hidden layer of <num_hidden> neurons and an L2 regularization constant of <l2_reg_const>. Also, <num_epoch> is the number of training epochs. For example, suppose you run the following command.

python3 hw.py problem0 simple,64,0.001 100

This will train the target model on MNIST images for 100 epochs. [2] The target model architecture is a neural network with a single hidden layer of 64 neurons which uses L2 regularization with a constant of 0.001. [3] (The loss function is the categorical cross-entropy loss.)

1. (5 pts) Run the following command:

python3 hw.py problem0 simple,128,0.01 20

This will train the model and save it on the filesystem. Note that 'problem0' is used to denote training. The command line for subsequent problems (problem1, problem2, etc.) will load the trained model. Before you can run the code you need to put in your UFID at the beginning of the main() function in hw.py.

2. (5 pts) What is the training accuracy? What is the test accuracy? Is the model overfitted?

Training accuracy: 95.3%. Test accuracy: 94.8%. We cannot conclusively say the model is overfitted: the training and test accuracies are close, and the accuracy may be higher or lower depending on the inputs.

[2] Each MNIST image is represented as an array of 28 · 28 = 784 pixels, each taking a value in {0, 1, ..., 255}.
[3] By default, the code will provide detailed output about the training process and the accuracy of the target model.
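For reference on the architecture described above (simple,<num_hidden>,<l2_reg_const>), here is a minimal sketch of what such a model could look like in TensorFlow 2. This is an illustrative reconstruction, not the provided get_simple_classifier() from nets.py, which may use a different input shape, activation, or optimizer.

import tensorflow as tf

def simple_classifier_sketch(num_hidden=128, l2_reg_const=0.01):
    # One hidden layer with L2 regularization, softmax output over the 10 digits,
    # trained with the categorical cross-entropy loss (as stated above).
    reg = tf.keras.regularizers.l2(l2_reg_const)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),  # flattened 28x28 MNIST image
        tf.keras.layers.Dense(num_hidden, activation='relu', kernel_regularizer=reg),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model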
Problem 1: Mysterious Attack (50 pts)

For this problem, you will study an unknown attack that produces adversarial examples. This attack is called the gradient noise attack. You will look at its code and run it to try to understand how it works. (The code is in gradient_noise_attack(), which is located in attacks.py.) This attack is already implemented, so you will only need to run it and answer questions about the output. However, before you can run the attack you will need to implement gradient_of_loss_wrt_input(), found in nets.py. This function computes the gradient of the loss function with respect to the input. We will use it for the subsequent problems, so make sure you implement it correctly!

To run the code for this problem, use the following command.

python3 hw.py problem1 <nn_desc> <input_idx> <alpha>

Here <input_idx> is the input (benign) image that the attack will create an adversarial example from, and <alpha> is a non-negative integer parameter used by the attack. The code will automatically load the model from file, so you need to have completed problem0 first!

1. (5 pts) Before we can reason about adversarial examples, we need a way to quantify the distortion of an adversarial perturbation with respect to the original image. Propose a metric to quantify this distortion as a single real number. [4] Explain your choice.

[4] The specific metric you implement is your choice and there are many possible options, but you probably want to ensure that two identical images have a distortion of 0 and that any two different images have a distortion larger than 0, with the larger the difference between the images, the larger the distortion value.

My choice of metric to quantify the distortion of an adversarial perturbation is the L1 norm. The L1 norm sums the absolute values of the pixel differences, so it is robust to extreme values, reduces the cost of outliers, and promotes sparsity.

Locate the incomplete definition of the distortion() function in hw.py, and implement your proposed metric. What is the range of your distortion metric?

Range of distortion metric: (8-77). (A sketch of the L1 implementation is included with the code after question 2.)

2. (10 pts) Before we can run the attack, you need to implement gradient_of_loss_wrt_input(), located in nets.py. For this, you can use TensorFlow's GradientTape. Follow the instructions in the comments and fill in the implementation (about 5 lines of code). Make sure this is implemented correctly and copy-paste your code below.
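Below is a hedged sketch of one possible implementation of gradient_of_loss_wrt_input() using tf.GradientTape, together with the L1 distortion metric proposed in question 1. The exact function signatures in the provided nets.py and hw.py skeletons may differ, so treat this as an illustrative reconstruction rather than the submitted code.

import tensorflow as tf

def distortion(x_benign, x_adv):
    # Proposed metric: L1 norm of the pixel-wise difference between the
    # benign image and the adversarial example.
    diff = tf.cast(x_adv, tf.float32) - tf.cast(x_benign, tf.float32)
    return float(tf.reduce_sum(tf.abs(diff)))

def gradient_of_loss_wrt_input(model, x, y_true,
                               loss_fn=tf.keras.losses.CategoricalCrossentropy()):
    # Computes d(loss)/d(x): how the loss changes as each input pixel changes.
    # x is expected to be a batch of inputs (e.g., shape (1, 784)).
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)                   # x is an input, not a variable, so watch it explicitly
        y_pred = model(x)               # forward pass
        loss = loss_fn(y_true, y_pred)  # categorical cross-entropy loss
    return tape.gradient(loss, x)       # gradient of the loss w.r.t. the input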
3. (15 pts) Now, let's run the attack using the following command with various input images and alphas.

python3 hw.py problem1 simple,128,0.01 <input_idx> <alpha>

Note: it is important that the architecture match what you ran for Problem 0. (The code uses these arguments to locate the model to load.) For example, try:

python3 hw.py problem1 simple,128,0.01 0 2
python3 hw.py problem1 simple,128,0.01 0 15
python3 hw.py problem1 simple,128,0.01 1 4
python3 hw.py problem1 simple,128,0.01 1 40
python3 hw.py problem1 simple,128,0.01 2 8
python3 hw.py problem1 simple,128,0.01 3 1
python3 hw.py problem1 simple,128,0.01 4 1
python3 hw.py problem1 simple,128,0.01 5 4
python3 hw.py problem1 simple,128,0.01 6 9

If you have implemented the previous function correctly, the code will plot the adversarial examples (see gradient_noise.png) and print the distortion according to your proposed metric. Produce adversarial examples for at least four different input examples and two values of the alpha parameter. Paste the plots here. (Use minipage and subfigure to save space.) Do you observe any failures? Do successful adversarial examples look like the original image? What do you think is the purpose of the parameter alpha?

Yes, I observed one failure. For the samples on which the attack succeeds, the adversarial examples look almost identical to the original image but contain some noise, so the images are misclassified and the confidence is low. The purpose of alpha is to scale the perturbation, and hence the distortion.

4. (15 pts) Now, let's look into the code of the attack (gradient_noise_attack(), located in attacks.py) and try to understand how it works. First focus on lines 39 to 44, where the perturbation is created and added to the adversarial example. How is the perturbation made, and how does it use the gradient of the loss with respect to the input?

The gradient of the loss is first normalized. The perturbation is then computed from the sign of the gradient, which moves the input in the direction that increases the loss. The perturbation is scaled by alpha and added to the adversarial example (a code sketch of this step is given at the end of this problem).

The code uses tf.clip_by_value(). What is the purpose of this function, and why is it used by the attack?

It clips the adversarial image so that its pixel values stay within the valid range of 0-255.

Now let's look at lines 50 to 57. Is the attack targeted or untargeted? What is the purpose of target_class_number? How does the attack terminate and why?

The attack is targeted. The purpose of target_class_number is to give a numerical representation of the class label the attack is aiming for: the input is perturbed until the model maps it to this target class, and the predicted class number can then be mapped back to its human-readable label. The attack terminates when the model predicts the target class (with confidence >= 0.8) or when the maximum number of iterations is reached; if the model does not predict the target label with confidence >= 0.8 within the maximum number of iterations, the attack fails.

5. (5 pts) Finally, let's look at lines 35 to 37 (the if branch). What functionality is implemented by this short snippet of code? Give a reason why doing this is a good idea.

If the sum of the absolute values of the gradient's elements is less than 0.000000000001 (1e-12), i.e., the gradient of the loss is very small, the code rescales it in the gradient's direction by increasing its magnitude. This is a good idea because the attack can then still generate a perturbation large enough to misclassify the image even when the gradient is nearly zero.
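To make the answer to question 4 concrete, here is a hedged sketch of what a single perturbation step like the one described could look like. This is a reconstruction from the description above, not the actual code in attacks.py, which also normalizes the gradient and handles near-zero gradients as discussed in questions 4 and 5.

import tensorflow as tf

def perturbation_step(x_adv, grad, alpha):
    # Build the perturbation from the sign of the loss gradient, i.e. step in
    # the direction that increases the loss, scaled by alpha.
    perturbation = alpha * tf.sign(grad)
    x_adv = tf.cast(x_adv, tf.float32) + perturbation
    # Clip so the adversarial image keeps valid pixel values in [0, 255].
    return tf.clip_by_value(x_adv, 0.0, 255.0)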