Final W22 Solution

School

University of Michigan

Course

370

Subject

Computer Science

Date

Dec 6, 2023

ANSWER KEY

Final Exam

EECS 370 Winter 2022: Intro to Computer Organization

You are to abide by the University of Michigan College of Engineering Honor Code. Please sign below to signify that you have kept the honor code pledge:

I have neither given nor received aid on this exam, nor have I concealed any violations of the Honor Code.

Signature: ______________________________________________________________
Name: ______________________________________________________________
Uniqname: ______________________________________________________________

First/Last name of person sitting to your Right (Write if you are at the end of the row): __________________________________
First/Last name of person sitting to your Left (Write if you are at the end of the row): __________________________________

Exam Directions:
You have 120 minutes to complete the exam. There are 8 questions in the exam on 16 pages (double-sided). Please flip through your exam to ensure you have all 16 pages. You must show your work to be eligible for partial credit! Write legibly and dark enough for the scanners to read your answers. Write your uniqname on the line provided at the top of each page.

Exam Materials:
You are allotted one 8.5 x 11 double-sided note sheet to bring into the exam room. You are allowed to use calculators that do not have an internet connection. All other electronic devices, such as cell phones or calculators with an internet connection, are strictly forbidden, and their use will result in an Honor Code violation.

1. Short Questions                       ________ / 12 pts
2. Branch Prediction                     ________ / 11 pts
3. LC2K Pipeline Datapath Performance    ________ / 10 pts
4. The 3 C’s of Caches                   ________ / 9 pts
5. New LC2K Pipeline Datapath            ________ / 15 pts
6. Cache Locality                        ________ / 15 pts
7. VM Simulation                         ________ / 16 pts
8. VM Performance / Cache Performance    ________ / 12 pts
TOTAL                                    ________ / 100 pts

1. Short Questions [12 pts]
Complete the following true/false and short answer questions.

True/False Questions [5 pts]
Circle One:

(a) Using the speculate-and-squash method to resolve control hazards can result in the same CPI as the detect-and-stall method.
    True / False
(b) Increasing the associativity of a cache can reduce capacity misses.
    True / False
(c) If a machine has infinite physical memory, virtual memory is no longer needed.
    True / False
(d) A multi-level page table can consume more space than a single-level page table.
    True / False
(e) Virtual address to physical address translation can happen simultaneously with the cache access.
    True / False

Short Answer Questions

(f) [2 pts] Consider a cache with a given size. You are asked to change the associativity and block size to reduce the tag area overhead:
    i) How would you change the associativity? Circle your choice.
       Increase Associativity, Decrease Associativity
    ii) How would you adjust the block size? Circle your choice.
       Increase Block Size, Decrease Block Size
(g) [3 pts] Consider the following assembly code being simulated on the 5-stage LC2K pipeline datapath that uses detect-and-forward for data hazards, speculate-not-taken and squash for control hazards, and internal forwarding for the register file.

    i) Circle the registers of the executed instructions that are going to use forwarded data from pipeline registers (including the internal forwarding of the register file):

       Example: nor 5 6 7

           lw   0 1 10
           lw   0 2 10
           beq  1 2 0
           add  1 2 3
           nor  1 3 4
           sw   3 4 10
           halt

    ii) How many cycles would it take to complete the program?
        Answer: 7 + 1 + 3 + 4 = 15

(h) [2 pts] Consider a byte-addressable system with a 128 B fully-associative cache that has a block size of 32 B. The following lines of code are executed:

        int data[20];
        for (unsigned int i = 0; i < 20; i++) {
            data[i] = data[i] + i;
        }

    How many bytes would be written into main memory for data array accesses under a write-through policy and under a write-back policy for the cache? You can assume data starts from address 0 and all dirty blocks are written back in their entirety to main memory at the end of the program.

    Write-through: 80 B
    Write-back: 96 B
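
The write-traffic numbers in (h) can be checked with a small simulation. This is only a sketch under assumptions that the question does not state explicitly: 4-byte int elements, LRU replacement, and a write-allocate cache. The function name memory_write_traffic and its parameters are made up for illustration.

```python
from collections import OrderedDict


def memory_write_traffic(num_elems=20, elem_size=4, block_size=32, cache_size=128):
    """Return (write_through_bytes, write_back_bytes) for data[i] = data[i] + i.

    Assumptions (not from the exam text): 4-byte ints, LRU replacement,
    write-allocate, and the array starting at address 0.
    """
    num_blocks = cache_size // block_size
    cache = OrderedDict()  # block number -> dirty flag, oldest entry first
    write_through = 0
    write_back = 0

    def touch(block, is_store):
        nonlocal write_back
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) == num_blocks:
                _, dirty = cache.popitem(last=False)  # evict the LRU block
                if dirty:
                    write_back += block_size
            cache[block] = False
        if is_store:
            cache[block] = True  # block now differs from main memory

    for i in range(num_elems):
        block = (i * elem_size) // block_size
        touch(block, is_store=False)  # load data[i]
        touch(block, is_store=True)   # store data[i]
        write_through += elem_size    # write-through sends every store to memory

    # End of program: flush every block that is still dirty.
    write_back += block_size * sum(cache.values())
    return write_through, write_back


print(memory_write_traffic())  # (80, 96)
```

The array occupies 80 B, which spans three 32 B blocks, so write-back flushes three whole blocks while write-through sends only the 20 individual 4 B stores.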

2. Branch Prediction [11 pts]
Simulate an LC2K assembly program and answer questions about its branches.

Consider the following LC2K assembly program running on our 5-stage pipelined datapath from lecture. This program counts the number of zeros in the binary array Arr and stores that count in r5. Assume all registers are initialized to zero. Answer the following questions about the program’s branch decisions.

     1        lw   0 1 One    // r1 = 1
     2        lw   0 2 Len    // r2 = Len
     3        lw   0 3 Zero   // r3 = loop counter i
     4  loop  beq  3 2 end    // loop terminates if i == Len
     5        lw   3 4 Arr    // load Arr[i]
     6        beq  4 1 cont   // if Arr[i] == 1, skip Line 7
     7        add  5 1 5      // increment r5
     8  cont  add  3 1 3      // increment i
     9        beq  0 0 loop
    10  end   halt
    11  Zero  .fill 0
    12  One   .fill 1
    13  Len   .fill 3
    14  Arr   .fill 1
    15        .fill 1
    16        .fill 0

(a) [7 pts] Write the sequence of branch decisions for each beq instruction, using T to represent “taken” and N to represent “not taken.” (You might not need all the boxes.)

    Line 4 beq (loop termination condition):  N N N T
    Line 6 beq (if-condition):                T T N
    Line 9 beq (loop condition):              T T T
    Combined sequence, as seen globally:      N T T N T T N N T T

(b) [4 pts] What percentage of the total branch decisions are correctly predicted when using the following prediction schemes? (You can leave your answers as fractions.) Partial credit is given based on incorrect prediction patterns carried over from part (a).

    Predict always not taken (N): 4/10
    Predict backwards taken (T), forwards not taken (N): 7/10
    Predict using a local (i.e., per beq instruction) 2-bit branch predictor initialized to the starting state “strongly taken”: 6/10

3. LC2K Pipeline Datapath Performance [10 pts]
Calculate the CPI and execution time for a pipelined datapath.

Consider an LC2K assembly program running on our 5-stage pipelined datapath from lecture, which has internal forwarding for the register file. The program has the following instruction breakdown:

    R-type instructions (add and nor)   50%
    Load instructions (lw)              25%
    Store instructions (sw)             15%
    Branch instructions (beq)           15%

In the program, 60% of branches are not taken (N). Additionally, 20% of the R-type instructions are followed immediately by a dependent R-type instruction (e.g., add 1 1 2, add 2 2 3) and 80% of load instructions are followed immediately by a dependent R-type instruction (e.g., lw 0 3 0, add 3 3 4). There are no other data dependencies.

(a) [4 pts] What is the CPI of the program when the processor uses detect-and-stall for data hazards and speculate-and-squash for control hazards with a branch predictor that always predicts not taken (N)? Please show your work for partial credit.

    CPI = 1 + stalls for data hazards + stalls for control hazards
        = 1 + (0.5 * 0.2 + 0.25 * 0.8)(2 cycles) + (0.15 * 0.4)(3 cycles)
        = 1 + (0.5 * 0.2 * 2 cycles) + (0.25 * 0.8 * 2 cycles) + (0.15 * 0.4 * 3 cycles)
        = 1 + 0.6 + 0.18
        = 1.78

(b) [4 pts] What is the CPI of the program when the processor uses detect-and-forward for data hazards and speculate-and-squash for control hazards with a branch predictor that always predicts not taken (N)? Please show your work for partial credit.

    CPI = 1 + stalls for data hazards + stalls for control hazards
        = 1 + (0.25 * 0.8)(1 cycle) + (0.15 * 0.4)(3 cycles)
        = 1 + 0.2 + 0.18
        = 1.38

    *** The term for control hazards can be carried over from (a) with no additional loss of points.

(c) [2 pts] Given that the processor frequency is 1 MHz and the program executes one million instructions, how many seconds faster is the execution time of the program when using detect-and-forward? Show your work by calculating the execution time in seconds for parts (a) and (b), then finding the difference between the two.

    Ex. time for (a): (1.78 cycles/insn) * (1,000,000 insn) / (10^6 cycles/second) = 1.78 s
    Ex. time for (b): (1.38 cycles/insn) * (1,000,000 insn) / (10^6 cycles/second) = 1.38 s
    Difference: 1.78 s - 1.38 s = 0.40 s
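
The CPI and execution-time arithmetic in Question 3 can be reproduced in a few lines. This is a minimal sketch rather than part of the answer key; the helper names cpi and exec_time_seconds are hypothetical, and the 3-cycle squash penalty is the one used in the worked solution.

```python
def cpi(data_hazards, branch_frac, mispredict_frac, squash_cycles=3):
    """Base CPI of 1 plus average stall cycles per instruction.

    data_hazards is a list of (instruction fraction, fraction followed by a
    dependent instruction, stall cycles per occurrence) tuples.
    """
    data_stalls = sum(frac * dep * stall for frac, dep, stall in data_hazards)
    control_stalls = branch_frac * mispredict_frac * squash_cycles
    return 1 + data_stalls + control_stalls


def exec_time_seconds(cpi_value, instructions=1_000_000, freq_hz=1_000_000):
    """Execution time = CPI * instruction count / clock frequency."""
    return cpi_value * instructions / freq_hz


# (a) detect-and-stall: R-type and load dependences each cost 2 cycles.
cpi_stall = cpi([(0.50, 0.20, 2), (0.25, 0.80, 2)], branch_frac=0.15, mispredict_frac=0.40)

# (b) detect-and-forward: only the load-use dependence costs 1 cycle.
cpi_forward = cpi([(0.25, 0.80, 1)], branch_frac=0.15, mispredict_frac=0.40)

saved = exec_time_seconds(cpi_stall) - exec_time_seconds(cpi_forward)

print(f"{cpi_stall:.2f} {cpi_forward:.2f} {saved:.2f}")  # 1.78 1.38 0.40
```

With always-not-taken prediction, the mispredicted fraction is the 40% of branches that are taken, which is why 0.40 appears in both calls.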
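
Returning to Question 2, the branch sequences and the three prediction accuracies can be replayed with a short script. This is a sketch under assumptions: the 2-bit predictor is modeled as a saturating counter (3 = strongly taken, 0 = strongly not taken, predict taken when the counter is 2 or 3), and the names branch_trace, count_correct, and BACKWARD_BRANCH_LINES are made up for illustration.

```python
BACKWARD_BRANCH_LINES = {9}  # only the Line 9 beq jumps to an earlier line


def branch_trace(arr, one=1):
    """Replay the loop and return (line number, taken?) for each executed beq."""
    trace = []
    i = 0
    while True:
        done = i == len(arr)
        trace.append((4, done))            # Line 4: exit when i == Len
        if done:
            break
        trace.append((6, arr[i] == one))   # Line 6: skip the increment of r5
        i += 1
        trace.append((9, True))            # Line 9: beq 0 0 is always taken
    return trace


def count_correct(trace, scheme):
    """Count the correctly predicted branches under one prediction scheme."""
    counters = {}  # per-branch 2-bit counters, created on first use
    correct = 0
    for line, taken in trace:
        if scheme == "always_not_taken":
            prediction = False
        elif scheme == "btfn":
            prediction = line in BACKWARD_BRANCH_LINES
        elif scheme == "local_2bit":
            state = counters.get(line, 3)  # start in "strongly taken"
            prediction = state >= 2
            counters[line] = min(state + 1, 3) if taken else max(state - 1, 0)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        correct += prediction == taken
    return correct


trace = branch_trace([1, 1, 0])
combined = "".join("T" if taken else "N" for _, taken in trace)
print(combined)  # NTTNTTNNTT
for scheme in ("always_not_taken", "btfn", "local_2bit"):
    print(scheme, count_correct(trace, scheme), "/", len(trace))
```

For the array in the exam this yields 4, 7, and 6 correct predictions out of 10, matching the fractions in part (b).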