Explanation of Solution
Cache Addressing:
The primary storage hierarchy contains cache lines that are grouped into sets. If each set contains k lines then we say that the cache is k-way associative.
A data request has an address specifying the location of the requested data. Each cache-line sized chunk of data from the lower level can only be placed into one set. The set that it can be placed into depends on its address.
The number of cache sets is equal to the number of cache blocks divided by the number of ways of associativity.
The least significant bits are used to determine the block offset.
For example:
One needs to consider the following set associative (S, E, B, m) = (8, 4, 4, 13). The derived value will be as follows:
The Index (CI):
The block off set (CO):
The tag bit (CT):
Hence, the “2” lower bits are block offsets (CO); followed by 3 sets of bit index (CI) and the remaining bits are tag bits (CT).
The following table gives the parameters for a number of different caches and the number of cache sets(S), tag bits(t), set index bits (s) and block offset bits (b) are defined.
Cache | m | C | B | E | S | t | s | b |
1 | 32 | 1024 | 4 | 1 | 256 | 22 | 8 | 2 |
2 | 32 | 1024 | 8 | 4 | 32 | 24 | 5 | 3 |
3 | 32 | 1024 | 32 | 32 | 1 | 27 | 0 | 5 |
The values for the above table are described below:
For cache-1:
It is given that
Hence:
For cache-2:
It is given that
Trending nowThis is a popular solution!
Chapter 6 Solutions
Computer Systems: A Programmer's Perspective (3rd Edition)
- li $t2, 2 L1: add $t1, $t1, $t2 sub $t1, $t1, $t3 bne $t1, $t4, L1 sub $t4, $s0, $t3 Given the modified single-cycle processor shown below, what are the values (in binary) of instruction[31-26], instruction[25-21], instruction[20-16], instruction[15-11], instruction[5-0], Read data 1, Read data 2, ALU zero, PCSrc, and all the main control decoded output signals when the time is at 1950 ns. The below single-cycle processor diagram can be used for your reference. Note: A new decoded signal output “Tzero” is added for executing “bne” instruction. The signal definition is described below: Instruction Opcode New Main Control Output Signal beq 00100b (4d) Tzero = 0 bne 00101b (5d) Tzero = 1 At the moment of 1950 ns, the below values (0, 1 or X) are:instruction[31-26] = instruction[25-21] = instruction[20-16] =instruction[15-0] = Read data 1 output = Read data 2 output = RegDst = ALUSrc = MemtoReg = RegWrite =…arrow_forward(Practice) a. Using Figure 2.14 and assuming the variable name rate is assigned to the byte at memory address 159, determine the addresses corresponding to each variable declared in the following statements. Also, fill in the correct number of bytes with the initialization data included in the declaration statements. (Use letters for the characters, not the computer codes that would actually be stored.) floatrate; charch1=M,ch2=E,ch3=L,ch4=T; doubletaxes; intnum,count=0; b. Repeat Exercise 9a, but substitute the actual byte patterns that a computer using the ASCII code would use to store characters in the variables ch1, ch2, ch3, and ch4. (Hint: Use Appendix B.)arrow_forwardSection 1.0 cites as a pitfall the utilization of a subset of the performace equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions. (1) A common fallacy is to use MIPS to compare the performace of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2. (2) Another common performace figure is MFLOPS, defined as MFLOPS = No. FP operations / (execution time x 1E6) but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the processors.arrow_forward
- 4.19.16: [5] <COD §4.6>. In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies: Also, assume that instructions executed by the processor are broken down as follows: (a) What is the clock cycle time in a pipelined and non-pipelined processor? (b) What is the total latency of an lw instruction in a pipelined and non-pipelined processor? (c) If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor? (d) Assuming there are no stalls or hazards, what is the utilization of the data memory? (e) Assuming there are no stalls or hazards, what is the utilization of the write-register port of the "Registers" unit? No hand written and fast answer with explanationarrow_forward1.BL=00, after instruction DEC BL is executed, CF =? 2.CH=80H; after ROL CH, 1; CH=?arrow_forward4.18 [5] <COD §4.5> Assume that X1 is initialized to 11 and X2 is initialized to 22. Suppose you executed the code below on a version of the pipeline from COD Section 4.5 (An overview of pipelining) that does not handle data hazards (i.e., the programmer is responsible for addressing data hazards by inserting NOP instructions where necessary). What would the final values of registers X3 and X4 be? ADDI X1, X2, #5 ADD X3, X1, X2 ADDI X4, X1, #15arrow_forward
- 1.3 Assemble the following assembly code into machine code. Assume that the machine language op-codes for load, store, mult, add, div, and sub are 18, 19, 13, 14, 15, and 16, respectively. Also assume that the variable x is stored at location M[50]. load R1, x mult R2, R1, #9 store x, R2 sub R0, R1, #8 div R2, R0, #2 I NEED THE MACHINE CODE IN DECIMAL PLEASE,arrow_forwardQuestion no 07: Write C Program Code to simulate Worst-Fit memory management Algorithm for the following Process. Number of Blocks: 6 (B1=6, B2=8, B3=10, B4=15, B5=20, B6=4) Number of Process: 5 (P1=1, P2=12, P3=17, P4=10, P5=6)arrow_forwardhelp for the mips code. dont use AI, divi is not using in mips. Q1)Suppose $t1 stores the base address of word array A and $s2 is associated with h, convert to the following instruction into MIPS. if A[m+3]<20: A[m+1] = 5 else: A[m] = 1Q2) Assume only $s1, $s2, $a0, $v0 registers can be used. Procedure calling convention MUST be followed. def func(x): a = x/3 if a == 20: return a else: return 1arrow_forward
- Please solve and show all work. Thank you. Translate the following MIPS code to C. Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively. addi $t0, $s6, 4 add $t1, $s6, $0 sw $t1, 0($t0) lw $t0, 0($t0) add $s0, $t1, $t0arrow_forward15.a) Consider an instruction pipeline with four stages with the stage delays 5 nsec, 6 nsec, 11 nsec, and 8 nsec respectively. The delay of an inter-stage register stage of the pipeline is 1 nsec. What is the approximate speedup of the pipeline in the steady-state underideal conditions as compared to the corresponding non-pipelined implementation? b) Discuss structural hazards and control hazards with examples pls answer both subparrts,,its urgent thanksarrow_forwardSUBJECT: COMPUTER ORGANASATION 1. a. How many address bits are required to access 256K words of memory? b. How many bytes of memory can be addressed by 24 address bits?arrow_forward
- C++ for Engineers and ScientistsComputer ScienceISBN:9781133187844Author:Bronson, Gary J.Publisher:Course Technology Ptr