Computer Systems: Program... -Access
3rd Edition
ISBN: 9780134071923
Author: Bryant
Publisher: PEARSON
Expert Solution & Answer
Chapter 4.1, Problem 4.3PP
Explanation of Solution
Given assembly code:
# long sum(long *start, long count)
# start in %rdi, count in %rsi
sum:
irmovq $8, %r8
irmovq $1, %r9
xorq %rax, %rax
andq %rsi, %rsi
jmp test
loop:
mrmovq (%rdi), %r10
addq %r10, %rax
addq %r8, %rdi
subq %r9, %rsi
test:
jne loop
ret
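For orientation, the loop above corresponds roughly to the C function below. This is a sketch reconstructed from the signature given in the comments; the local variable name `total` is an assumption and does not appear in the given code.

```c
/* C sketch of the Y86-64 routine: adds `count` longs starting at `start`. */
long sum(long *start, long count)
{
    long total = 0;        /* xorq %rax, %rax clears the running sum      */
    while (count != 0) {   /* andq %rsi, %rsi sets the flags; jne tests   */
        total += *start;   /* mrmovq (%rdi), %r10 then addq %r10, %rax    */
        start++;           /* addq %r8, %rdi advances the pointer 8 bytes */
        count--;           /* subq %r9, %rsi also sets the flags for jne  */
    }
    return total;          /* result is returned in %rax                  */
}
```

Calling `sum` on a four-element array holding 1, 2, 3, 4 with `count` equal to 4 returns 10; a `count` of 0 skips the loop and returns 0.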
Data movement instructions:
- The instructions are grouped into “instruction classes”.
- The instructions in a class perform the same operation, but on operands of different sizes.
- The “MOV” class denotes the data movement instructions, which copy data from a source location to a destination location.
- The class has four instructions, each of which copies data from a source to a destination:
  - movb operates on 1 byte of data.
  - movw operates on 2 bytes of data.
  - movl operates on 4 bytes of data.
  - movq operates on 8 bytes of data.
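As an illustration of the four operand sizes, the C sketch below copies one value of each width. The struct `sized_values` and the function `copy_all_sizes` are hypothetical names introduced here. The comments name the mov variant that matches each width; which instruction a compiler actually emits is an assumption, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical container holding one value of each operand size. */
struct sized_values {
    int8_t  b;  /* 1 byte,  the width handled by movb */
    int16_t w;  /* 2 bytes, the width handled by movw */
    int32_t l;  /* 4 bytes, the width handled by movl */
    int64_t q;  /* 8 bytes, the width handled by movq */
};

/* Copy each field from src to dst; every assignment moves a different width. */
void copy_all_sizes(const struct sized_values *src, struct sized_values *dst)
{
    dst->b = src->b;
    dst->w = src->w;
    dst->l = src->l;
    dst->q = src->q;
}
```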
Unary and Binary Operations:
- Unary operations:
  - The single operand serves as both the source and the destination.
  - It can be either a register or a memory location.
  - For example, the instruction “incq (%rsp)” increments the 8-byte element on top of the stack.
  - Similarly, the instruction “decq (%rsp)” decrements the 8-byte element on top of the stack.
- Binary operations:
  - The first operand denotes the source.
  - The second operand serves as both a source and the destination.
  - The first operand can be an immediate value, a register, or a memory location.
  - The second operand can be either a register or a memory location...
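The operand roles described above can be sketched in C as follows. The helper names `increment`, `decrement`, `add_into`, and `sub_into` are hypothetical and chosen only for illustration; the pointer parameter `d` plays the role of the destination operand, whether it refers to a register-like variable or to memory.

```c
/* Unary: the single operand is both source and destination (as in incq D). */
void increment(long *d) { *d += 1; }

/* Unary: same pattern for decrement (as in decq D). */
void decrement(long *d) { *d -= 1; }

/* Binary: s is the source; *d is both a source and the destination
 * (as in addq S, D, which computes D = D + S). */
void add_into(long s, long *d) { *d += s; }

/* Binary: subtraction takes the source away from the destination
 * (as in subq S, D, which computes D = D - S). */
void sub_into(long s, long *d) { *d -= s; }
```

Note the operand order for subtraction: the second operand is the one that is reduced, which is why `subq %r9, %rsi` in the given code decrements the count held in `%rsi`.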
Students have asked these similar questions
1. We wish to compare the performance of two different machines: M1 and M2. The following measurements have been made on these machines:
| Program | Time on M1 | Time on M2 |
|---------|------------|------------|
| 1       | 10 seconds | 5 seconds  |
| 2       | 3 seconds  | 4 seconds  |
Which machine is faster for each program, and by how much?
2. For M1 and M2 of problem 1, the following additional measurements are made. Find the instruction execution rate (instructions per second) for each machine when running program 1.
| Program | Instructions executed on M1 | Instructions executed on M2 |
|---------|-----------------------------|-----------------------------|
| 1       | 200 × 10^6                  | 160 × 10^6                  |
3. For M1 and M2 of problem 1, if the clock rates are 200 MHz and 300 MHz, respectively, find the CPI for program 1 on both machines using the data provided in problems 1 and 2.
4. You are going to enhance a machine, and there are two possible improvements: either make multiply instructions run four times faster than before or make memory access instructions run two times faster than before. You…
A program has the following breakdown:
25% ld (50% of them directly followed by a dependent instruction), 25% sd, 30% r_type, 20% beq (80% of them are taken). Branches are calculated in the third cycle. What is the average CPI of the program when run on the pipelined RISC-V implementation in the textbook?
1. BL = 00; after the instruction DEC BL is executed, CF = ?
2. CH = 80H; after ROL CH, 1; CH = ?
Similar questions
- 4.18 [5] <COD §4.5> Assume that X1 is initialized to 11 and X2 is initialized to 22. Suppose you executed the code below on a version of the pipeline from COD Section 4.5 (An overview of pipelining) that does not handle data hazards (i.e., the programmer is responsible for addressing data hazards by inserting NOP instructions where necessary). What would the final values of registers X3 and X4 be? ADDI X1, X2, #5 ADD X3, X1, X2 ADDI X4, X1, #15
- 1. In this exercise we examine in detail how an instruction is executed in a single cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following instruction word: 10001100101001100000000000111000 Assume that the data memory is all zeros and that the processor’s registers have the following values at the beginning of the cycle in which the above instruction word is fetched: R0 R1 R2 R3 R4 R5 R6 R8 R12 R3 1 0 1 -2 4 -6 4 -10 -12 -14 31 a. What are the outputs of the sign-extend and the jump “Shift-Left-2” (near the top of the following Figure) for this instruction word? b. What are the values of ALU control unit’s inputs (ALUOp and Instruction operation) for this instruction? c. For the ALU and the two add units, what are their data input values? ALU Add (PC+4) Add (Branch) Input#1 Input#2 Input#1 Input#2 Input#1 Input#2
- Consider that each instruction requires 6 steps (phases) to execute and each step (phase) takes 5 seconds to complete. Also, within each instruction there is a 3-second time gap between the completion of one step (phase) and the beginning of the next step (phase). b. Design and reflect on a suitable pipeline technique using which the time taken for executing all 3 programs can be further improved. Calculate the improved time using this method, considering that each instruction requires the same number of steps to complete, the time taken for each step is the same, and the time gap between steps is also the same.
- Section 1.0 cites as a pitfall the utilization of a subset of the performance equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions. (1) A common fallacy is to use MIPS to compare the performance of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2. (2) Another common performance figure is MFLOPS, defined as MFLOPS = No. FP operations / (execution time x 1E6) but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the processors.
- Problem 4: Give a block diagram for a 8M x 32 memory using 512K x 8 memory ch book] [Hints: Figure 5.10 in the
- 4.19.16: [5] <COD §4.6>. In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies: Also, assume that instructions executed by the processor are broken down as follows: (a) What is the clock cycle time in a pipelined and non-pipelined processor? (b) What is the total latency of an lw instruction in a pipelined and non-pipelined processor? (c) If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor? (d) Assuming there are no stalls or hazards, what is the utilization of the data memory? (e) Assuming there are no stalls or hazards, what is the utilization of the write-register port of the "Registers" unit? No hand written and fast answer with explanation
- Question 7: In the following instruction sequence, show the resulting value of AL where indicated, in Hexadecimal: a. mov al,7Ah not al ; . b. mov al,72h xor al,0DCh ; and also write instructions that first clear bit positions 0, 1 and 2 in AL. Then, if the destination operand is equal to zero, the code should jump to label LABEL1. Otherwise, it should jump to label NEXT
- li $t2, 2 L1: add $t1, $t1, $t2 sub $t1, $t1, $t3 bne $t1, $t4, L1 sub $t4, $s0, $t3 Given the modified single-cycle processor shown below, what are the values (in binary) of instruction[31-26], instruction[25-21], instruction[20-16], instruction[15-11], instruction[5-0], Read data 1, Read data 2, ALU zero, PCSrc, and all the main control decoded output signals when the time is at 1950 ns. The below single-cycle processor diagram can be used for your reference. Note: A new decoded signal output “Tzero” is added for executing “bne” instruction. The signal definition is described below: Instruction Opcode New Main Control Output Signal beq 00100b (4d) Tzero = 0 bne 00101b (5d) Tzero = 1 At the moment of 1950 ns, the below values (0, 1 or X) are: instruction[31-26] = instruction[25-21] = instruction[20-16] = instruction[15-0] = Read data 1 output = Read data 2 output = RegDst = ALUSrc = MemtoReg = RegWrite =…
- Develop the assembly code for the following: Load 10 Hex values in 10 memory location. Next find the sum of all the values loaded in the memory location. At the end Register A should contain low byte and Register R5 High byte of the final result.
- The importance of having a good branch predictor depends on how often conditional branches are executed. Together with branch predictor accuracy, this will determine how much time is spent stalling due to mispredicted branches. In this exercise, assume that the breakdown of dynamic instructions into various instruction categories is as follows: R-Type BEQ JMP LW SW 40% 25% 5% 25% 5% Also, assume the following branch predictor accuracies: Always - Taken Always - not - taken 2-bit 40% 60% 75% 1.1 Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used. 1.2 Repeat 1.1 for the “always-not-taken” predictor. 1.3 Repeat 1.1 for the 2-bit predictor. 1.4 With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way…
- The following assembly program contains a number of assembly-time errors, as indicated to the right. Correct each error (2 points credit each). .MODEL SMALL .STACK 64H .DATA DATA1 DB 25 DATA2 DB 280 ;1: Value out of range DATA3 DB ? .CODE MOV AX,DATA ; 2: Improper operand type MOV DS,AX MOV AX,DATA1 ;3: Operand types must match ADD AX,DATA2 ;4: Operand types must match MOV DATA3,AX MOV FX,4COOH ;5: Symbol not defined INT 21H END
- help for the mips code. dont use AI, divi is not using in mips. Q1) Suppose $t1 stores the base address of word array A and $s2 is associated with h, convert to the following instruction into MIPS. if A[m+3]<20: A[m+1] = 5 else: A[m] = 1 Q2) Assume only $s1, $s2, $a0, $v0 registers can be used. Procedure calling convention MUST be followed. def func(x): a = x/3 if a == 20: return a else: return 1