Write a version of the inner product procedure described in Problem 5.13 that uses 6 × 1 loop unrolling. For x86-64, our measurements of the unrolled version give a CPE of 1.07 for integer data but still 3.01 for both floating-point data types.
- A. Explain why any (scalar) version of an inner product procedure running on an Intel Core i7 Haswell processor cannot achieve a CPE less than 1.00.
- B. Explain why the performance for floating-point data did not improve with loop unrolling.
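A minimal sketch of what 6 × 1 unrolling could look like in C is shown here. It is not the textbook's solution: the function name `inner6`, the raw-array signature, and the `data_t` typedef are assumptions made for illustration (Problem 5.13 works through a vector abstraction that is not reproduced on this page). The loop handles six elements per iteration but still accumulates into a single variable.

```c
#include <stddef.h>

/* Assumption: data_t stands in for the element type under test;
   change it to int, long, or float to try the other cases. */
typedef double data_t;

/* Hypothetical 6 x 1 unrolled inner product over plain arrays.
   "6 x 1" means six elements per iteration, one accumulator. */
void inner6(const data_t *u, const data_t *v, size_t n, data_t *dest)
{
    size_t i = 0;
    data_t sum = (data_t) 0;

    /* Main loop: runs while at least six elements remain. */
    for (; i + 5 < n; i += 6) {
        /* '+' associates left to right, so each product is added to
           the running sum in turn: ((((((sum + p0) + p1) + ...) + p5). */
        sum = sum + u[i]     * v[i]
                  + u[i + 1] * v[i + 1]
                  + u[i + 2] * v[i + 2]
                  + u[i + 3] * v[i + 3]
                  + u[i + 4] * v[i + 4]
                  + u[i + 5] * v[i + 5];
    }

    /* Cleanup loop: the zero to five leftover elements. */
    for (; i < n; i++) {
        sum = sum + u[i] * v[i];
    }

    *dest = sum;
}
```

Because every product feeds the same `sum`, the six additions in each iteration still form one sequential chain. That observation is a useful starting point for part B.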
Source: Computer Systems: A Programmer's Perspective (3rd Edition), Chapter 5.