Problem 1.8 The following code segment, consisting of six instructions, needs to b executed 64 times for the evaluation of vector arithmetic expression: D(I) = A(1) + B(1 xC(I) for 0 ≤I≤ 63. Load R1, B(I) /R1 Load R2, C(I) Multiply R1, R2 Load R3, A(I) Add R3, R1 Store D(I), R3 /R2 /R1 /R3 /R3 Memory (8 + 1)/ (R1) × (R2)/ Memory (7 + 1)/ (R3) + (R1)/ /Memory (0 + I) - (R3)/ Memory (a + 1)/ 1 where R1, R2, and R3 are CPU registers, (R1) is the content of R1, a,6,7, and are the starting memory addresses of arrays B(1), C(I), A(I), and D(I), respectively. Assume four clock cycles for each Load or Store, two cycles for the Add, and eight cycles for the Multiply on either a uniprocessor or a single PE in an SIMD machine. (a) Calculate the total number of CPU cycles needed to execute the above code seg- ment repeatedly 64 times on an SISD uniprocessor computer sequentially, ignoring all other time delays. (b) Consider the use of an SIMD computer with 64 PEs to execute the above vector operations in six synchronized vector instructions over 64-component vector data and both driven by the same-speed clock. Calculate the total execution time on the SIMD machine, ignoring instruction broadcast and other delays. (c) What is the speedup gain of the SIMD computer over the SISD computer?

C++ for Engineers and Scientists
4th Edition
ISBN:9781133187844
Author:Bronson, Gary J.
Publisher:Bronson, Gary J.
Chapter2: Problem Solving Using C++using
Section2.3: Data Types
Problem 9E: (Practice) Although the total number of bytes varies from computer to computer, memory sizes of...
icon
Related questions
Question
Problem 1.8 The following code segment, consisting of six instructions, needs to be
executed 64 times for the evaluation of vector arithmetic expression: D(I) = A(I) + B(I)
xC(I) for 0 ≤ I ≤ 63.
Load R1, B(I)
/R1 - Memory (a + I)/
Load R2, C(I)
Multiply R1, R2
Load R3, A(I)
Add R3, R1
Store D(I), R3
t
/R2 Memory (8 + 1)/
/R1 - (R1) × (R2)/
/R3
-
Memory (7 + I)/
-
/R3 (R3) + (R1)/
/Memory (0 + I) ← (R3)/
where R1, R2, and R3 are CPU registers, (R1) is the content of R1, a, ß,7, and are
the starting memory addresses of arrays B(1), C(I), A(I), and D(I), respectively. Assume
four clock cycles for each Load or Store, two cycles for the Add, and eight cycles for the
Multiply on either a uniprocessor or a single PE in an SIMD machine.
(a) Calculate the total
ber of CPU cycles needed to execute the above code seg-
ment repeatedly 64 times on an SISD uniprocessor computer sequentially, ignoring
all other time delays.
(b) Consider the use of an SIMD computer with 64 PEs to execute the above vector
operations in six synchronized vector instructions over 64-component vector data
and both driven by the same-speed clock. Calculate the total execution time on
the SIMD machine, ignoring instruction broadcast and other delays.
(c) What is the speedup gain of the SIMD computer over the SISD computer?
Transcribed Image Text:Problem 1.8 The following code segment, consisting of six instructions, needs to be executed 64 times for the evaluation of vector arithmetic expression: D(I) = A(I) + B(I) xC(I) for 0 ≤ I ≤ 63. Load R1, B(I) /R1 - Memory (a + I)/ Load R2, C(I) Multiply R1, R2 Load R3, A(I) Add R3, R1 Store D(I), R3 t /R2 Memory (8 + 1)/ /R1 - (R1) × (R2)/ /R3 - Memory (7 + I)/ - /R3 (R3) + (R1)/ /Memory (0 + I) ← (R3)/ where R1, R2, and R3 are CPU registers, (R1) is the content of R1, a, ß,7, and are the starting memory addresses of arrays B(1), C(I), A(I), and D(I), respectively. Assume four clock cycles for each Load or Store, two cycles for the Add, and eight cycles for the Multiply on either a uniprocessor or a single PE in an SIMD machine. (a) Calculate the total ber of CPU cycles needed to execute the above code seg- ment repeatedly 64 times on an SISD uniprocessor computer sequentially, ignoring all other time delays. (b) Consider the use of an SIMD computer with 64 PEs to execute the above vector operations in six synchronized vector instructions over 64-component vector data and both driven by the same-speed clock. Calculate the total execution time on the SIMD machine, ignoring instruction broadcast and other delays. (c) What is the speedup gain of the SIMD computer over the SISD computer?
Expert Solution
steps

Step by step

Solved in 2 steps

Blurred answer
Knowledge Booster
Pipelining
Learn more about
Need a deep-dive on the concept behind this application? Look no further. Learn more about this topic, computer-science and related others by exploring similar questions and additional content below.
Similar questions
  • SEE MORE QUESTIONS
Recommended textbooks for you
C++ for Engineers and Scientists
C++ for Engineers and Scientists
Computer Science
ISBN:
9781133187844
Author:
Bronson, Gary J.
Publisher:
Course Technology Ptr