CIS655_Assignment3_Vaibhav_Rajani.pdf
School: Syracuse University
Course: 655
Subject: Computer Science
Date: Oct 30, 2023
Type: pdf
Pages: 6

CIS655: Computer Architecture
Assignment 3
Name: Vaibhav Rajani
SUID: 220733034

Question 1: Explain why branch instructions in GPU code can be slow.

Answer 1: Branch instructions in GPU code can be slow for several reasons:

i. Divergence: GPUs execute threads in groups (warps) in which every thread issues the same instruction on a different data element at the same time. When the threads of a warp evaluate a branch condition differently, they take different paths and the warp diverges. The hardware then runs each path in turn with the threads on the other path masked off, so those threads sit idle and throughput drops.

ii. Instruction Pipelining: GPUs pipeline instructions through fetch, decode, execute, and retire stages. A branch disrupts this flow because the address of the next instruction is not known until the branch resolves. The processor must either wait or flush the instructions already fetched behind the branch and then fetch and decode from the branch target, which stalls the pipeline and reduces instruction throughput.

iii. Branch Mispredictions: Where the hardware predicts whether a branch is taken, a wrong prediction forces it to discard the instructions fetched and decoded along the wrong path and to refetch from the correct target. GPUs generally devote far less hardware to branch prediction than CPUs do, so they have little means of hiding this cost within a single warp.

iv. Limited Instruction Cache: GPUs typically have a small instruction cache compared to their data cache. Branch-heavy code touches instructions along several paths and at several branch targets, which increases pressure on the instruction cache and the likelihood of misses. Each miss delays instruction fetch and decode.

v. Control Flow Overhead: Handling a divergent warp requires extra bookkeeping: the hardware must track which threads are active on each path and reconverge them once the paths rejoin. These additional synchronization, branching, and bookkeeping operations add overhead and reduce overall performance.
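The divergence cost described in point i can be illustrated with a small model. This is only a sketch under simplifying assumptions, not a description of any particular GPU: the warp width of 32, the per-path cycle costs, and the function names `warp_branch_cycles` and `utilization` are all hypothetical, and the model ignores warp switching and reconvergence overhead.

```python
WARP_SIZE = 32  # assumed warp width; real hardware varies by vendor


def warp_branch_cycles(conditions, then_cost, else_cost):
    """Cycles one warp spends on an if/else under a simple serialization model.

    conditions: one boolean per thread (True means the thread takes 'then').
    A path is executed once if at least one thread needs it; threads on the
    other path are masked off and idle for those cycles.
    """
    cycles = 0
    if any(conditions):        # at least one thread takes the 'then' path
        cycles += then_cost
    if not all(conditions):    # at least one thread takes the 'else' path
        cycles += else_cost
    return cycles


def utilization(conditions, then_cost, else_cost):
    """Fraction of lane-cycles that do useful work during the branch."""
    total = warp_branch_cycles(conditions, then_cost, else_cost) * len(conditions)
    taken = sum(conditions)
    useful = taken * then_cost + (len(conditions) - taken) * else_cost
    return useful / total


uniform = [True] * WARP_SIZE                        # every thread agrees
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]  # threads alternate paths

print(warp_branch_cycles(uniform, 10, 10))    # 10
print(warp_branch_cycles(divergent, 10, 10))  # 20
print(utilization(divergent, 10, 10))         # 0.5
```

In this model a warp whose threads all agree pays for one path, while a warp split across both paths pays for both and keeps only half of its lanes busy.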
Question 2: Explain the difference between the single-instruction multiple-thread (SIMT) programming model of GPUs and the single-instruction multiple-data (SIMD) model used in CPUs.

Answer 2:

| Feature | SIMT (GPU) | SIMD (CPU) |
| --- | --- | --- |
| Execution Path | Threads within a thread group or warp may follow different control-flow paths; the hardware handles the divergence by executing each path in turn. | All lanes of a vector instruction perform the same operation simultaneously on different data elements. |
| Thread Scheduling | Designed for better utilization of thread resources, allowing threads to diverge, follow different control-flow paths, and later reconverge. | Lanes are tightly synchronized and always execute in lockstep. |
| Thread Flexibility | Provides finer-grained thread synchronization, since each lane is an independent thread from the programmer's point of view. | Lanes are not independent threads, so there is no per-lane synchronization. |
| Memory Access | Each thread computes its own address, so threads can access different memory locations and form more complex access patterns. | A vector load or store typically accesses contiguous elements; irregular patterns need gather/scatter instructions where available. |
| Parallelism | Extends parallelism from data-level to thread-level, enabling more diverse parallel computation. | Parallelism is limited to data-level only, within a single thread. |
| Application | Suitable for the massively parallel processing requirements of GPUs, optimized for graphics and general-purpose computation. | Suitable for accelerating data-parallel sections of general-purpose CPU code, where single-threaded performance is the priority. |
The Single-Instruction Multiple-Thread (SIMT) programming model used in GPUs allows threads to follow divergent execution paths, provides finer-grained thread synchronization, and lets each thread access its own memory locations. This makes it well suited to massively parallel workloads. In contrast, the Single-Instruction Multiple-Data (SIMD) model used in CPUs applies one instruction in lockstep to multiple data elements within a single thread, with no independent control flow or synchronization per lane, and prioritizes single-threaded performance.
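The difference in programming style can be sketched in plain Python. This is an illustrative model only: `kernel`, `simt_launch`, and `simd_abs_or_double` are hypothetical names, the lists stand in for warps and vector registers, and no real GPU or vector instruction set is being used.

```python
def kernel(x):
    """SIMT style: scalar per-thread code with an ordinary branch."""
    if x < 0:
        return -x
    return 2 * x


def simt_launch(data):
    # Conceptually one thread per element. Real hardware would group the
    # threads into warps and mask lanes when they diverge; here that is
    # modeled as a plain map over the elements.
    return [kernel(x) for x in data]


def simd_abs_or_double(vec):
    """SIMD style: whole-vector operations with an explicit mask and select."""
    mask = [x < 0 for x in vec]      # vector compare produces a lane mask
    negated = [-x for x in vec]      # 'then' result computed for every lane
    doubled = [2 * x for x in vec]   # 'else' result computed for every lane
    # Blend: pick the 'then' or 'else' result per lane according to the mask.
    return [n if m else d for m, n, d in zip(mask, negated, doubled)]


data = [-3, 1, -2, 4]
print(simt_launch(data))         # [3, 2, 2, 8]
print(simd_abs_or_double(data))  # [3, 2, 2, 8]
```

Both versions produce the same result. In the SIMT version the branch is written per thread and the hardware deals with divergence, whereas in the SIMD version the programmer (or compiler) must turn the branch into a mask and a select.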