11. OoO-IV & Recap


School

Georgia Institute of Technology

Course

4290

Subject

Computer Science

Date

Oct 30, 2023

Type

pdf

Pages

61

CS4290/CS6290/ECE4100/ECE6100 Advanced Computer Architecture, Fall 2023
Alexandros (Alex) Daglis, School of Computer Science, Georgia Institute of Technology (alexandros.daglis@cc.gatech.edu)
Lecture 11: Out of Order Execution IV
Lecture slides adapted from Arvind, J. Emer, K. Asanovic, T. Krishna
Schedule
Today:
§ Out of order execution
1. Hazards & Renaming
2. Tomasulo's algorithm
3. ROB & Speculation
4. Load-Store Queue
§ Recap
Reminder:
§ Midterm I on Thursday
[Figure: semester calendar, Monday to Friday, 21-Aug through 15-Dec]
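Out-of-Order Execution with ROB
[Figure: Fetch (from iCache), Decode Buffer, Decode, Instruction Buffer, and Issue are in-order; Execute on the functional units (Adder, Mult, Memory Unit with Addr/Data) and Writeback over the Common Data Bus (CDB), which broadcasts [P, result], are out-of-order (dataflow); Commit from the Reorder Buffer is in-order.]
§ Reorder Buffer entry fields: use, exec, op, p1, PR1, p2, PR2, ex?, Rd, LPRd, PRd
§ Register Alias Table (RAT): maps architectural registers r1 … rn to a physical register tag, with a valid bit
§ Physical Register File P0 … Pn, plus a Free List of unallocated physical registers
* LPRd: Last Physical Register of destination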
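Physical Register Management
Program:
ld r1, 0(r3)
add r3, r1, #4
sub r6, r7, r6
add r3, r3, r6
ld r6, 0(r1)
Initial mapping r1→P8, r3→P7, r6→P5, r7→P6; Free List P0, P1, P3, P2, P4. After renaming, the ROB holds (oldest first):
§ ld: PR1 = P7 (present), Rd = r1, PRd = P0, LPRd = P8
§ add: PR1 = P0 (not present), Rd = r3, PRd = P1, LPRd = P7
§ sub: PR1 = P6 (present), PR2 = P5 (present), Rd = r6, PRd = P3, LPRd = P5
§ add: PR1 = P1, PR2 = P3 (neither present), Rd = r3, PRd = P2, LPRd = P1
§ ld: PR1 = P0 (not present), Rd = r6, PRd = P4, LPRd = P3
[Figure: Physical Register File (P0 … Pn, with present bits and values) and Register Alias Table]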
Lifetime of Physical Registers
Program, then the same code after renaming:
a) ld r1, (r3) → ld P1, (Px)
b) add r3, r1, #4 → add P2, P1, #4
c) sub r1, r3, r9 → sub P3, P2, Py
d) add r3, r1, r7 → add P4, P3, Pz
e) ld r6, (r1) → ld P5, (P3)
f) add r8, r6, r3 → add P6, P5, P4
g) st r8, (r1) → st P6, (P3)
h) ld r3, (r11) → ld P7, (Pw)
When can we reuse a physical register? When the next write of the same architectural register commits.
§ The physical register file holds both committed and speculative values
§ Physical registers are decoupled from ROB entries (no data in the ROB)
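The renaming rule on this slide can be sketched in Python. This is a minimal model, assuming a FIFO free list and a ROB that records only op, Rd, PRd, and LPRd; the names `Renamer`, `rename`, and `commit` are hypothetical. Each rename records the old mapping as LPRd, and that register is freed only when the renaming instruction commits, i.e., when the next write of the same architectural register commits.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class RobEntry:
    op: str
    rd: Optional[str]    # architectural destination (None for stores)
    prd: Optional[str]   # physical register allocated for rd
    lprd: Optional[str]  # previous mapping of rd ("last physical register")


class Renamer:
    def __init__(self, initial_map: Dict[str, str], free_regs: Iterable[str]):
        self.rat = dict(initial_map)   # architectural -> physical
        self.free = deque(free_regs)   # free list, allocated from the front
        self.rob: deque = deque()      # program order, oldest at the left

    def rename(self, op: str, rd: Optional[str],
               srcs: List[str]) -> Tuple[Optional[str], List[str]]:
        # Sources are looked up BEFORE rd is remapped (e.g. add r3, r3, r6).
        psrcs = [self.rat[s] for s in srcs]
        prd = lprd = None
        if rd is not None:
            lprd = self.rat[rd]
            prd = self.free.popleft()
            self.rat[rd] = prd
        self.rob.append(RobEntry(op, rd, prd, lprd))
        return prd, psrcs

    def commit(self) -> RobEntry:
        # The new write of rd is now committed, so the old mapping is dead.
        entry = self.rob.popleft()
        if entry.lprd is not None:
            self.free.append(entry.lprd)
        return entry
```

Renaming a) to h) with r3→Px, r9→Py, r7→Pz, r11→Pw (as on the slide) and a free list of P1 … P7 reproduces the renamed code; Pa, Pb, Pc would be hypothetical initial mappings for r1, r6, r8. Committing a), b), c) then frees Pa, Px, and P1.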
Physical Register Management (continued)
RAT after renaming all five instructions: r1→P0, r3→P2, r6→P4, r7→P6.
§ Execute: ld r1, 0(r3) executes.
§ Writeback [1234, P0]: the result 1234 is broadcast with tag P0; P0 becomes present, as do the P0 source operands of add r3, r1, #4 and ld r6, 0(r1).
§ Commit: ld r1, 0(r3) is the oldest instruction and has completed, so it commits and its LPRd, P8, returns to the Free List.
Physical Register Management (continued)
§ Execute: add r3, r1, #4 (source P0 now present) and sub r6, r7, r6 (sources P6 and P5 present) execute.
Physical Register Management (continued)
§ Writeback [1238, P1]: add r3, r1, #4 writes 1238 (1234 + 4) into P1, the first source of add r3, r3, r6.
§ Commit: add r3, r1, #4 commits and its LPRd, P7, returns to the Free List.
Physical Register Management (continued)
§ Writeback [68, P4]: the youngest instruction, ld r6, 0(r1), completes out of order and writes 68 into P4. It cannot commit until the older instructions ahead of it have committed.
Free List: P8, P7
Physical Register Management (continued)
Exception! An instruction still in the ROB is flagged with an exception (exp?). Because commit is in order, the exception is taken when that instruction reaches the head of the ROB: it and all younger instructions are squashed, their PRds return to the Free List, and the RAT is restored from the LPRd fields.
The exception handler PC will be fetched and executed next.
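Recovery can be sketched by walking the squashed ROB entries from youngest to oldest. This is a simplified model, assuming (hypothetically) that the sub r6, r7, r6 at the ROB head is the excepting instruction, and using the RAT and Free List of the example after ld r1 and add r3, r1, #4 have committed; `squash_from` is a hypothetical helper, not part of the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RobEntry:
    op: str
    rd: Optional[str]
    prd: Optional[str]
    lprd: Optional[str]


def squash_from(rob: List[RobEntry], index: int,
                rat: Dict[str, str], free_list: List[str]) -> None:
    """Discard rob[index:] (the excepting instruction and everything younger).

    Entries are undone youngest-first, so a register renamed several times
    ends up mapped to the LPRd of its OLDEST squashed writer.
    """
    for entry in reversed(rob[index:]):
        if entry.rd is not None:
            rat[entry.rd] = entry.lprd      # undo the rename
            free_list.append(entry.prd)     # speculative register is free again
    del rob[index:]
```

With that state, the RAT returns to r1→P0, r3→P1, r6→P5, r7→P6, and P4, P2, P3 rejoin the Free List.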
Out-of-Order Execution + In-Order Commit
[Figure: PC → Fetch → Decode → Issue (in-order) → Execute → WB/Complete (out-of-order) → Reorder Buffer → Commit (in-order)]
Exceptions
[Figure: same pipeline. The exception is handled at Commit: instructions in Fetch, Decode, and Execute are killed and the correct PC is injected into Fetch.]
Branch Mispredictions
[Figure: same pipeline. Branch prediction happens in Fetch; at Branch Resolution in Execute, a misprediction kills the younger instructions in Fetch, Decode, Execute, and the Reorder Buffer, and the correct PC is injected into Fetch.]
ROB holds Active Instruction Window (Decoded but not Committed)
ld r1, (r3)
add r3, r1, r2
sub r6, r7, r9
add r3, r3, r6
ld r6, (r1)
add r6, r6, r3
st r6, (r1)
ld r6, (r1)
[Figure: the same window at cycle t and cycle t + 1. Older instructions leave at the Commit end, newer instructions enter at the Fetch end, and instructions in between Execute.]
Speculative Store Queue/Buffer
We've focused on register dependencies so far
§ But dependencies also exist between memory operations
Just like register updates, stores should not modify memory until after the instruction commits
§ Otherwise, if we need to roll back, we would have to undo changes in memory!
§ A speculative store queue is a structure introduced to hold speculative store data
Speculative Store Queue
On store Issue
§ Store queue slot allocated in program order
On store Execute
§ Slot marked valid and speculative; address, data, and instruction PC stored
On store Commit (the store is the oldest instruction in the ROB and both its address and data are available)
§ Clear speculative bit and eventually move data to the cache
§ Free the respective ROB entry
On store Abort
§ Clear valid bit
[Figure: store queue entries with fields PC, Addr, S (speculative), V (valid), Data, in front of the L1 Data Cache]
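A minimal Python sketch of the store queue lifecycle above. It assumes stores commit in order and, for simplicity, writes the data at commit time into a dictionary standing in for the L1 data cache, rather than "eventually"; ROB bookkeeping is omitted. `StoreQueue` and `SQEntry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)              # identity comparison: two entries may hold equal fields
class SQEntry:
    valid: bool = False
    speculative: bool = False
    pc: Optional[int] = None
    addr: Optional[int] = None
    data: Optional[int] = None


class StoreQueue:
    def __init__(self) -> None:
        self.entries: List[SQEntry] = []      # program order, oldest first

    def issue(self) -> SQEntry:
        # On issue: allocate a slot in program order.
        entry = SQEntry()
        self.entries.append(entry)
        return entry

    def execute(self, entry: SQEntry, pc: int, addr: int, data: int) -> None:
        # On execute: mark valid and speculative; record PC, address, data.
        entry.valid = entry.speculative = True
        entry.pc, entry.addr, entry.data = pc, addr, data

    def commit(self, cache: Dict[int, int]) -> None:
        # On commit: the store is the oldest and has its address and data.
        entry = self.entries[0]
        if not entry.valid:
            raise RuntimeError("oldest store has not executed yet")
        entry.speculative = False
        cache[entry.addr] = entry.data        # simplification: drain immediately
        self.entries.pop(0)

    def abort(self, entry: SQEntry) -> None:
        # On abort (squash): clear the valid bit and release the slot.
        entry.valid = False
        self.entries.remove(entry)
```

Memory only changes in `commit`, so aborting a speculative store needs no undo in the cache, which is the point of the slide.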
Load Bypass
[Figure: the load address is looked up (by tag) in both the Speculative Store Queue (PC, Addr, S, V, Data) and the L1 Data Cache, and the load data comes from one of them]
If data is in both the store queue and the cache, which should we use? The store queue.
If the same address is in the store queue twice, which should we use? The youngest store older than the load.
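The bypass rule ("youngest store older than the load") can be written as a search over the store queue. This sketch assumes every store address is already known, that addresses match exactly (no partial overlaps or access sizes), and that a smaller age number means an older instruction; `load_value` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# Each store-queue entry: (age, addr, data). Smaller age = older instruction.
StoreEntry = Tuple[int, int, int]


def load_value(load_age: int, addr: int,
               store_queue: List[StoreEntry], cache: Dict[int, int]) -> int:
    """Return the value a load should see.

    Among stores OLDER than the load with a matching address, the youngest
    one wins (it is the most recent write in program order). Only if no
    such store exists does the load read the L1 data cache.
    """
    best: Optional[StoreEntry] = None
    for age, st_addr, data in store_queue:
        if age < load_age and st_addr == addr:
            if best is None or age > best[0]:
                best = (age, st_addr, data)
    return best[2] if best is not None else cache[addr]
```

Stores younger than the load are skipped even when their address matches, because in program order they write after the load reads.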
Memory Dependencies
SW r1, (r2)
LW r3, (r4)
When can we execute the load?
Load Store Queue (LSQ)
Memory instructions are allocated into the LSQ in program order
§ The LSQ manages memory reference ordering
Implementation choice: Unified LSQ vs. Split LSQ (separate Store Queue and Load Queue, each age-ordered)
§ Intel Sandy Bridge: 64 load buffers, 36 store buffers
§ Intel Cascade Lake: 72 load buffers, 56 store buffers
Conservative OoO Load Execution
SW r1, (r2)
LW r3, (r4)
Can execute the load before the store if both addresses are known and r4 != r2
§ Stall if any previous store address (e.g., r2) is not known
§ Each load address is compared with the addresses of all previous uncommitted stores
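The conservative policy can be expressed as a decision function. This is a sketch under simplifying assumptions: exact address matches, ages where smaller means older, and a hypothetical "forward" outcome meaning the load takes its value from the matching older store (as on the load bypass slide) instead of from the cache.

```python
from typing import List, Optional, Tuple

# (age, address) of each uncommitted store; address None = not yet computed.
Store = Tuple[int, Optional[int]]


def conservative_load_decision(load_age: int, load_addr: int,
                               stores: List[Store]) -> str:
    older = [(age, a) for age, a in stores if age < load_age]
    if any(a is None for _, a in older):
        return "stall"      # some older store address is still unknown
    if any(a == load_addr for _, a in older):
        return "forward"    # RAW through memory: use the older store's data
    return "execute"        # all older addresses differ: safe to go to the cache
```

The cost of this policy is the stall in the first branch: one slow address computation for an older store blocks every younger load.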
Address Speculation
SW r1, (r2)
LW r3, (r4)
Speculate that r4 != r2
§ Execute the load before the store address is known
§ Need to hold all completed but uncommitted load/store addresses in program order
§ If we subsequently find r4 == r2, squash the load and all following instructions
§ Large penalty for inaccurate address speculation
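Under address speculation, the check moves to the store: when a store's address resolves, it searches the load queue for younger loads that already executed with the same address. A minimal sketch, assuming exact address matches and that a smaller age means older, and ignoring the case where such a load was correctly forwarded from an even younger matching store; `store_resolves` is a hypothetical helper that returns where squashing should start.

```python
from typing import List, Optional, Tuple

# Load queue entry: (age, address, executed?)
LoadEntry = Tuple[int, int, bool]


def store_resolves(store_age: int, store_addr: int,
                   load_queue: List[LoadEntry]) -> Optional[int]:
    """Called when a store's address becomes known.

    Associatively search for YOUNGER loads that already executed with the
    same address: they read a stale value. Return the age of the oldest such
    load (squash it and everything after it), or None if speculation was right.
    """
    violators = [age for age, addr, done in load_queue
                 if age > store_age and addr == store_addr and done]
    return min(violators) if violators else None
```

Loads that have not executed yet are not violations; they will simply see the store when they execute.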
Example: Load
[Figure: Store Queue and Load Queue, each entry with program order, address, and Exec? bit, plus store data (e.g., FFFFFF00); a load is issued to memory for execution while an older store's address is still unknown (???)]
Each load checks against older stores
§ Associative search
§ A performance issue of scalability
Example: Load (continued)
Store-to-load forwarding: a load whose address matches an older store in the Store Queue takes its data directly from that store and is marked executed.
Example: Load (continued)
Speculatively execute: a younger load (to address K) executes even though the older store at order 3 still has an unknown address.
Example: Load (continued)
[Figure: Store Queue and Load Queue with more entries (addresses A, B, C, D) and store data (00000001, 12340000, FFFF1111, FFFFFF00); the older store's address is still unknown (???)]
Example: Load (continued)
When the older store's address resolves (to K), the store checks for younger loads with the same address (associative search).
Conflict detected! "Replay" instructions, since the load was misspeculated.
Ordering between Loads and Stores
Can loads and stores to two different addresses be executed out-of-order?
§ Not always! We will talk about this in detail when we talk about "Memory Consistency" (last topic of the semester)
Putting it all together
[Figure: the out-of-order pipeline with ROB, RAT (v, tag for r1 … rn), Physical Register File (P0 … Pn), Free List, and CDB, now with a Branch Predictor (BHT, BTB) in the front end and an LSQ in front of the D$]
Speculative Execution Recipe
1. Proceed ahead despite unresolved dependencies using a prediction for an architectural or micro-architectural value
2. Maintain both old and new values on updates to architectural (and often micro-architectural) state
3. After ensuring there was no mis-speculation and there will be no more uses of the old values, discard the old values and just use the new values
OR
3. In the event of mis-speculation, dispose of all new values, restore the old values, and re-execute from the point before the mis-speculation
Why might one use old values? OoO WAR hazards
Value Management Strategies
Eager Update:
§ Update value in place, and
§ Maintain a log of old values to use for recovery
Lazy Update:
§ Buffer new value, leaving old value in place
§ Replace old value only at 'commit' time
Why leave an old value in place?
§ Old value can be used after new value is generated
§ Simplified recovery
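The two strategies can be contrasted in a toy Python model. It is a sketch, assuming all speculative writes commit or roll back together (real machines do this per instruction); `EagerState` and `LazyState` are hypothetical names.

```python
from typing import Dict, List, Tuple


class EagerState:
    """Update in place; keep a log of old values for recovery."""

    def __init__(self, values: Dict[str, int]):
        self.values = dict(values)
        self.log: List[Tuple[str, int]] = []

    def write(self, name: str, new: int) -> None:
        self.log.append((name, self.values[name]))   # remember the old value
        self.values[name] = new                      # visible immediately

    def commit(self) -> None:
        self.log.clear()                             # old values no longer needed

    def rollback(self) -> None:
        while self.log:                              # undo youngest first
            name, old = self.log.pop()
            self.values[name] = old


class LazyState:
    """Buffer new values; replace old values only at commit."""

    def __init__(self, values: Dict[str, int]):
        self.values = dict(values)                   # old values stay in place
        self.buffer: Dict[str, int] = {}

    def write(self, name: str, new: int) -> None:
        self.buffer[name] = new

    def read(self, name: str) -> int:
        return self.buffer.get(name, self.values[name])

    def commit(self) -> None:
        self.values.update(self.buffer)
        self.buffer.clear()

    def rollback(self) -> None:
        self.buffer.clear()                          # simplified recovery
```

Recovery is a log walk in the eager case and a single buffer clear in the lazy case, while the lazy design pays on every read, which must check the buffer first.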
Exception Handling (In-Order Five-Stage Pipeline)
Strategy for PC? Strategy for Registers?
§ Eager: update immediately
§ Lazy: update at commit
[Figure: five-stage pipeline (PC, Inst. Mem, Decode, Execute, Data Mem, Writeback). Exception sources: PC address exceptions, illegal opcode, overflow, data address exceptions, and asynchronous interrupts. Exception flags and PCs (Exc D/PC D, Exc E/PC E, Exc M/PC M) travel down the pipeline to the commit point at the end of M, which records Cause and EPC, kills the F, D, and E stages and the writeback, and selects the handler PC.]
In-Order vs. Out-of-Order Branch Prediction
In-Order Execution (Fetch, Decode, Execute, Commit all in-order; branch predicted in Fetch, resolved in Execute):
§ Speculative fetch but not speculative execution: the branch resolves before later instructions complete
§ Completed values held in pipeline latches until commit
Out-of-Order Execution (in-order Fetch and Decode, out-of-order Execute, in-order Commit through the ROB):
§ Speculative execution, with branches resolved after later instructions complete
§ Completed values held in ROB or unified physical register file until commit
Both:
§ Both styles of machine can use the same branch predictors in the front-end fetch pipeline, and both can execute multiple instructions per cycle
§ Common to have 10-30 pipeline stages in either style of design
Register Alias Table Snapshots
Snapshot of the RAT upon every branch prediction
[Figure: Reg Map for R0 … R31 holding tags (e.g., T20, T54, T88, T128, T73, T45, T08, T100), with Snap Map copies taken at each prediction]
What kind of value management is this? Eager!!
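RAT snapshots as eager value management can be sketched as follows: the RAT is updated in place, and a copy is saved at each branch prediction so a misprediction restores it in one step. This is a simplified model, assuming snapshots are kept in prediction order, that correctly predicted branches are not retired from the list, and that physical registers are released elsewhere; the class and method names are hypothetical, and the tags in the test are only illustrative.

```python
from typing import Dict, List


class RatWithSnapshots:
    def __init__(self, mapping: Dict[str, str]):
        self.rat = dict(mapping)
        self.snapshots: List[Dict[str, str]] = []   # oldest unresolved branch first

    def predict_branch(self) -> int:
        # Copy (not alias) the current RAT; the index identifies this branch.
        self.snapshots.append(dict(self.rat))
        return len(self.snapshots) - 1

    def rename(self, arch: str, phys: str) -> None:
        self.rat[arch] = phys                       # eager: update in place

    def mispredict(self, snap_id: int) -> None:
        # Restore in one step; snapshots of younger branches die with them.
        self.rat = dict(self.snapshots[snap_id])
        del self.snapshots[snap_id:]
```

Compared with walking the ROB and undoing LPRd mappings one by one, a snapshot restore costs storage for one full map per in-flight branch.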
OoO Design Choices: Eager or Lazy?
Tags (Reservation Stations): Distributed (per FU) or Centralized (separate or part of ROB)
Data:
§ Data in ROB + separate ARF (more data movement): Lazy
§ Data in Unified Physical Register File (only tags in ROB; less data movement): Eager
Load-Store Queue
[Figure: the load address is looked up in both the Speculative Store Queue (Addr, S, V, Data) and the L1 Data Cache]
If data is in both the store queue and the cache, which should we use? The store queue.
If the same address is in the store queue twice, which should we use? The youngest store older than the load.
Memory Update Strategy for Store Instructions: Eager or Lazy?
A. Eager
B. Lazy
(Poll session ID: cs6290)
[Figure: Speculative Store Queue (Addr, S, V, Data) alongside the L1 Data Cache on the load path]
Update Strategy for Store Instructions: Eager or Lazy?
Lazy
§ The speculative store queue is a structure introduced to hold speculative store data for a lazy update of the data cache
Agenda
§ Quick recap of Module 1
§ One big Tomasulo exercise
Single-Cycle Harvard Machine
[Figure: datapath with PC, Inst. Memory, GPRs, Imm Ext, ALU with ALU Control, Data Memory, and a PC-select mux (pc+4, br, rind, jabs); control signals include RegWrite, RegDst, ExtSel, BSrc, OpSel, MemWrite, WBSrc, PCSrc]
No stalls, but the cycle time is very large
2-Stage Princeton Machine
[Figure: two-stage datapath with one shared memory for instructions and data (MAddrSrc selects the address), an IR with nop injection (IRSrc), and a stall signal]
§ LW/SW stall, since there is only one memory
§ BEQZ, J, JAL, JR, JALR stall, since the correct PC is not known
5-Stage Harvard Machine
[Figure: five-stage datapath with pipeline registers (IR, A, B, Y, R, MD1, MD2), nop injection, and stall logic comparing the source registers in ID (rs, rt, with read enables re1, re2) against the destination registers (ws, with write enable we) of later stages]
Stall whenever src1 or src2 in ID is the same as dest in EX/MA/WB (assuming no memory hazards)
Simplified 5-Stage Pipeline (Lab 2)
§ WB on falling edge and register read on rising edge → no need to stall until WB finishes
§ Stall whenever src1 or src2 in ID is the same as dest in EX/MA
§ For branches, stall whenever the special "condition status" register is set by a previous instruction, i.e., cc_read in ID == 1 and cc_write in EX/MA == 1
Assuming no memory hazards
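The Lab 2 stall conditions can be written as one boolean function. This is a sketch, assuming register numbers (or None when a stage has no register destination) and booleans for the cc_read/cc_write signals; the function name and parameters are hypothetical, not the lab's actual interface.

```python
from typing import Iterable, Optional


def must_stall(id_srcs: Iterable[Optional[int]],
               ex_dest: Optional[int], ma_dest: Optional[int],
               cc_read_id: bool, cc_write_ex: bool, cc_write_ma: bool) -> bool:
    """Stall condition for the simplified Lab 2 pipeline.

    WB writes on the falling edge and ID reads on the rising edge, so an
    instruction in WB never forces a stall: only EX and MA destinations count.
    """
    busy = {d for d in (ex_dest, ma_dest) if d is not None}
    data_hazard = any(s is not None and s in busy for s in id_srcs)
    cc_hazard = cc_read_id and (cc_write_ex or cc_write_ma)
    return data_hazard or cc_hazard
```

Because WB is left out of the comparison, the hazard window is one stage shorter than in the EX/MA/WB rule of the previous slide.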
Key Design Principles
Dependencies
§ Data: RAW, WAR, WAW
§ Control: branch directions and targets
How to handle dependencies?
§ Stall: wait until the dependency resolves
§ Bypass: forward the required value if it exists somewhere in the pipeline, just not at the right place
§ Speculate: assume a certain outcome of the dependency and continue execution; roll back if assumed outcome != resolved outcome
§ Find something else to do
5-Stage Harvard Machine (with Bypass)
[Figure: five-stage datapath (D, E, M, W) with bypass paths into the A and B operand muxes (ASrc, BSrc), including the PC for JAL]
An instruction in Decode that has a RAW dependence on a Load ahead of it stalls, since the output of the Load is only known after the M stage
8-Stage Pipeline with Branch Prediction (Speculation)
Stages: Fetch, Decode, Pre-Issue, Issue, RF Read, Int EX / FP EX / Mem, RF Write
§ Fetch fetches PC+4 unless redirected
§ Branch target address and branch direction become known in later stages
§ Alternative placements of the BHT and BTB: #A (no BHT), #B, #C, #D
Stall cycles/work lost depend on where the BHT and BTB are, and where the branch target and branch direction become known
Out-of-Order Execution with Tomasulo (find something else to do)
[Figure: F and D in-order, then I (issue) into reservation stations (fields use, exec, op, p1, src1, p2, src2; tags t1 … tn) feeding out-of-order Integer add, Integer mul, FP mul, and Load/store pipelines, then W]
Reservation stations handle which instructions stall and which can execute
Out-of-Order Execution + In-Order Commit
[Figure: PC → Fetch → Decode → Issue (in-order) → Execute → WB/Complete (out-of-order) → Reorder Buffer → Commit (in-order)]
Exceptions
[Figure: same pipeline. The exception is handled at Commit: instructions in Fetch, Decode, and Execute are killed and the correct PC is injected into Fetch.]
Branch Mispredictions
[Figure: same pipeline. Branch prediction happens in Fetch; at Branch Resolution in Execute, a misprediction kills the younger instructions in Fetch, Decode, Execute, and the Reorder Buffer, and the correct PC is injected into Fetch.]
Don't forget your laws and performance principles!
§ Iron Law: time/program = instr/program × CPI × time/cycle (performance is its inverse)
§ Amdahl's Law: overall speedup is limited by the fraction of the program that gets accelerated
§ Little's Law: max throughput = ops in flight / latency per op → need enough registers
§ Flynn's bottleneck: "Can't get out more than you can get in" → need for superscalar
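The three quantitative laws can be checked with small helper functions. These are sketches with hypothetical names; Little's Law is written as ops in flight = throughput × latency, which is the slide's formula rearranged.

```python
def iron_law_time(instructions: float, cpi: float, seconds_per_cycle: float) -> float:
    # time/program = instructions/program * cycles/instruction * seconds/cycle
    return instructions * cpi * seconds_per_cycle


def amdahl_speedup(fraction_accelerated: float, factor: float) -> float:
    # The unaccelerated part (1 - f) still runs at its original speed.
    f = fraction_accelerated
    return 1.0 / ((1.0 - f) + f / factor)


def littles_law_in_flight(throughput: float, latency: float) -> float:
    # ops in flight = throughput * latency (i.e. throughput = in_flight / latency)
    return throughput * latency
```

For example, sustaining 4 ops/cycle with a 10-cycle latency needs 40 ops in flight (and registers to hold them), and accelerating half of a program by 2× gives only about 1.33× overall.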
ILP != IPC
ILP is an attribute of the program
§ Also depends on the ISA and compiler
§ SIMD, FMAC, etc. can change the instruction count and the shape of the dataflow graph
IPC depends on the actual machine implementation
§ ILP is an upper bound on IPC
§ Achievable IPC depends on instruction latencies, cache hit rates, branch prediction rates, structural conflicts, instruction window size, etc.
ILP and IPC
I1 ADD R1, R2, R3
I2 SUB R4, R1, R5
I3 XOR R6, R7, R8
I4 MUL R5, R8, R9
I5 ADD R4, R8, R9
ILP (best case: average instructions issued in parallel per cycle) = (4 + 1) / 2 = 2.5
IPC (instructions / total cycles):
§ Processor A, Scalar In-Order: 5/5 = 1
§ Processor B, Superscalar OoO with 1 Mul and 1 ALU (ADD/SUB/XOR): 5/4 = 1.25
§ Processor C, Superscalar OoO with 1 Mul and 2 ALU (ADD/SUB/XOR): 5/2 = 2.5
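The IPC numbers can be reproduced with a small greedy scheduler. This sketch assumes single-cycle latency for every instruction, oldest-first issue each cycle, and register renaming, so only the RAW dependence I1 → I2 matters (the WAR on R5 and the WAW on R4 are ignored). With unlimited units it yields the ILP bound. Processor A needs no simulation: a scalar in-order pipeline with these latencies issues one instruction per cycle. `schedule_cycles` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# (name, functional-unit class, producers): true (RAW) dependences only.
PROGRAM: List[Tuple[str, str, List[str]]] = [
    ("I1", "alu", []),       # ADD R1, R2, R3
    ("I2", "alu", ["I1"]),   # SUB R4, R1, R5  (reads R1 from I1)
    ("I3", "alu", []),       # XOR R6, R7, R8
    ("I4", "mul", []),       # MUL R5, R8, R9  (WAR on R5 removed by renaming)
    ("I5", "alu", []),       # ADD R4, R8, R9  (WAW on R4 removed by renaming)
]


def schedule_cycles(program: List[Tuple[str, str, List[str]]],
                    units: Optional[Dict[str, int]]) -> int:
    """Greedy oldest-first out-of-order issue with 1-cycle latency.

    units maps FU class -> count per cycle; None means unlimited (ILP bound).
    """
    if units is not None and any(units.get(fu, 0) < 1 for _, fu, _ in program):
        raise ValueError("every FU class used by the program needs >= 1 unit")
    issued: Dict[str, int] = {}              # name -> cycle it executed in
    cycle = 0
    while len(issued) < len(program):
        cycle += 1
        free = dict(units) if units is not None else None
        for name, fu, deps in program:       # program order = oldest first
            if name in issued:
                continue
            if not all(d in issued and issued[d] < cycle for d in deps):
                continue                     # a producer has not finished yet
            if free is not None:
                if free[fu] == 0:
                    continue                 # structural hazard this cycle
                free[fu] -= 1
            issued[name] = cycle
    return cycle
```

Unlimited units give 2 cycles (ILP 2.5), {"alu": 1, "mul": 1} gives 4 cycles (Processor B, IPC 1.25), and {"alu": 2, "mul": 1} gives 2 cycles (Processor C, IPC 2.5).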
Larger ILP Example
Program order (PC):
i1: r2 = 4(r22)
i2: r10 = 4(r25)
i3: r10 = r2 + r10
i4: 4(r26) = r10
i5: r14 = 8(r27)
i6: r6 = (r22)
i7: r5 = (r23)
i8: r5 = r6 - r5
i9: r4 = r14 * r5
i10: r15 = 12(r27)
i11: r7 = 4(r22)
i12: r8 = 4(r23)
i13: r8 = r7 - r8
i14: r8 = r15 * r8
i15: r8 = r4 - r8
i16: (r28) = r8
[Figure: Data Flow Graph (or Data Dependency Graph) with one node per instruction, i1 to i16]
Larger ILP Example
ILP computation:
Cycle 1: i1, i2, i5, i6, i7, i10, i11, i12
Cycle 2: i3, i8, i13
Cycle 3: i4, i9, i14
Cycle 4: i15
Cycle 5: i16
ILP = 16/5 = 3.2 (or simply, total nodes / critical path length in the data dependency graph)
ILP = best-case IPC: average instructions that COULD be issued in parallel per cycle
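The "total nodes / critical path" shortcut can be computed by assigning each instruction the earliest cycle in which all its producers have finished. This sketch assumes unit latency and unlimited resources and, like the slide's schedule, considers only register (RAW) dependences, not possible memory dependences between the store i4 and later loads. The producer lists are read off the code; `asap_cycles` is a hypothetical helper.

```python
from typing import Dict, List

# Producers (RAW dependences) for i1..i16, read off the program.
DEPS: Dict[str, List[str]] = {
    "i1": [], "i2": [], "i3": ["i1", "i2"], "i4": ["i3"],
    "i5": [], "i6": [], "i7": [], "i8": ["i6", "i7"],
    "i9": ["i5", "i8"], "i10": [], "i11": [], "i12": [],
    "i13": ["i11", "i12"], "i14": ["i10", "i13"],
    "i15": ["i9", "i14"], "i16": ["i15"],
}


def asap_cycles(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Earliest cycle of each instruction: 1 + latest cycle among its producers."""
    cycle: Dict[str, int] = {}
    for name in deps:                        # dict order = program order
        cycle[name] = 1 + max((cycle[p] for p in deps[name]), default=0)
    return cycle


cycles = asap_cycles(DEPS)
critical_path = max(cycles.values())         # longest dependence chain, in cycles
ilp = len(DEPS) / critical_path
```

The resulting per-cycle groups match the table above, with a critical path of 5 and ILP = 16/5 = 3.2.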
OoO Design Choices
Tags (Reservation Stations): Distributed (per FU), as in the original Tomasulo; or Centralized (separate or part of ROB), as in modern processors
Data (for in-order commit):
§ Data in ROB + separate ARF: HP PA8000, Pentium Pro, Core2Duo, Nehalem
§ Data in Unified Physical Register File (only tags in ROB): MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy/Ivy Bridge
(Covered in Lectures 9 and 10)
OoO Practice Problem
Assume a gshare branch predictor:
§ Index = PC[4:2] ^ GHR[2:0]
§ GHR update: Eager (on prediction*)
§ BHT update: Lazy (on commit)
§ Snapshot of RAT on every prediction*
§ Prediction happens in the fetch stage
*Recover if misprediction/exception
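The predictor rules can be sketched with a few helper functions and checked against the worked example that follows. This assumes standard 2-bit saturating counters (10 and 11 predict taken), an 8-bit GHR as in the example, and recovery by shifting squashed predictions back out; the function names are hypothetical.

```python
GHR_BITS = 8   # width of the history register in the example


def gshare_index(pc: int, ghr: int) -> int:
    # Index = PC[4:2] ^ GHR[2:0]
    return ((pc >> 2) & 0b111) ^ (ghr & 0b111)


def predict_taken(counter: int) -> bool:
    # 2-bit saturating counter: 10 and 11 predict taken
    return counter >= 2


def ghr_after_prediction(ghr: int, taken: bool) -> int:
    # Eager update: shift the predicted outcome in at the LSB
    return ((ghr << 1) | int(taken)) & ((1 << GHR_BITS) - 1)


def ghr_recover(ghr: int, squashed_predictions: int) -> int:
    # Undo eager updates by shifting them back out (lost top bits become 0 here)
    return ghr >> squashed_predictions


def train(counter: int, taken: bool) -> int:
    # Lazy BHT update at commit
    return min(counter + 1, 3) if taken else max(counter - 1, 0)
```

With GHR = 10010110, I6 at 0x20D8 indexes entry 110 ^ 110 = 000 (counter 00, not taken), and the eager update gives 00101100. When I2 commits, its index uses the GHR from before the I2 and I6 updates: 010 ^ 011 = 001.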
Example(1)
Program:
I1 (0x20C4) SUB R4, R4, R1
I2 (0x20C8) BEQZ R4, next
I3 (0x20CC) LW R1, 0(R2)
I4 (0x20D0) SW R3, 4(R2)
I5 (0x20D4) SUBI R3, R3, 1
I6 (0x20D8) BEQZ R1, skip
I7 (0x20DC) MULT R1, R3, R1
I8 (0x20E0) ADDI R2, R2, 4
I9 (0x20E4) BNE R4, R1, loop
I10 (0x20E8) ADDI R4, R4, 1
Assume: I2 (BEQZ) was predicted NT and actually NT.
State:
§ Prediction counters (index: counter): 000: 00, 001: 01, 010: 01, 011: 11, 100: 00, 101: 11, 110: 10, 111: 01
§ Branch global history: 10010110
§ Next PC to be fetched: 0x20D8
§ RAT (latest): R1 P6, R2 P2, R3 P7, R4 P5
§ RAT Snapshot 1 (valid): R1 P1, R2 P2, R3 P3, R4 P5
§ RAT Snapshot 2: not valid
§ Physical registers (all present): P1 = 2, P2 = 8000, P3 = 5, P4 = 20, P5 = 18, P6 = 0; P0, P7, P8, P9 not present
§ Free List: P8, P9, P0
§ ROB (oldest first, I1 is next to commit):
I1 sub: sources P4, P1 (present); Rd R4, LPRd P4, PRd P5; executed
I2 beqz: source P5 (present); executed
I3 lw: source P2 (present); Rd R1, LPRd P1, PRd P6; executed
I4 sw: sources P2, P3 (present); executed
I5 subi: source P3 (present); Rd R3, LPRd P3, PRd P7; not executed
§ Store Queue: valid 1, committed No, I4, address 8004, data 5
Suppose the next instruction is fetched, decoded, and issued:
§ Fetched: 0x20D8, I6 (beqz). Prediction index = PC[4:2] ^ GH[2:0] = 110 ^ 110 = 000 → counter 00 → predict NT, so the next PC to be fetched is 0x20DC
§ Branch global history after the eager update: 00101100
§ RAT Snapshot 2 becomes valid: R1 P6, R2 P2, R3 P7, R4 P5
§ Decoded and issued: I6 (beqz) enters the ROB with source P6 (present)
Example(2)
Program, prediction counters, physical registers, and store queue as in Example(1). Assume: I2 (BEQZ) was predicted NT and actually NT; I6 (BEQZ) predicted NT.
State:
§ Branch global history: 00101100
§ Next PC to be fetched: 0x20DC
§ RAT (latest): R1 P6, R2 P2, R3 P7, R4 P5
§ RAT Snapshot 1 (valid): R1 P1, R2 P2, R3 P3, R4 P5
§ RAT Snapshot 2 (valid): R1 P6, R2 P2, R3 P7, R4 P5
§ Free List: P8, P9, P0
§ ROB: I1 to I5 as in Example(1), plus I6 beqz (source P6 present, not executed)
Suppose I7 is fetched, decoded, and issued:
§ The ROB gets I7 mult: sources P7 (not present) and P6 (present); Rd R1, LPRd P6, PRd P8
§ RAT: R1 → P8, and P8 is removed from the Free List
§ Next PC to be fetched: 0x20E0
Example(3)
Program, prediction counters, physical registers, and store queue as in Example(1). Assume: I2 (BEQZ) was predicted NT and actually NT; I6 (BEQZ) predicted NT.
State:
§ Instruction in Fetch Stage: 0x20E4, I7 (bne); Instruction in Decode Stage: I7 (bne); Next PC to be fetched: 0x20E8
§ Branch global history: 00101100
§ RAT (latest): R1 P8, R2 P2, R3 P7, R4 P5
§ RAT Snapshot 1 (valid): R1 P1, R2 P2, R3 P3, R4 P5
§ RAT Snapshot 2 (valid): R1 P6, R2 P2, R3 P7, R4 P5
§ Free List: P9, P0
§ ROB (oldest first): I1 sub, I2 beqz, I3 lw, I4 sw (executed); I5 subi, I6 beqz, I7 mult (Rd R1, LPRd P6, PRd P8) (not executed)
Now let's examine two cases, starting from this state each time.
Example(4)
Case I: Suppose as many instructions as possible commit (starting from the state of Example(3)).
I1 (sub), I2 (beqz), I3 (lw), and I4 (sw) have executed, so they commit in order; I5 (subi) has not executed, so commit stops there.
§ I1 commits: its LPRd, P4, returns to the Free List
§ I2 commits: its BHT counter is updated lazily. PC[4:2] = 010; GHR[2:0] at the time of prediction = 011 (the GHR has since been updated eagerly by I2 and I6); BHT index = 010 ^ 011 = 001. I2 was not taken, so counter 001 goes from 01 to 00
§ I3 commits: its LPRd, P1, returns to the Free List
§ I4 commits: its store queue entry is marked committed (Yes)
Free List after: P9, P0, P4, P1
Example(5)

Case II: I1 results in an overflow exception. The exception handler is at 0x1FF0.

Starting state: as in Example(3). After this case:
  ROB: all seven entries (I1 to I7) are flushed (----). I1 does not commit because it raised the exception, and every younger instruction is squashed.
  Free list: the PRds of the flushed instructions are returned, giving P9, P0, P8, P7, P6, P5.
  Register Alias Table (latest): R1 P8 -> P1, R2 P2 (unchanged), R3 P7 -> P3, R4 P5 -> P4, i.e., the mapping in effect before I1.
  RAT snapshots 1 and 2 and the uncommitted I4 store-queue entry are also discarded.
  Branch global history: 00101100 -> xx001011 (the greedy NT bits from I2 and I6 are removed; the two oldest bits had already been shifted out, so they are unknown).
  Next PC to be fetched: 0x20E8 -> 0x1FF0; the fetch and decode stages are flushed (----).
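One way to carry out the Case II recovery is to walk the ROB from youngest to oldest, writing each destination's LPRd back into the RAT and returning its PRd to the free list; done in that order, it reproduces the slide's RAT and the free-list order P8, P7, P6, P5. The Python sketch below is a minimal model under assumptions: `recover` and `BRANCH_OPS` are hypothetical names, ROB rows are plain `(op, Rd, LPRd, PRd)` tuples, and the GHR is repaired by shifting out one bit per in-flight branch. Real designs may instead restore a committed RAT or a checkpoint.

```python
# Example(3) starting state, reduced to what Case II needs.
# ROB rows are (op, Rd, LPRd, PRd); None means "no destination register".
rob = [("sub", "R4", "P4", "P5"), ("beqz", None, None, None),
       ("lw", "R1", "P1", "P6"), ("sw", None, None, None),
       ("subi", "R3", "P3", "P7"), ("beqz", None, None, None),
       ("mult", "R1", "P6", "P8")]
rat = {"R1": "P8", "R2": "P2", "R3": "P7", "R4": "P5"}
free_list = ["P9", "P0"]
store_queue = [{"inum": "I4", "addr": 8004, "data": 5, "committed": False}]
snapshots = [{"valid": 1}, {"valid": 1}]
ghr = 0b00101100
BRANCH_OPS = {"beqz", "bne"}  # hypothetical: ops that greedily updated the GHR

def recover(rob, rat, free_list, store_queue, snapshots, ghr, handler_pc):
    """Flush everything from the faulting instruction onward (here the whole ROB)."""
    # Undo renaming youngest-first, so each Rd ends at its oldest LPRd.
    for op, rd, lprd, prd in reversed(rob):
        if rd is not None:
            rat[rd] = lprd
            free_list.append(prd)
    # Each in-flight branch shifted one predicted bit into the GHR; shift them out.
    n_branches = sum(entry[0] in BRANCH_OPS for entry in rob)
    ghr >>= n_branches  # the vacated high bits are unknown ("xx") in hardware
    rob.clear()
    store_queue[:] = [s for s in store_queue if s["committed"]]  # squash stores
    for snap in snapshots:
        snap["valid"] = 0
    return ghr, handler_pc  # repaired GHR and next PC to fetch

ghr, next_pc = recover(rob, rat, free_list, store_queue, snapshots, ghr, 0x1FF0)
print(rat)        # {'R1': 'P1', 'R2': 'P2', 'R3': 'P3', 'R4': 'P4'}
print(free_list)  # ['P9', 'P0', 'P8', 'P7', 'P6', 'P5']
print(f"GHR xx{ghr & 0b111111:06b}, next PC {next_pc:#06x}")  # GHR xx001011, next PC 0x1ff0
```

Walking youngest-first matters when a register is renamed more than once in the window: R1 is written by both I3 and I7, and only the oldest LPRd (P1) is the correct pre-exception mapping.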