Instruction Parallelism

Creator
Created
2019 Nov 5 5:17
Editor
Edited
2023 Feb 28 15:55
Refs

A form of instruction-level parallelism (ILP) that runs independent instructions at the same time

  • Idea : multiple pipelines (multiple issue) and separate register files for integer and floating point
  • The compiler usually helps choose the execution order (this needs, and depends on, HW support)
  • Each unit (integer, floating point) has its own pipeline
 
  • Scalar : max IPC is 1, because there is only one pipeline
  • Super-Scalar : to get IPC above 1 we need multiple pipelines
  • Super-pipeline : make the pipeline deeper by increasing the number of stages. In a hyper-pipeline almost all of the added stages are scheduling, queuing and control checks; the number of common stages stays the same
 
Do not confuse the two: different instructions can be in the same stage at the same time, but different stages of the same instruction cannot run at the same time
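The scalar vs. super-scalar difference can be checked with a small timing model. This is a minimal sketch that assumes an ideal pipeline with no stalls; the names `ideal_cycles` and `stage_of` are made up for illustration and are not from any library.

```python
import math

def ideal_cycles(n_instr, stages, width):
    """Cycles to finish n_instr instructions on an ideal (stall-free) pipeline.

    width = number of parallel pipelines (1 = scalar, >1 = super-scalar).
    """
    return stages + math.ceil(n_instr / width) - 1

def stage_of(instr, cycle, stages, width):
    """Stage (0-based) that instruction `instr` occupies in `cycle`, or None.

    Assumption: instructions enter stage 0 in program order, `width` per cycle.
    """
    start = instr // width      # cycle in which this instruction enters stage 0
    s = cycle - start
    return s if 0 <= s < stages else None

# 8 instructions, 4 stages
scalar = ideal_cycles(8, 4, 1)   # 11 cycles -> IPC = 8/11, below 1
two_way = ideal_cycles(8, 4, 2)  # 7 cycles  -> IPC = 8/7, above 1
print(8 / scalar, 8 / two_way)
```

With `width=2`, `stage_of(0, 0, 4, 2)` and `stage_of(1, 0, 4, 2)` both return stage 0: two different instructions share a stage in the same cycle. A single instruction only ever gets one stage per cycle, which is the distinction the note above makes.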
 

Independence

CISC instructions usually need a complex decode step, so they are not a good fit for Super-Scalar (SS) operation, since a stall has a doubled impact on an SS machine

Solutions

  • RAW (read after write) : the stages before the read stage of the following instruction can run in parallel with the first instruction
  • WAR / WAW (write after read / write after write) : solved by register renaming, usually written with subscripts. Notation for register renaming : hardware register with subscript / logical register without subscript
  • Procedural dependency : following instructions cannot run in parallel until the branch or jump executes → use a branch predictor. If the prediction fails, flush; if it succeeds, commit (retire)
  • Resource conflict : increase resources (functional units), else stall only the execution stage
Delayed branch - not usually used, but the theory is to do the branch early (OoO) when it has no dependency on the previous instructions
Window : the set of instructions the compiler can see at once
Increasing resources without renaming gives only a small performance increase
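Register renaming from the list above can be sketched in a few lines. This is only an illustration under simple assumptions: instructions are `(dest, src1, src2)` tuples, every write gets a fresh subscripted register, and subscript `0` stands for the value a register held before the program started. `rename` and `name_hazards` are hypothetical helper names.

```python
def rename(instrs):
    """Rewrite logical registers (no subscript) as subscripted hardware registers."""
    version = {}  # logical register -> latest subscript handed out
    out = []
    for dest, *srcs in instrs:
        # Sources read the most recent version of each logical register.
        new_srcs = tuple(f"{s}_{version.get(s, 0)}" for s in srcs)
        # Every write allocates a fresh subscripted register.
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}_{version[dest]}", *new_srcs))
    return out

def name_hazards(instrs):
    """List WAR and WAW pairs (i, j) with i earlier than j in program order."""
    found = []
    for i, (dest_i, *srcs_i) in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            dest_j = instrs[j][0]
            if dest_j in srcs_i:
                found.append(("WAR", i, j))
            if dest_j == dest_i:
                found.append(("WAW", i, j))
    return found

program = [
    ("R3", "R3", "R5"),
    ("R4", "R3", "R1"),  # RAW on R3: a true dependency, renaming keeps it
    ("R3", "R5", "R1"),  # WAW with instruction 0, WAR with instructions 0 and 1
    ("R7", "R3", "R4"),
]
renamed = rename(program)
print(name_hazards(program))  # WAR/WAW pairs before renaming
print(name_hazards(renamed))  # empty list after renaming
```

After renaming, the third instruction writes `R3_2` instead of reusing `R3`, so it no longer has to wait for the earlier readers and writers of `R3`. The RAW dependency (the second instruction reading `R3_1`) is still there, since renaming only removes name conflicts.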
 

 
fetch (+branch prediction) → dispatch → issue → execution → reorder and commit
fetch : inserts instructions into the pipeline

Put results into correct order

  1. commit / retire (in order) - success
  2. flushed - fail
Commitment - results move from temporary storage (which has a HW clean-up overhead) to permanent storage
dispatch (OoO) - if resources are available, fetching/reading an instruction from memory
issue (OoO) - allocation step
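In-order commit with out-of-order completion can be sketched as a toy reorder buffer. This is a simplified model, not how any particular CPU implements it: the class `ReorderBuffer` and its method names are invented for the example, and instructions are just name strings.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # temporary storage, oldest instruction on the left
        self.committed = []     # permanent storage, always in program order

    def dispatch(self, name):
        self.entries.append({"name": name, "done": False})

    def complete(self, name):
        """Mark an instruction as executed; this may happen in any order."""
        for e in self.entries:
            if e["name"] == name:
                e["done"] = True

    def commit(self):
        """Retire finished instructions from the head only, so order is kept."""
        while self.entries and self.entries[0]["done"]:
            self.committed.append(self.entries.popleft()["name"])

    def flush_after(self, name):
        """Drop everything younger than `name` (e.g. a mispredicted branch)."""
        names = [e["name"] for e in self.entries]
        n_younger = len(names) - (names.index(name) + 1)
        flushed = [self.entries.pop()["name"] for _ in range(n_younger)]
        return flushed[::-1]

rob = ReorderBuffer()
for n in ["I1", "I2", "BR", "I4"]:
    rob.dispatch(n)
rob.complete("I2")   # I2 finishes first ...
rob.commit()         # ... but cannot commit, because I1 is still pending
rob.complete("I1")
rob.commit()         # now I1 and I2 retire together, in order
print(rob.committed)
print(rob.flush_after("BR"))  # the branch was mispredicted: I4 is flushed
```

The clean-up overhead mentioned above shows up in `flush_after`: everything speculative has to be thrown away before it touches permanent storage.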

  • (degree of instruction-level parallelism) = (the number of instructions that can be handled at the same time)
  • In-order issue in SS still permits several instructions in the same stage
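The two points above can be tried out with a rough in-order issue model. It assumes `(dest, src1, src2)` tuples and only checks RAW and WAW conflicts inside one issue group; `in_order_issue` is a hypothetical helper, and real issue logic checks much more (functional units, latencies).

```python
def in_order_issue(instrs, width):
    """Group instructions into issue cycles for an in-order machine.

    At most `width` instructions issue per cycle, always in program order.
    """
    cycles = []
    current = []
    for dest, *srcs in instrs:
        written = {d for d, *_ in current}
        # Close the group if it is full, or if this instruction reads or
        # rewrites a register written earlier in the same group.
        if len(current) == width or written & set(srcs) or dest in written:
            cycles.append(current)
            current = []
        current.append((dest, *srcs))
    if current:
        cycles.append(current)
    return cycles

program = [
    ("R1", "R2", "R3"),
    ("R4", "R5", "R6"),
    ("R7", "R1", "R4"),  # needs R1 and R4 from the first two
    ("R8", "R7", "R2"),  # needs R7
    ("R9", "R5", "R6"),
]
for width in (1, 2, 4):
    groups = in_order_issue(program, width)
    print(width, len(groups), max(len(g) for g in groups))
```

For this program, going from width 2 to width 4 does not reduce the number of issue cycles: the dependencies, not the number of pipelines, set the degree of parallelism.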
Instruction Translation Lookaside Buffer - in CISC, HW converts each operation into micro-operations, which go to the micro-operation queue
Micro-ops coded in ROM - fetched from ROM
Trace Cache Branch - contains dynamically gathered history information, plus tag + micro-operations
Trace : a sequence of micro-ops (the allocator renames using the reorder buffer (ROB); the micro-ops are reordered, sent to the scheduler queue (FIFO), and then retired from the ROB)
Drive : wire-delay stage
Flag : one or more data bits used to store binary values as indicators of specific program structures - usually used to indicate the possibility of post-processing
The dispatch stage is decoupled from the issue step, and the graduation stage is decoupled from the execute stage
 

 

Others

n-way (n-issue / n-wide) superscalar machine - n instructions can be in the same stage
parallel branch prediction is more complex
 
The main reason multi-process performance is limited (not an n-fold speedup) is the memory bus bottleneck - appropriate memory mapping is needed
 
