A form of instruction-level parallelism (ILP) that exploits independent instructions
- Idea : multiple pipelines (multi-issue) and separate register files for integer and floating point
- The order of execution is usually assisted by the compiler (which needs, and depends on, hardware support)
- Each functional unit has its own pipeline
- Scalar : maximum IPC is 1, because there is only one pipeline
- Super-Scalar : to get an IPC above 1, we need multiple pipelines
- Super-pipeline : makes the pipeline deeper by increasing the number of stages. In a hyper-pipeline, most of the added stages are for scheduling, queuing and control checks; the number of common stages is the same
Do not confuse: different instructions can be in the same stage at the same time, but different stages of the same instruction cannot be done at the same time.
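The scalar / superscalar difference above can be sketched with a simple cycle count. This is a minimal model, assuming an ideal pipeline (no stalls, all instructions independent); the function names `ideal_cycles` and `ipc` are hypothetical and only for illustration.

```python
import math

def ideal_cycles(num_instructions: int, stages: int, issue_width: int = 1) -> int:
    """Cycles needed by an ideal pipeline with no stalls (assumed model)."""
    if num_instructions == 0:
        return 0
    # Instructions enter the pipeline in groups of `issue_width`.
    groups = math.ceil(num_instructions / issue_width)
    # The first group takes `stages` cycles; each later group finishes one cycle later.
    return stages + groups - 1

def ipc(num_instructions: int, stages: int, issue_width: int = 1) -> float:
    """Instructions per cycle under the same ideal model."""
    return num_instructions / ideal_cycles(num_instructions, stages, issue_width)

if __name__ == "__main__":
    for width in (1, 2, 4):
        print(f"issue width {width}: IPC = {ipc(1000, 4, width):.3f}")
```

With an issue width of 1 the IPC approaches but never exceeds 1; only a wider issue width (more pipelines) raises the ceiling.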
Independence
CISC instructions usually need complex decoding, so they are not well suited to Super-Scalar (SS) operation, since a stall has a double impact on SS.
Solutions
- RAW : the stages before the read stage of the following instruction can run in parallel with the first instruction
- WAR / WAW : solved by register renaming, usually written with subscripts - notation for register renaming : a hardware register has a subscript / a logical register has no subscript
- Procedural dependency : the following instructions cannot run in parallel until the branch or jump executes → use a branch predictor. If the prediction fails, flush the speculative instructions; otherwise commit them
- Resource conflict : add resources (functional units); otherwise stall, but only at the execution stage
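The effect of register renaming on the dependencies listed above can be sketched as follows. This is an illustrative model only: the four-instruction program, the `(destination, sources)` encoding, and the `R3_1`-style subscript naming are assumptions, not a description of any particular processor.

```python
# Each instruction is (destination, sources); register names are hypothetical.
program = [
    ("R3", ("R3", "R5")),  # I1: R3 = R3 op R5
    ("R4", ("R3",)),       # I2: R4 = R3 + 1
    ("R3", ("R5",)),       # I3: R3 = R5 + 1
    ("R7", ("R3", "R4")),  # I4: R7 = R3 op R4
]

def hazards(prog):
    """Return (kind, earlier_index, later_index) for every dependent pair."""
    found = []
    for i, (dst_i, srcs_i) in enumerate(prog):
        for j in range(i + 1, len(prog)):
            dst_j, srcs_j = prog[j]
            # A read only depends on instruction i if nobody rewrote dst_i in between.
            overwritten = any(prog[k][0] == dst_i for k in range(i + 1, j))
            if dst_i in srcs_j and not overwritten:
                found.append(("RAW", i, j))
            if dst_j in srcs_i:
                found.append(("WAR", i, j))
            if dst_i == dst_j:
                found.append(("WAW", i, j))
    return found

def rename(prog):
    """Give every write a fresh subscripted (hardware) register."""
    version = {}  # logical register -> latest subscript

    def current(reg):
        return f"{reg}_{version.get(reg, 0)}"

    renamed = []
    for dst, srcs in prog:
        new_srcs = tuple(current(s) for s in srcs)  # read the current versions first
        version[dst] = version.get(dst, 0) + 1      # then allocate a new version
        renamed.append((current(dst), new_srcs))
    return renamed

if __name__ == "__main__":
    print("before renaming:", hazards(program))
    print("after renaming: ", hazards(rename(program)))
```

After renaming, only the RAW (true) dependencies remain; the WAR and WAW entries disappear because no two instructions write the same subscripted register.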
Delayed branch - not usually used, but the theory is to do the branch early (out of order, O-o-O) when it has no dependency on the previous instruction
Window : the range of instructions which the compiler can see at once
Increasing resources without renaming shows only a small performance increase
fetch (+branch prediction) → dispatch → issue → execution → reorder and commit
fetch : inserts instructions into the pipeline
- (degree of instruction-level parallelism) = (the number of instructions which can be handled at the same time)
- in-order issue in SS still permits multiple instructions in the same stage
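As a rough illustration of in-order issue with several instructions in the same stage, the sketch below assigns an issue cycle to each instruction. It assumes a simplified model (one-cycle latency, only RAW dependencies, a result usable the cycle after its producer issues); `in_order_issue` and the `deps` encoding are hypothetical.

```python
def in_order_issue(deps, width):
    """Return the issue cycle of each instruction under in-order, `width`-way issue.

    deps[i] is the set of earlier instruction indices that instruction i reads from.
    """
    issue_cycle = []
    cycle, used = 0, 0
    for producers in deps:
        # Earliest cycle at which every producer's result is available.
        ready = max((issue_cycle[p] + 1 for p in producers), default=0)
        if used == width:   # all issue slots of the current cycle are taken
            cycle, used = cycle + 1, 0
        if ready > cycle:   # in order: this instruction waits, and so do later ones
            cycle, used = ready, 0
        issue_cycle.append(cycle)
        used += 1
    return issue_cycle

if __name__ == "__main__":
    # I1 depends on I0, I3 depends on I2; the rest are independent.
    deps = [set(), {0}, set(), {2}, set(), set()]
    for width in (1, 2):
        print(f"width {width}: issue cycles = {in_order_issue(deps, width)}")
```

Instructions that share an issue cycle are in the same stage at the same time; dependencies, not only the issue width, limit how many can do so.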
Instruction Translation Lookaside Buffer (ITLB)
- in CISC, hardware turns each operation into micro-operations and sends them to the micro-operation queue
- micro-ops are coded in ROM and fetched from ROM
Trace Cache
- the branch part contains dynamically gathered history information
- holds a tag + micro-operations too
- a trace is a sequence of micro-ops (the allocator renames registers using the reorder buffer (ROB); the micro-ops are reordered, sent to the scheduler queue (FIFO), and then retired from the ROB)
Drive : wire-delay stage
Flag : one or more data bits used to store binary values as specific program structure indicators - usually used to indicate the possibility of post-processing
The dispatch stage is decoupled from the issue step, and the graduation stage is decoupled from the execute stage
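The "retire from ROB" step mentioned above can be sketched as a queue that releases instructions only in program order, even when they complete out of order. This is a minimal sketch; the class `ReorderBuffer`, its method names, and the instruction tags are hypothetical and do not model any real processor's structure.

```python
from collections import deque

class ReorderBuffer:
    """Toy reorder buffer: out-of-order completion, in-order retirement."""

    def __init__(self):
        self.entries = deque()  # tags in program order, oldest on the left
        self.done = set()       # tags whose execution has finished

    def allocate(self, tag):
        """Reserve an entry when the instruction enters the buffer."""
        self.entries.append(tag)

    def complete(self, tag):
        """Mark an instruction as executed (may happen in any order)."""
        self.done.add(tag)

    def retire(self):
        """Retire from the head until the oldest entry is not yet complete."""
        retired = []
        while self.entries and self.entries[0] in self.done:
            tag = self.entries.popleft()
            self.done.remove(tag)
            retired.append(tag)
        return retired

if __name__ == "__main__":
    rob = ReorderBuffer()
    for tag in ("I1", "I2", "I3"):
        rob.allocate(tag)
    rob.complete("I3")
    print(rob.retire())  # I3 finished first but cannot retire before I1 and I2
    rob.complete("I1")
    print(rob.retire())
    rob.complete("I2")
    print(rob.retire())
```

Keeping retirement in order is what lets a mispredicted branch be flushed cleanly: anything younger than the branch has not yet left the buffer.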
Others
n-way (n-issue / n-wide) superscalar machine - n instructions can be in the same stage at the same time
parallel branch prediction is more complex
The main reason multiprocessing performance is limited (not multiplied by n) is the memory bus bottleneck - appropriate memory mapping is needed