not real renaming itself literal - just like we are using like many register
- 2 register specifier per instruction (3 in 6600 scoreboarding)
- 4 FP registers (8 in 6600 scoreboarding)
Suppose newly named register. Then that need value, so need of pointer is trivial infer
defaultly assign new name to all register and if find same and no hazard then use same name
we treat load and store as FU with RSs as well
control & buffers distributed with FU (FU buffer is called reservation stations)
renaming to value or pointer to RS(reservation stations)
Common Data Bus broadcast results to all FUs then pointer out
go everywhere except memory (load buffer and instruction Queue (spit instruction stream))
Load Store Queue(LSQ), immediate instruction is not in FP queue
there is more RS than registers so can do optimizations that compiler can't CDB send 64 bit of data and 4 bit of FU source address too since LB we can have multiple loads outstanding
Register result status
- indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register
- usually has clock cycle counter and reservation station has FU count down
Reservation Stations
no destination register and R(R usage boolean)
- Busy : indicate whether FU is` using
- Op : which instruction is operating
- Vj: value(ex. R(Fn)) (register F at scoreboard)
- Vk: value(ex. R(Fn)) (register F at scoreboard)
- Qj : reservation stations producing source registers - value to written (FU at scoreboard)
- Qk : reservation stations producing source registers - value to written (FU at scoreboard)
if source register is two then Q + J number is always 2 in one R.S
V is usable by this R.S and not usable then Q
Load / Store buffer
- A: Address information for a load/store. Initially holding the imm field; later, holding the effective addr.
- busy - default No
M(An) is nth Load instruction value from memory, R(Fn) is Register value in Fn) and result value by them is like M+M-M and M*F4
Store buffer need FU to get broadcasting in instruction status, there is Iter attribute to opertaion
Tomasulo Algorithm's Three Stages
every stage is one clock except FU execution
1. Issue - get instruction from Floating Point Operation Queue
If reservation station free (no structural hazard - raw), see Register result status and fill V, Q and RRS (renames registers)
- if not load then fill resetvation stations
- Write FU in Register result status and if load then make LB busy - LB act like FU
- at the cycle V WB start execution count
Why these stages stall? Issue - structural hazard execution (start) - WAR hazard Write result - no stall
Always Write result is next cycle of execution complete
Hazards
- RAW - treat same as scoreboard : execute instruction only when its operands are available
- WAR & WAR - avoid by register renaming : renaming all destination registers, including those with a pending read or write from an easier instruction
waw - renaming upper register?
war - renaming downer register?
raw is \(same register position) cross instruction code war is /(same register position) cross instruction code waw is |(same register position) vertical at instruction code
Drawbacks
- Many associative stores (CDB) at high speed so Performance limited by Common Data Bus also CDB must go to multiple FU so hard hardware
- do not use Multiple CDBs → more FU logic for parallel associative stores
Summary
Register file completely detached from computation so maybe update after basic block?
- register renaming Multiple iterations use different physical destinations for registers dynamic loop unrolling - treat loop as not loop just treat as repeated instructions tomasulo's scheme can overlap iterations of loop
- Reservation stations hazard detectio logic distributed reservatio stations and CDB
- Other perspective: Tomasulo building data flow dependency graph on the fly?
scoreboard + start operation when know real value time. since we point out them, change of register status does not matter