Tomasulo Algorithm

Tomasulo Algorithm

Creator
Created
Created
2019 Nov 5 5:17
Editor
Edited
Edited
2023 Feb 28 15:55
Refs
Refs
not real renaming itself literal - just like we are using like many register
 
  • 2 register specifier per instruction (3 in 6600 scoreboarding)
  • 4 FP registers (8 in 6600 scoreboarding)
 
Suppose newly named register. Then that need value, so need of pointer is trivial infer
defaultly assign new name to all register and if find same and no hazard then use same name
notion image
we treat load and store as FU with RSs as well

control & buffers distributed with FU (FU buffer is called reservation stations)

renaming to value or pointer to RS(reservation stations)

Common Data Bus broadcast results to all FUs then pointer out

go everywhere except memory (load buffer and instruction Queue (spit instruction stream))

Load Store Queue(LSQ), immediate instruction is not in FP queue

 
there is more RS than registers so can do optimizations that compiler can't CDB send 64 bit of data and 4 bit of FU source address too since LB we can have multiple loads outstanding
 

Register result status

  • indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register
  • usually has clock cycle counter and reservation station has FU count down

Reservation Stations

no destination register and R(R usage boolean)
  1. Busy : indicate whether FU is` using
  1. Op : which instruction is operating
  1. Vj: value(ex. R(Fn)) (register F at scoreboard)
  1. Vk: value(ex. R(Fn)) (register F at scoreboard)
  1. Qj : reservation stations producing source registers - value to written (FU at scoreboard)
  1. Qk : reservation stations producing source registers - value to written (FU at scoreboard)
if source register is two then Q + J number is always 2 in one R.S
 
V is usable by this R.S and not usable then Q

Load / Store buffer

  1. A: Address information for a load/store. Initially holding the imm field; later, holding the effective addr.
  1. busy - default No
M(An) is nth Load instruction value from memory, R(Fn) is Register value in Fn) and result value by them is like M+M-M and M*F4
 
Store buffer need FU to get broadcasting in instruction status, there is Iter attribute to opertaion
 

Tomasulo Algorithm's Three Stages

every stage is one clock except FU execution

1. Issue - get instruction from Floating Point Operation Queue

If reservation station free (no structural hazard - raw), see Register result status and fill V, Q and RRS (renames registers)
  • if not load then fill resetvation stations
  • Write FU in Register result status and if load then make LB busy - LB act like FU
  • at the cycle V WB start execution count
 

3. execution Complete

  • especially no change
  • usually immediate instruction need one clock

4. Write Result - finish execution

Write on Common Data Bus to all awaiting units; mark reservation station available
→ write if matches expected FU source → broadcast (Q to V of that R.S Q)
Why these stages stall? Issue - structural hazard execution (start) - WAR hazard Write result - no stall
Always Write result is next cycle of execution complete
 

Hazards

  • RAW - treat same as scoreboard : execute instruction only when its operands are available
  • WAR & WAR - avoid by register renaming : renaming all destination registers, including those with a pending read or write from an easier instruction
waw - renaming upper register?
war - renaming downer register?
 
raw is \(same register position) cross instruction code war is /(same register position) cross instruction code waw is |(same register position) vertical at instruction code
 

Drawbacks

  • Many associative stores (CDB) at high speed so Performance limited by Common Data Bus also CDB must go to multiple FU so hard hardware
  • do not use Multiple CDBs → more FU logic for parallel associative stores

Summary

Register file completely detached from computation so maybe update after basic block?
  • register renaming Multiple iterations use different physical destinations for registers dynamic loop unrolling - treat loop as not loop just treat as repeated instructions tomasulo's scheme can overlap iterations of loop
  • Reservation stations hazard detectio logic distributed reservatio stations and CDB
  • Other perspective: Tomasulo building data flow dependency graph on the fly?
scoreboard + start operation when know real value time. since we point out them, change of register status does not matter
 

Reference

 

Recommendations