We use thread-level parallelism (TLP) even though thread switching adds overhead, and the parallelism must be specified explicitly in code.
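A minimal sketch of what "specified in code" can look like: the programmer splits the work and hands each piece to a thread. The names `partial_sum` and `parallel_sum` and the default of 4 threads are assumptions made for illustration, not something from these notes.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # Work done by one thread: sum its own slice of the data.
    return sum(chunk)


def parallel_sum(data, num_threads=4):
    # The programmer decides how the work is split across threads;
    # the hardware does not discover this parallelism on its own.
    size = max(1, (len(data) + num_threads - 1) // num_threads)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map runs partial_sum on each chunk in a worker thread.
        return sum(pool.map(partial_sum, chunks))
```

Note that in CPython the global interpreter lock keeps CPU-bound threads from running truly in parallel, so this shows only how the threads are expressed, not a speedup.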
Costs:
- Each thread requires its own PC and GPRs
- System state: virtual memory page table base register, exception handling registers
- Overhead: additional cache/TLB conflicts from competing threads
- Difficult to continue to extract instruction-level parallelism (ILP) or data-level parallelism (DLP) from a single sequential thread of control
- Because of the memory latency bottleneck and the many hazards, hardware resources frequently sit unused
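The idle-hardware point can be made concrete with a back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement: each thread runs for `run_cycles`, then stalls on memory for `stall_cycles`, and every stall triggers a switch costing `switch_cycles`. The function name and parameters are hypothetical.

```python
def utilization(num_threads, run_cycles, stall_cycles, switch_cycles=0):
    """Fraction of cycles spent on useful work in an idealized model.

    Assumptions (illustrative only): every thread behaves identically,
    and a switch of switch_cycles is paid at every memory stall.
    """
    if num_threads < 1 or run_cycles <= 0:
        raise ValueError("need at least one thread and positive run_cycles")
    # Too few threads: part of each stall is still exposed.
    linear = num_threads * run_cycles / (run_cycles + switch_cycles + stall_cycles)
    # Enough threads: stalls are fully hidden, only switch overhead is lost.
    saturated = run_cycles / (run_cycles + switch_cycles)
    return min(linear, saturated)
```

Under these assumptions, with 10 run cycles and 90 stall cycles, one thread keeps the hardware about 10% busy, while ten threads are enough to hide the stalls entirely when switching is free.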