Single Instruction Multiple Data
Vector Processor
Needs a new array processor, controller, and Instruction Set Architecture (ISA)
- SIMD - a single instruction operates on multiple data elements
- Vector processors - one controller and multiple processing elements - ex. array processor
- reduces instruction memory and data memory accesses
- best for loops and weak for switch-style (branch-heavy) code
Vectorizable Loops - iterations do not access the same index
A loop in which no iteration depends on another iteration is vectorizable
- no control flow in the loop = the static instruction count is equal to the dynamic instruction count
Only vectorizable loops can be executed efficiently by vector processors
- Load / store of vector values - needs vector registers and memory support
- Vectors of different lengths in the same operation - needs a Vector Length Register (VLEN - controls the length of the vector) or strip-mining (break the loop into pieces that fit the vector registers - adds overhead)
- Elements stored apart from each other - needs a Vector Stride Register (VSTR - distance between consecutive vector elements)
stride (increment, pitch or step size) of an array (vector) : distance in memory between the starts of consecutive elements
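The role of VLEN, strip-mining and stride can be sketched in plain Python. This is only a model under assumptions: the maximum vector length `MVL = 4`, the flat `memory` list, and the helper names are all made up for illustration and do not correspond to a real ISA.

```python
MVL = 4  # assumed maximum vector length of a hypothetical machine


def load_vector(memory, base, stride, vlen):
    """Model a strided vector load: vlen elements, `stride` locations apart."""
    return [memory[base + i * stride] for i in range(vlen)]


def store_vector(memory, base, stride, values):
    """Model a strided vector store."""
    for i, x in enumerate(values):
        memory[base + i * stride] = x


def strip_mined_scale(memory, base, stride, n, factor):
    """Scale n strided elements in place, at most MVL elements per strip.

    Returns the number of strips, i.e. how many times the vector loop ran.
    """
    strips = 0
    done = 0
    while done < n:
        vlen = min(MVL, n - done)          # value that would go into VLEN
        start = base + done * stride       # first element of this strip
        v = load_vector(memory, start, stride, vlen)
        v = [x * factor for x in v]        # stands in for one vector multiply
        store_vector(memory, start, stride, v)
        strips += 1
        done += vlen
    return strips


memory = list(range(20))
strips = strip_mined_scale(memory, base=1, stride=2, n=6, factor=10)
```

With n = 6 and MVL = 4 the loop runs as two strips (4 elements, then 2), which is the overhead mentioned for strip-mining; only the odd-indexed locations 1 to 11 are modified because the stride is 2.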
if the balance between compute and memory operations is broken → bottleneck (usually memory)
memory banking can be one solution
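A small sketch of why banking helps, and when it does not. It assumes low-order interleaving (bank = address mod number of banks) and 8 banks; both are illustrative assumptions, not something stated in these notes.

```python
def bank_of(address, num_banks):
    """Assumed low-order interleaving: consecutive addresses hit consecutive banks."""
    return address % num_banks


def banks_touched(base, stride, vlen, num_banks):
    """Set of distinct banks used by one strided vector access."""
    return {bank_of(base + i * stride, num_banks) for i in range(vlen)}


NUM_BANKS = 8  # assumed bank count

unit_stride = banks_touched(base=0, stride=1, vlen=8, num_banks=NUM_BANKS)
stride_two = banks_touched(base=0, stride=2, vlen=8, num_banks=NUM_BANKS)
stride_eight = banks_touched(base=0, stride=8, vlen=8, num_banks=NUM_BANKS)
```

In this model a unit-stride access spreads over all 8 banks, a stride of 2 uses only 4 of them, and a stride equal to the bank count sends every element to the same bank, so the accesses serialize and memory is the bottleneck again.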
dynamic instruction count : number of instructions actually executed at run time (instructions that are never executed are not counted)
static instruction count : number of lines of assembly code
Scatter-Gather
If elements are not accessed in a strided manner (ex. through an index vector) → indirect access (Scatter - Gather)
- LVI/SVI instructions : load/store vector indexed (gather/scatter) ex. LVI Va, (Ra+Vk) ;load A[K[]]
strided manner : the index runs from 0 to the vector length in fixed steps
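Gather and scatter through an index vector can be modelled as below. The function names, the flat list `A`, and the index vector `K` are illustrative assumptions; `gather` plays the role of an LVI-style load of A[K[i]] and `scatter` the role of an SVI-style store.

```python
def gather(memory, base, index_vector):
    """Indexed load: element i comes from memory[base + index_vector[i]]."""
    return [memory[base + k] for k in index_vector]


def scatter(memory, base, index_vector, values):
    """Indexed store: element i goes to memory[base + index_vector[i]]."""
    for k, v in zip(index_vector, values):
        memory[base + k] = v


A = [10, 20, 30, 40, 50, 60]
K = [4, 0, 3]                      # index vector, not a fixed stride

Va = gather(A, 0, K)               # collect A[4], A[0], A[3] into a dense vector
Vb = [x + 1 for x in Va]           # stands in for one dense vector operation
scatter(A, 0, K, Vb)               # write the results back to the same places
```

The point of the pattern is that the arithmetic in the middle works on a dense vector even though the elements are scattered in memory.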
Variables for Vector processor
Chaining
Allows a vector operation to start as soon as the individual elements of its vector source operands become available (with chaining, RAW-dependent instructions can be in the same convoy - an element is available when its execution finishes, which is before its write-back finishes)
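The benefit of chaining can be estimated with a deliberately simplified timing model. It assumes each functional unit produces one element per cycle after a fixed start-up latency and ignores every other stall; the latencies 7 and 6 and the vector length 64 are example numbers chosen for illustration.

```python
def cycles_unchained(n, latencies):
    """Each dependent operation waits for the previous one to finish all n elements."""
    return sum(latency + n for latency in latencies)


def cycles_chained(n, latencies):
    """Each dependent operation starts as soon as the first element is available."""
    return sum(latencies) + n


n = 64                    # vector length (assumed)
latencies = [7, 6]        # e.g. a multiply feeding an add (assumed start-up latencies)

unchained = cycles_unchained(n, latencies)
chained = cycles_chained(n, latencies)
```

Under these assumptions the dependent pair takes 141 cycles without chaining and 77 cycles with it, because the element stream is paid for once instead of once per operation.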
Real Application
Intel defines a word as 16 bits
- MMX - bad decision : aliasing the MMX registers onto the FPU registers for compatibility
- MMX data types - all 64 bits
1. packed byte : 8 bytes packed into 64 bits
2. packed word : 4 words packed into 64 bits
3. packed double-word : 2 double-words packed into 64 bits
4. packed quad-word : One 64-bit quantity
- SSE(1~3) - no aliasing
- Adds eight 128-bit registers
- Allows SIMD operations on packed single-precision floating-point numbers
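The packed-byte type can be illustrated by modelling a 64-bit register as a Python integer. This is a sketch only: the helper names are made up, lane 0 is assumed to sit in the lowest byte, and the add wraps around per lane in the spirit of a packed byte add such as MMX PADDB.

```python
def pack_bytes(values):
    """Pack 8 unsigned bytes into one 64-bit integer, lane 0 in the low byte."""
    if len(values) != 8:
        raise ValueError("a packed-byte value needs exactly 8 lanes")
    word = 0
    for lane, v in enumerate(values):
        word |= (v & 0xFF) << (8 * lane)
    return word


def unpack_bytes(word):
    """Split a 64-bit integer back into its 8 byte lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(8)]


def packed_add_bytes(a, b):
    """Lane-wise add with wraparound; a carry never crosses into the next lane."""
    lanes = [(x + y) & 0xFF for x, y in zip(unpack_bytes(a), unpack_bytes(b))]
    return pack_bytes(lanes)


a = pack_bytes([1, 2, 3, 4, 5, 6, 7, 255])
b = pack_bytes([1] * 8)
result = unpack_bytes(packed_add_bytes(a, b))
```

The last lane shows the difference from a plain 64-bit add: 255 + 1 wraps to 0 inside its own lane instead of carrying out of the register, and all eight additions correspond to a single instruction.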