WGMMA (warpgroup matrix multiply-accumulate)
PTX ISA 8.4
The programming guide to using PTX (Parallel Thread Execution) and ISA (Instruction Set Architecture).
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
GPUs Go Brrr
how make gpu fast?
https://hazyresearch.stanford.edu/blog/2024-05-12-tk


Seonglae Cho