WGMMA (warpgroup matrix multiply-accumulate)
PTX ISA 8.4
The programming guide to using PTX (Parallel Thread Execution) and ISA (Instruction Set Architecture).
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
GPUs Go Brrr
how make gpu fast?
https://hazyresearch.stanford.edu/blog/2024-05-12-tk

Supply and Demand
Nvidia H100 GPUs: Supply and Demand
This post is an exploration of the supply and demand of GPUs, particularly Nvidia H100s.
https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/


Seonglae Cho