Write and Use Custom CUDA Extensions for Critical Operations
llm.c · karpathy · Updated 2026 Feb 23 14:46
Start by profiling your model (for example with `torch.profiler`) to identify the specific operations that dominate runtime and could benefit from a custom CUDA implementation.
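A minimal profiling sketch using PyTorch's built-in `torch.profiler`; the small MLP here is a hypothetical stand-in for your real model, and on a GPU you would also pass `ProfilerActivity.CUDA`:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical model: a small MLP standing in for your real network.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 256),
)
x = torch.randn(32, 256)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Operators sorted by total time; the top entries are the candidates
# worth replacing with a custom kernel.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```

The printed table lists each `aten::` operator with its share of total time, which tells you where a fused or specialized kernel would actually pay off.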
Then use PyTorch's `torch.utils.cpp_extension` module to create a bridge between your CUDA kernels and your PyTorch code. Once compiled and loaded, the custom operation can be called directly in your PyTorch models like any other function.

Seonglae Cho