A guide to LLM inference and performance
To get the full performance out of a GPU during LLM inference, you need to know whether inference is compute bound or memory bound. Learn how to make better use of GPU resources.
https://www.baseten.co/blog/llm-transformer-inference-guide