Model Compression
- Tensor Decomposition
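A minimal sketch of the tensor-decomposition idea, assuming a PyTorch weight matrix: truncated SVD factors one large matrix into two low-rank factors, cutting parameter count (the function name and the chosen rank are illustrative, not from a specific library):

```python
import torch

def lowrank_decompose(weight: torch.Tensor, rank: int):
    """Approximate weight (out x in) as A @ B, with A (out x rank) and B (rank x in)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold singular values into the left factor
    B = Vh[:rank, :]
    return A, B

W = torch.randn(1024, 1024)
A, B = lowrank_decompose(W, rank=64)
# Storage drops from 1024*1024 params to 2*1024*64 (~8x smaller)
rel_error = torch.linalg.norm(W - A @ B) / torch.linalg.norm(W)
```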
Model Optimization Techniques
I just implemented a full pipeline from library submission to leaderboard report update! Tomorrow I'll set up the GitHub Actions cron workflow and request the necessary repository secrets.
GPU Memory with quantization
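As a rough sketch of how quantization changes weight memory, assuming weights dominate and ignoring KV cache and activation overhead (the helper name is illustrative):

```python
def weight_memory_gb(n_params_billion: float, bits: int) -> float:
    """Approximate weight memory in GB: parameters x (bits / 8) bytes each."""
    return n_params_billion * bits / 8  # 1e9 params x bytes/param, over 1e9 bytes/GB

for bits in (32, 16, 8, 4):
    print(f"7B model @ {bits:>2}-bit weights: ~{weight_memory_gb(7, bits):.1f} GB")
# 16-bit: ~14 GB; 4-bit quantization shrinks the same weights to ~3.5 GB
```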
How to make LLMs go fast
Blog about linguistics, programming, and my projects
https://vgel.me/posts/faster-inference/
Calculating memory
How Much GPU Memory is Needed to Serve a Large Language Model (LLM)?
In nearly all LLM interviews, there’s one question that consistently comes up: “How much GPU memory is needed to serve a Large Language…
https://masteringllm.medium.com/how-much-gpu-memory-is-needed-to-serve-a-large-languagemodel-llm-b1899bb2ab5d
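A commonly cited rule of thumb for serving memory, consistent with the linked article (treat the 1.2 overhead factor as an estimate, not an exact constant):

$$
M \approx \frac{P \times 4\,\mathrm{B}}{32 / Q} \times 1.2
$$

where $M$ is GPU memory in GB, $P$ is the parameter count in billions, $Q$ is the bit width the weights are loaded in, and the 1.2 adds roughly 20% overhead for the KV cache and activations. For example, a 70B model served at 16-bit needs about $(70 \times 4)/2 \times 1.2 = 168$ GB.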

