AI Optimization

Creator

Seonglae Cho

Created

2023 Jun 4 10:16

Editor

Seonglae Cho

Edited

2025 Jul 1 21:3

Refs

AI Compiler Optimization

Activation Engineering

Prompt Engineering

Model Compression

Tensor Decomposition

Knowledge Distillation

Parameter pruning

Model Quantization

Model Optimization Techniques

Model Optimizer

Model Quantization

Inference Optimization

Knowledge Distillation

Parameter pruning

AI Compiler Optimization

Activation Checkpointing

Perturbative Learning

Just implemented a full pipeline from library submission to leaderboard report update! I'll set up the whole GitHub cron action and request some secret settings on the repository tomorrow.

GPU Memory with quantization

Calculating GPU memory for serving LLMs | Substratus.AI

How many GPUs do I need to be able to serve Llama 70B? In order to answer that, you need to know how much GPU memory will be required by the Large Language Model.

https://www.substratus.ai/blog/calculating-gpu-memory-for-llm
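
The question above (how much GPU memory a model such as Llama 70B needs) can be approximated with a back-of-the-envelope calculation: parameter count times bytes per parameter, plus some headroom. The sketch below assumes a commonly used rule of thumb of roughly 20% overhead on top of the weights; the function names, the 1.2 factor, and the 80 GB per-GPU capacity are illustrative assumptions, and the estimate ignores KV cache growth with batch size and context length.

```python
import math


def estimate_serving_memory_gb(params_billion: float,
                               bits_per_param: int = 16,
                               overhead: float = 1.2) -> float:
    """Rough GPU memory (in GB) needed to hold a model for inference.

    params_billion: parameter count in billions (e.g. 70 for a 70B model).
    bits_per_param: precision the weights are stored in (16, 8, 4, ...).
    overhead: assumed multiplier for activations and runtime buffers.
    """
    bytes_per_param = bits_per_param / 8
    # 1 billion parameters at 1 byte each is about 1 GB.
    return params_billion * bytes_per_param * overhead


def gpus_needed(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Minimum number of GPUs of a given size (assumed 80 GB) to fit the model."""
    return math.ceil(memory_gb / gpu_memory_gb)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        mem = estimate_serving_memory_gb(70, bits_per_param=bits)
        print(f"70B @ {bits}-bit: ~{mem:.0f} GB, {gpus_needed(mem)} x 80 GB GPU(s)")
```

Under these assumptions, quantizing from 16-bit to 4-bit cuts the weight footprint to a quarter, which is why quantization often decides whether a model fits on a single GPU at all.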

How to make LLMs go fast

Blog about linguistics, programming, and my projects

https://vgel.me/posts/faster-inference/

Calculating memory

How Much GPU Memory is Needed to Serve a Large Language Model (LLM)?

In nearly all LLM interviews, there’s one question that consistently comes up: “How much GPU memory is needed to serve a Large Language…

https://masteringllm.medium.com/how-much-gpu-memory-is-needed-to-serve-a-large-languagemodel-llm-b1899bb2ab5d

Backlinks

AI Inference Tool AI Energy Attention Mechanism Optimization LLM LLM
