Texonom / Engineering / Data Engineering / Artificial Intelligence / AI Development / AI Optimization

AI Optimization

Creator: Seonglae Cho
Created: 2023 Jun 4 10:16
Editor: Seonglae Cho
Edited: 2024 Oct 30 20:48

Refs
AI Compiler Optimization
Activation Engineering
Prompt Engineering

Model Compression

  • Tensor Decomposition
  • Knowledge Distillation
  • Parameter pruning
  • Model Quantization
  • Model Optimization Techniques
  • Model Optimizer
  • Model Quantization
  • Inference Optimization
  • Knowledge Distillation
  • Parameter pruning
  • AI Compiler Optimization
  • Activation Checkpointing
  • Perturbative Learning
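As a concrete illustration of the Model Quantization entry above, here is a minimal sketch of symmetric int8 post-training quantization: each weight is mapped to an integer in [-127, 127] plus a shared scale, trading a little precision for a 4× smaller footprint versus fp32. The function names and the per-tensor (rather than per-channel) scheme are illustrative assumptions, not the method of any specific library.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = max(abs(w) for w in weights) / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# per-weight reconstruction error is bounded by scale / 2
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Real deployments usually quantize per channel or per group and calibrate the scale on activation statistics, but the weight-side arithmetic is the same idea.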
GPU Memory with Quantization
Calculating GPU memory for serving LLMs | Substratus.AI
How many GPUs do I need to be able to serve Llama 70B? In order to answer that, you need to know how much GPU memory will be required by the Large Language Model.
https://www.substratus.ai/blog/calculating-gpu-memory-for-llm
How to make LLMs go fast
Blog about linguistics, programming, and my projects
https://vgel.me/posts/faster-inference/

Calculating memory

How Much GPU Memory is Needed to Serve a Large Language Model (LLM)?
In nearly all LLM interviews, there’s one question that consistently comes up: “How much GPU memory is needed to serve a Large Language…
https://masteringllm.medium.com/how-much-gpu-memory-is-needed-to-serve-a-large-languagemodel-llm-b1899bb2ab5d
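The rule of thumb behind the linked articles can be sketched as a short calculation: serving memory ≈ parameter count × bytes per parameter (set by the quantization bit width) × an overhead factor for KV cache and activations. The ~1.2 overhead factor and the function name below are illustrative assumptions for this sketch, not exact figures from any one post.

```python
def llm_serving_memory_gb(params_billion: float, bits: int = 16,
                          overhead: float = 1.2) -> float:
    """Estimate GPU memory (GB) needed to serve an LLM.

    params_billion: model size in billions of parameters
    bits: precision of the stored weights (16 = fp16, 4 = 4-bit quantized)
    overhead: multiplier for KV cache and activations (assumed ~20%)
    """
    bytes_per_param = bits / 8
    return params_billion * bytes_per_param * overhead

# Llama 70B: roughly 168 GB in fp16, 42 GB when 4-bit quantized
fp16_gb = llm_serving_memory_gb(70, bits=16)
int4_gb = llm_serving_memory_gb(70, bits=4)
```

By this estimate, serving Llama 70B in fp16 needs multiple 80 GB GPUs, while 4-bit quantization brings it within reach of a single card, which is why quantization appears alongside the memory-calculation links here.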
 
 
 

Copyright Seonglae Cho