Model Optimization

Topics: Model Compression, Tensor Decomposition, Knowledge Distillation, Parameter Pruning, Model Quantization, Model Optimizer, Inference Optimization, AI Compiler Optimization, Activation Checkpointing

Link: "How to make LLMs go fast" (https://vgel.me/posts/faster-inference/)