How to make LLMs go fast
https://vgel.me/posts/faster-inference/
Blog about linguistics, programming, and my projects

Topics: Model Compression, Tensor Decomposition, Knowledge Distillation, Parameter Pruning, Model Quantization, Model Optimization Techniques, Model Optimizer, Inference Optimization, AI Compiler Optimization, Activation Checkpointing