flash-llm (AlibabaResearch) • Updated 2024 Jul 12 13:56
Papers with Code - Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Implemented in one code library.
https://paperswithcode.com/paper/flash-llm-enabling-cost-effective-and-highly