Papers with Code - Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Implemented in one code library.
https://paperswithcode.com/paper/flash-llm-enabling-cost-effective-and-highly