SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to 3-4 bits per parameter, they can fit into memory-limited devices such as laptops and mobile phones, enabling personalized use.
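To make the "3-4 bits per parameter" claim concrete, here is a rough back-of-the-envelope sketch of weight storage at different bit widths; the model sizes, the fp16 baseline, and the helper name `weight_memory_gb` are illustrative assumptions, not from the paper.

```python
# Back-of-the-envelope weight storage for LLMs at various bit widths.
# Only the ~3-4 bit range comes from the abstract; model sizes and the
# fp16 baseline are illustrative assumptions.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (decimal GB)."""
    return num_params * bits_per_param / 8 / 1e9

for name, params in [("7B", 7e9), ("13B", 13e9), ("65B", 65e9)]:
    fp16 = weight_memory_gb(params, 16)
    q4 = weight_memory_gb(params, 4)
    q3 = weight_memory_gb(params, 3)
    print(f"{name}: fp16 {fp16:.1f} GB -> 4-bit {q4:.1f} GB, 3-bit {q3:.1f} GB")
```

For example, a 7B-parameter model drops from roughly 14 GB in fp16 to about 3.5 GB at 4 bits, which is what puts it within reach of laptop- and phone-class memory budgets.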
https://arxiv.org/abs/2306.03078