Parameter-Efficient Quantization-aware Adaptation (PEQA)

- Enables fine-tuning with a far smaller memory footprint than LoRA.
- The fine-tuned result is already a 3/4-bit weight-only uniform-quantized model.

Memory-Efficient Fine-Tuning of Compressed Large Language Models...
"Parameter-efficient fine-tuning (PEFT) methods have emerged to mitigate the prohibitive cost of full fine-tuning large language models (LLMs). Nonetheless, the enormous size of LLMs impedes..."
https://arxiv.org/abs/2305.14152
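The core idea behind PEQA-style training is that the integer weight matrix stays frozen after quantization and only the per-channel quantization scales receive gradient updates, so the trainable parameter count is one scalar per output channel. Below is a minimal NumPy sketch of that idea, not the paper's implementation; the function names (`quantize_per_channel`, `peqa_step`) and the toy squared-norm loss are my own illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_per_channel(W, bits=4):
    """Symmetric per-row uniform quantization: W ~= scale[:, None] * W_int."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    scale = np.abs(W).max(axis=1) / qmax  # one trainable scale per output channel
    W_int = np.clip(np.round(W / scale[:, None]), -qmax - 1, qmax)
    return W_int.astype(np.int8), scale

def peqa_step(W_int, scale, x, grad_out, lr=1e-2):
    """One scale-only update: the integer weights stay frozen.

    For y = (scale[:, None] * W_int) @ x, the chain rule gives
    dL/dscale_i = grad_out_i * (W_int[i] @ x), so the gradient w.r.t.
    the scales is a cheap integer-matrix/vector product.
    """
    grad_scale = grad_out * (W_int.astype(np.float64) @ x)
    return scale - lr * grad_scale

W = rng.standard_normal((4, 8))
W_int, scale = quantize_per_channel(W, bits=4)
x = rng.standard_normal(8)

# Toy loss L = 0.5 * ||y||^2, so grad_out = dL/dy = y.
y = (scale[:, None] * W_int) @ x
scale_new = peqa_step(W_int, scale, x, grad_out=y)
```

Memory-wise this is why the note compares favorably to LoRA: optimizer state exists only for the per-channel scale vector, while the bulk of the model is stored as 3/4-bit integers throughout training.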