Low-Rank Adaptation
LoRA is a parameter-efficient fine-tuning technique that freezes the pretrained weights and optimizes low-rank decomposition matrices of rank r injected into the model's weight matrices.
Earlier PEFT techniques either reduce the sequence length available to the model (e.g., prefix tuning) or add extra depth that slows inference (e.g., adapter layers). LoRA instead attaches two low-rank matrices to a linear layer, commonly an attention projection or MLP weight, and adds their product to the frozen weight before the activation.
The pretrained weights stay frozen and an adaptation path built from the low-rank matrices is appended alongside them; the key observation is that such a low-rank update matrix is sufficient for fine-tuning. Factoring the update into a bottleneck of rank r keeps the number of trainable parameters (and the saved adapter) small, so comparable results can be reached with far less compute.
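As a rough illustration of the frozen-weight-plus-bottleneck idea, the sketch below wraps a PyTorch nn.Linear with a rank-r update; the class name LoRALinear and the default rank/alpha values are illustrative choices, not fixed by the text above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: y = base(x) + (alpha / r) * x A^T B^T, with the base weight frozen."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weight and bias
            p.requires_grad = False
        self.scaling = alpha / r
        # Bottleneck factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init -> update is 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (x @ self.lora_A.T) @ self.lora_B.T   # low-rank adaptation path
        return self.base(x) + self.scaling * delta
```

Wrapping an existing layer is then a one-liner, e.g. `LoRALinear(nn.Linear(768, 768), r=8, alpha=16)`; only lora_A and lora_B receive gradients.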
A learning rate of 1e-4 has become a common default when fine-tuning LLMs with LoRA, although occasional training-loss instabilities may require reducing it to lower values such as 3e-5. LoRA's weights are usually initialized randomly (A random, B zero), but techniques like EVA instead decompose activation vectors (e.g., via SVD) to initialize the adapters along the most informative directions.
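A minimal sketch of those settings, assuming a `model` whose linear layers have already been wrapped as in the earlier example (the variable name is hypothetical), passes only the trainable LoRA parameters to the optimizer:

```python
import torch

# Only the LoRA factors A and B still require gradients after wrapping.
lora_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(lora_params, lr=1e-4)   # common LoRA default

# If training loss becomes unstable, a lower rate such as 3e-5 is a typical fallback:
# optimizer = torch.optim.AdamW(lora_params, lr=3e-5)
```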
LoRA Use Cases
Model Regularization
LoRA Learns Less and Forgets Less