Low-Rank Adaptation
LoRA is a technique that optimizes rank decomposition matrices of a chosen LoRA module rank r instead of updating the full weight matrices.
Existing PEFT techniques either reduce the model's available sequence length or expand the model depth at inference time. LoRA is usually applied to the MLP as two low-rank matrices inserted before the activation.
Freeze the pretrained weights and append an adaptation layer composed of an adaptation matrix; the key result is that a low-rank adaptation matrix is sufficient for fine-tuning. By factoring the adaptation matrix like a Bottleneck layer, the added model size stays small and comparable learning is achieved with less computing resources.
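A minimal sketch of this bottleneck factorization in PyTorch, assuming a standard linear layer; the names LoRALinear, lora_A, and lora_B are illustrative, not from the original papers. The base weight is frozen and only the two low-rank factors are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weight; only the low-rank factors are trained.
        for p in self.base.parameters():
            p.requires_grad = False
        # Bottleneck factorization: (out x r) and (r x in) instead of (out x in).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # h = W0 x + (B A) x * scaling
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Because B is initialized to zero, the adapted layer starts out identical to the frozen base layer, so training begins from the pretrained behavior.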
A learning rate of 1e-4 has become the standard when fine-tuning LLMs with LoRA. Training loss instabilities occasionally appear, and reducing the learning rate to lower values like 3e-5 helps. LoRA's weights are initialized randomly, but techniques like EVA decompose activation vectors (e.g., via SVD) and initialize the matrices from the components with the highest priority (largest explained variance).
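A hedged example of these common hyperparameters, assuming the Hugging Face peft and transformers libraries; the model name and target modules below are placeholders, not recommendations from the text.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # illustrative; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)

# 1e-4 as the usual starting learning rate; drop toward 3e-5 if the loss is unstable.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,
)
```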
LoRA Usages
LoRA fine-tuning introduces distinct high-magnitude singular vectors, known as intruder dimensions, that do not appear in conventional full fine-tuning. These components influence how the model generalizes.
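A rough sketch of how one might look for such components, assuming PyTorch: inspect the singular values of the weight update ΔW = W_tuned − W_base and compare them against the corresponding full fine-tuning update. The function name is illustrative.

```python
import torch

def top_singular_values(w_base: torch.Tensor, w_tuned: torch.Tensor, k: int = 10):
    """Return the k largest singular values of the weight update ΔW."""
    delta = w_tuned - w_base
    s = torch.linalg.svdvals(delta)  # sorted in descending order
    return s[:k]

# Large singular values (and their vectors) with no counterpart in the full
# fine-tuning update would indicate the intruder dimensions described above.
```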
Model Regularization
LoRA Learns Less and Forgets Less