YaRN

Creator
Seonglae Cho
Created
2023 Sep 16 8:45
Edited
2025 Dec 17 14:45
Refs

Yet another RoPE extensioN

The reparametrization of RoPE as a set of 2D matrices has a clear benefit on the implementation of this attention scaling: we can instead use a “length scaling” trick which scales both q_m and k_n by a constant factor √(1/t) by simply scaling the complex RoPE embeddings by the same amount.
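A minimal NumPy sketch of this length-scaling trick (function names and shapes are illustrative, not from the YaRN codebase): folding the constant √(1/t) into the complex RoPE rotations scales both q_m and k_n, so every attention logit ⟨q_m, k_n⟩ is scaled by 1/t without touching the attention kernel.

```python
import numpy as np

def rope_rotations(head_dim, seq_len, base=10000.0):
    # Standard RoPE: angle m * theta_i with theta_i = base^(-2i/d),
    # returned as complex rotations e^{i m theta_i}, shape (seq_len, d/2).
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    angles = np.outer(np.arange(seq_len), inv_freq)
    return np.cos(angles) + 1j * np.sin(angles)

def apply_rope(x, rotations, t=1.0):
    # x: (seq_len, head_dim) real, viewed as interleaved complex pairs.
    xc = x[:, 0::2] + 1j * x[:, 1::2]
    # "Length scaling" trick: multiply the complex RoPE embeddings by
    # sqrt(1/t); applied to both q and k, attention logits shrink by 1/t.
    xr = xc * rotations * np.sqrt(1.0 / t)
    out = np.empty_like(x)
    out[:, 0::2] = xr.real
    out[:, 1::2] = xr.imag
    return out
```

Because √(1/t) is a constant, it can be baked into the cached RoPE embeddings once, which is why this is cheaper than modifying the softmax temperature at every attention call.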

Results

  • YaRN not only interpolates well within the context lengths seen during fine-tuning; it also extrapolates beyond the limited context of the fine-tuning data.
  • Dynamic-YaRN, combined with Dynamic Scaling at inference time, allows more than 2x context window extension without any fine-tuning.
  • YaRN enables efficient extrapolation when fine-tuned on shorter datasets and can take advantage of transfer learning for faster convergence.
YaRN: Efficient Context Window Extension of Large Language Models
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence...
Paper page - YaRN: Efficient Context Window Extension of Large Language Models
NousResearch/Yarn-Llama-2-13b-128k · Hugging Face
Understanding YaRN: Extending Context Window of LLMs