Learned Fourier features
Learned Fourier features improve sample efficiency and stability across a variety of deep RL algorithms.
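A minimal sketch of what a learned Fourier feature encoding might look like as the input layer of a value network. The class name, dimensions, and initialization scale below are illustrative assumptions, not taken from any specific paper's code.

```python
# Sketch of a learned Fourier feature encoding (PyTorch); names and sizes are illustrative.
import torch
import torch.nn as nn


class LearnedFourierFeatures(nn.Module):
    """Maps a state vector x to [sin(2*pi*Bx), cos(2*pi*Bx)] with a trainable matrix B."""

    def __init__(self, state_dim: int, num_features: int, init_scale: float = 1.0):
        super().__init__()
        # The frequency matrix B is learned jointly with the rest of the network.
        self.B = nn.Parameter(init_scale * torch.randn(num_features, state_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = 2 * torch.pi * x @ self.B.t()          # (batch, num_features)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)


# Example: the encoding as the first layer of a small Q-network.
q_net = nn.Sequential(
    LearnedFourierFeatures(state_dim=8, num_features=64),
    nn.Linear(128, 256),  # 128 = 2 * num_features (sin and cos halves)
    nn.ReLU(),
    nn.Linear(256, 4),    # one output per discrete action
)
```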
Periodic representations consistently converge to high frequencies regardless of their initialization frequency. Weight decay regularization partially offsets the overfitting of periodic activation functions, yielding value functions that learn quickly while still generalizing.
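Continuing the q_net sketch above, one way to apply that regularization is to put the learned frequency matrix in its own optimizer parameter group with a nonzero weight decay. The group split and coefficients here are assumptions for illustration, not prescribed values.

```python
# Sketch: weight decay on the learned frequencies to counter drift toward high frequencies.
# Assumes the q_net defined in the previous sketch; coefficients are illustrative.
import torch

fourier_params = [p for n, p in q_net.named_parameters() if n.startswith("0.")]
other_params = [p for n, p in q_net.named_parameters() if not n.startswith("0.")]

optimizer = torch.optim.AdamW(
    [
        {"params": fourier_params, "weight_decay": 1e-2},  # decay the frequency matrix B
        {"params": other_params, "weight_decay": 0.0},     # leave the MLP weights undecayed
    ],
    lr=3e-4,
)
```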
Periodic activation functions are not commonly used in LLMs because tasks such as text generation depend more on contextual understanding and logical connections than on periodic patterns.