Model-internal RAG, implementing a mechanism similar to working memory
Tested with a 1 million token context window; in principle there is no hard upper limit
- Utilizes standard local attention mechanisms found in transformers.
- Integrates a global attention mechanism through a compression technique.
- Merges local and global attention to handle extended contexts efficiently (see the sketch after this list).
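
Below is a minimal sketch of how such a hybrid could look, assuming the global path compresses past key/value states into a small fixed-size memory and the local path is standard causal sliding-window attention. The function names, the mean-pooling compressor, and the fixed gate are illustrative assumptions, not details taken from the description above; in a real model the compressor and gate would be learned.

```python
# Sketch: merging local (sliding-window) attention with a compressed
# global memory. Names and the gating scheme are illustrative assumptions.
import torch
import torch.nn.functional as F


def local_attention(q, k, v, window: int):
    """Standard causal sliding-window attention (single head, batch size 1)."""
    T, d = q.shape
    out = torch.zeros_like(v)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / d ** 0.5
        out[t] = F.softmax(scores, dim=-1) @ v[lo:t + 1]
    return out


def compress_to_memory(k, v, memory_slots: int):
    """Compress key/value states into a fixed number of memory slots by
    mean-pooling contiguous chunks (a simple stand-in for a learned compressor)."""
    T = k.shape[0]
    chunk = max(1, T // memory_slots)
    mk = torch.stack([k[i:i + chunk].mean(0) for i in range(0, T, chunk)])
    mv = torch.stack([v[i:i + chunk].mean(0) for i in range(0, T, chunk)])
    return mk, mv


def hybrid_attention(q, k, v, window: int = 64, memory_slots: int = 16, gate: float = 0.5):
    """Merge local attention over the recent window with global attention over
    the compressed memory, combined with a fixed gate (learned in practice).
    Note: this sketch compresses the full sequence at once; a streaming variant
    would update the memory incrementally to stay causal."""
    d = q.shape[-1]
    local_out = local_attention(q, k, v, window)
    mk, mv = compress_to_memory(k, v, memory_slots)
    global_scores = q @ mk.T / d ** 0.5
    global_out = F.softmax(global_scores, dim=-1) @ mv
    return gate * local_out + (1.0 - gate) * global_out


if __name__ == "__main__":
    T, d = 512, 32
    q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
    out = hybrid_attention(q, k, v)
    print(out.shape)  # torch.Size([512, 32])
```

Because the global path attends to a fixed number of memory slots regardless of sequence length, its cost stays constant as the context grows, while the local path stays linear in the window size; this is what lets the combined mechanism scale to very long contexts.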