The paper argues that the existing "depth" concept in deep learning fails to explain the actual learning structure, and proposes a new learning paradigm, Nested Learning (NL), which interprets the entire model as a nested structure of multi-level optimization problems.
Every neural network and model optimizer can be viewed as an Associative Memory that compresses its context flow.
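Roughly formalized (the notation below is illustrative, not necessarily the paper's exact symbols), an associative memory is an operator that maps keys to values by solving an inner optimization over the context it has observed:

```latex
% Associative memory as an inner optimization problem (illustrative notation):
% M maps keys K (e.g. inputs) to values V (e.g. targets or error signals)
% by minimizing an internal objective over the observed context.
\mathcal{M}^{*} \;=\; \arg\min_{\mathcal{M}} \; \tilde{\mathcal{L}}\!\left(\mathcal{M}(K),\, V\right)
```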
- Gradient Descent → Level 1 Associative Memory (mapping data to error signals)
- Momentum Method / Adam Optimizer → Level 2 nested optimization (compressing and memorizing past gradients; see the sketch after this list)
- Attention Mechanism, Multi-Layer Perceptron → sub-optimization modules, each with its own context flow
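To make the Level 2 reading concrete, here is a minimal sketch (my own illustration, not the paper's code) of momentum viewed as an inner associative memory: the momentum buffer is the state of a small optimization problem whose job is to compress the stream of past gradients.

```python
import numpy as np

def momentum_as_memory(grads, beta=0.9):
    """Level 2 view: the momentum buffer m compresses the gradient stream.

    Each update is one gradient step on the inner objective
        L_inner(m) = 0.5 * ||m - g_t||^2
    with step size (1 - beta), which recovers the familiar EMA
        m <- beta * m + (1 - beta) * g_t.
    """
    m = np.zeros_like(grads[0])
    for g in grads:
        # Inner "memory" update: compress the new gradient into m.
        m = m - (1 - beta) * (m - g)   # identical to beta * m + (1 - beta) * g
    return m

# Toy usage: the memory summarizes (compresses) the whole gradient history.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [rng.normal(size=3) for _ in range(100)]
    print(momentum_as_memory(grads))
```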
Deep learning is therefore not simply a matter of stacking layers; it should be understood as a multi-level optimization system with multiple time scales and periodic updates. However, this argument is weak, because Transformers already operate with virtual layers that interact far more naturally with the other layers, making them already a form of unrestricted nested learning.
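The "multiple time scales" claim can be pictured as components that update at different periods (the level names and periods below are hypothetical, not taken from the paper):

```python
# Sketch of the multi-time-scale reading: each level keeps its own state and
# is updated with its own period, so "depth" becomes a hierarchy of how often
# each component changes rather than a stack of layers.
update_period = {
    "activations": 1,    # recomputed every step (fastest context flow)
    "momentum": 1,       # inner gradient memory, updated every step
    "weights": 1,        # outer gradient step, every step
    "lr_schedule": 100,  # hyper-level, adjusted every 100 steps (slowest)
}

for step in range(1, 201):
    for level, period in update_period.items():
        if step % period == 0:
            pass  # apply this level's update rule (omitted in this sketch)
```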