The learning rate and the Hebbian learning coefficients (the ABCD model) are optimized with an evolutionary algorithm, while the network weights themselves are updated at every timestep according to Hebb's rule. Under the evolved rules, the weights converge quickly to a usable configuration.
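As a rough sketch of the per-connection ABCD rule (variable names, shapes, and the pre/post convention are assumptions here, not taken from the source), each connection carries its own learning rate and A, B, C, D coefficients, and the weight change at every timestep combines a correlation term with pre-only, post-only, and constant terms:

```python
import numpy as np

def hebbian_abcd_update(w, pre, post, eta, A, B, C, D):
    """One Hebbian (ABCD-rule) weight update applied at a single timestep.

    w, eta, A, B, C, D : (n_post, n_pre) arrays -- per-connection parameters
    pre  : (n_pre,)  presynaptic activations
    post : (n_post,) postsynaptic activations
    """
    # Correlation term o_pre * o_post for every connection.
    corr = np.outer(post, pre)
    # dw_ij = eta_ij * (A_ij * o_i * o_j + B_ij * o_i + C_ij * o_j + D_ij)
    dw = eta * (A * corr + B * pre[None, :] + C * post[:, None] + D)
    return w + dw
```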
Specifically, rather than the agent directly receiving rewards or error signals to modify w, meta-level (evolutionary) learning first discovers which Hebbian rules h yield high reward; during execution, the agent then updates w continuously by following those rules alone.
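A minimal sketch of that two-level loop might look like the following, reusing `hebbian_abcd_update` from above. The evolution-strategies update, the `env_rollout` interface, and all hyperparameters are illustrative assumptions; the point is only that reward drives the outer search over rule parameters h, while the inner lifetime updates w with the rules alone:

```python
import numpy as np

def lifetime_fitness(h, env_rollout, n_steps=200):
    """One 'lifetime': start from random weights and update them with the
    Hebbian rules h only -- no reward or error signal ever touches w."""
    eta, A, B, C, D = h                       # per-connection rule parameters
    w = np.random.randn(*A.shape) * 0.1
    pre = np.zeros(A.shape[1])
    total_reward = 0.0
    for _ in range(n_steps):
        post = np.tanh(w @ pre)                                   # forward pass
        w = hebbian_abcd_update(w, pre, post, eta, A, B, C, D)    # plasticity every step
        pre, reward = env_rollout(post)       # hypothetical environment step
        total_reward += reward
    return total_reward

def evolve_rules(env_rollout, shape, generations=100, pop=32, sigma=0.1, lr=0.05):
    """Outer evolution-strategies loop: the rules h are the genome,
    accumulated lifetime reward is the fitness."""
    theta = [np.zeros(shape) for _ in range(5)]   # eta, A, B, C, D
    for _ in range(generations):
        noises, scores = [], []
        for _ in range(pop):
            eps = [np.random.randn(*shape) * sigma for _ in theta]
            h = [t + e for t, e in zip(theta, eps)]
            noises.append(eps)
            scores.append(lifetime_fitness(h, env_rollout))
        scores = (np.array(scores) - scores_mean) / (np.std(scores) + 1e-8) if False else \
                 (np.array(scores) - np.mean(scores)) / (np.std(scores) + 1e-8)
        # Move the rule parameters toward perturbations that earned more reward.
        for k in range(5):
            grad = sum(s * n[k] for s, n in zip(scores, noises)) / pop
            theta[k] += lr * grad / sigma
    return theta
```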
Unlike Fast Weights, which only produce temporary changes, this method lets agents learn continuously and is arguably the most brain-like way to implement neuroplasticity. However, it has not yet been applied to LLMs or attention mechanisms, and while ideas such as synchronization are promising, there are concerns about the effectiveness of evolutionary algorithms and the engineering challenges of scaling. Alternative approaches such as the Continuous Thought Machine have since emerged.