Replay Buffer

Creator: Seonglae Cho
Created: 2024 Apr 13 3:24
Edited: 2024 Nov 21 11:37

A replay buffer manages sampled states and transitions so they can be reused during training.

Handling correlated samples in online Q-learning
Consecutive samples are strongly correlated with each other, which increases the variance of the learning process and destabilizes optimization. To address this, transitions are stored in a buffer and mini-batches are drawn from it uniformly at random (see the sketch after the list below).
Used to prevent:
  • Correlated / consecutive samples
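Below is a minimal sketch of a uniform replay buffer; the class and method names are my own choices for illustration, not from any specific library. It stores transitions as they arrive and samples random mini-batches, which breaks the temporal correlation between consecutive steps.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform-sampling replay buffer (illustrative sketch)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        # Consecutive pushes are highly correlated; storing them for later
        # random sampling is what decorrelates the training batches.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive transitions before they reach the learner.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```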
 
 

Prioritized experience replay (PER)

In standard experience replay, transitions are sampled uniformly, which keeps the training data approximately i.i.d. However, as the buffer grows, new experience is sampled less and less frequently. PER instead samples transitions with high TD error more often, prioritizing the experiences the current Q-function fits worst (a minimal sketch follows the caveats below).
  • The loss on prioritized transitions shrinks very quickly
  • Only a few high-error transitions are sampled repeatedly, so the model is prone to overfitting them
  • Re-evaluating TD error is expensive, so priorities are updated only when a transition's loss is computed, not after every Q update
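A hedged sketch of proportional prioritization: transitions are sampled with probability proportional to |TD error|^α, and importance-sampling weights correct the resulting bias. This is a simple O(N) version rather than the sum-tree used in the original PER paper, and all names and default hyperparameters are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional PER sketch (O(N) sampling, not a sum-tree)."""

    def __init__(self, capacity=100_000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                     # how strongly TD error shapes sampling
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def push(self, transition):
        # New transitions get the current max priority so each one
        # is guaranteed to be sampled at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.data)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Called only when the loss for the sampled batch is computed,
        # since recomputing TD errors for the whole buffer is too expensive.
        self.priorities[idx] = np.abs(td_errors) + 1e-6
```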
 

State Representation Sampling

Using learned state representations to sample more effectively from the buffer, potentially prioritizing certain types of experiences that are more beneficial for learning.
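The note leaves the mechanism open, so the following is only a speculative sketch of one way representation-based sampling could work: prioritize transitions whose learned state embedding is far from the mean embedding, i.e., "novel" states. The function name and the novelty heuristic are my own assumptions, not an established method.

```python
import numpy as np

def representation_priorities(embeddings, eps=1e-6):
    # embeddings: (N, d) array of learned state representations for buffered transitions
    mean = embeddings.mean(axis=0)
    novelty = np.linalg.norm(embeddings - mean, axis=1)  # distance from the mean embedding
    probs = (novelty + eps) / (novelty + eps).sum()
    return probs

# Usage (hypothetical): sample buffer indices biased toward novel states
# idx = np.random.choice(len(embeddings), batch_size, p=representation_priorities(embeddings))
```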