Replay Buffer

Creator: Seonglae Cho
Created: 2024 Apr 13 3:24
Edited: 2024 Jun 5 2:34

Replay Buffer
for sampled state management

for correlated samples in online Q-learning
Consecutive samples are strongly correlated with one another, which increases the variance of training and can destabilize optimization. Transitions are therefore stored in a buffer, and minibatches are drawn from it at random.
to prevent
  • Correlated / consecutive samples (see the sketch below)
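
A minimal sketch in Python of a uniform replay buffer (class and parameter names are illustrative, not from the source):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer. Uniform random sampling breaks the temporal
    correlation between consecutive transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Each minibatch mixes transitions from many different episodes,
        # so the samples are approximately decorrelated.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```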

Prioritized experience replay (PER)

Transitions are normally sampled uniformly, because experience replay is meant to make the training data approximately i.i.d. But as the replay buffer grows, new experience is sampled less and less often, so PER instead prioritizes transitions with high TD-error. Caveats (a sketch follows the list):
  • The loss shrinks very quickly
  • Only a few high-error transitions are sampled repeatedly, so training is prone to overfitting them
  • Evaluating the TD-error is an expensive overhead, so priorities are updated only when the loss is computed, not after every Q update
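
A sketch of proportional prioritization, where P(i) ∝ p_i^α with p_i = |TD-error_i| + ε (the list-based storage and all names are illustrative assumptions; a real implementation would use a sum-tree for O(log n) sampling):

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional PER sketch: sample transition i with probability
    proportional to priority_i ** alpha."""

    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def push(self, transition):
        # New transitions get the current max priority, so each one is
        # guaranteed to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Called only when the loss is computed, since re-evaluating the
        # TD-error for every stored transition would be too expensive.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```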

State Representation Sampling

Using learned state representations to sample more effectively from the buffer, potentially prioritizing certain types of experiences that are more beneficial for learning.
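
One hypothetical way this could look, assuming an `encode(state)` function that returns a learned embedding; both the encoder and the distance-from-mean novelty heuristic are assumptions for illustration, not a specific published algorithm:

```python
import numpy as np

def representation_sample(buffer, encode, batch_size):
    """Hypothetical sketch: weight sampling by each transition's distance
    from the mean learned state embedding, so rare or under-represented
    states are replayed more often."""
    states = [t[0] for t in buffer]                    # transitions are (s, a, r, s', done)
    z = np.stack([encode(s) for s in states])          # learned state representations
    dist = np.linalg.norm(z - z.mean(axis=0), axis=1)  # distance from mean as novelty proxy
    probs = (dist + 1e-8) / (dist + 1e-8).sum()        # eps keeps probabilities well-defined
    idx = np.random.choice(len(buffer), size=batch_size, p=probs)
    return [buffer[i] for i in idx]
```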
