Replay Buffer: sampled state management for correlated samples in online Q-learning
Consecutive samples are strongly correlated with each other, which increases the variance of gradient estimates and destabilizes optimization. To address this, transitions are stored in a buffer and minibatches are drawn from it uniformly at random (a minimal sketch follows the list below).

Prevents:
- Correlation between consecutive samples
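A minimal sketch of the uniform-sampling buffer, assuming standard (state, action, reward, next_state, done) transition tuples; the ReplayBuffer name, capacity, and batch size are illustrative choices, not from any particular implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # Bounded deque: once full, the oldest transitions are evicted.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```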
Prioritized experience replay (PER)
With uniform sampling, experience replay makes the training data closer to i.i.d., but as the replay buffer grows, new experiences are sampled less and less often. PER addresses this by sampling transitions with high TD-error more frequently (a sketch follows the caveats below).
- The loss shrinks very quickly, since the highest-error transitions are replayed first
- Only a few high-error transitions are sampled repeatedly, so the network is prone to overfitting them
- Evaluating TD-error for every transition is an expensive overhead, so priorities are updated only when a transition's loss is computed, not on every Q update
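A hedged sketch of proportional prioritization in the spirit of PER (Schaul et al., 2015); the class name and the capacity, alpha, and eps values are illustrative, importance-sampling correction weights are omitted for brevity, and a real implementation would use a sum-tree instead of a flat list for efficient sampling:

```python
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def push(self, transition):
        # New transitions get the current max priority so that
        # every experience is replayed at least once.
        self.buffer.append(transition)
        self.priorities.append(max(self.priorities, default=1.0))
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)

    def sample(self, batch_size=32):
        # P(i) proportional to priority_i ** alpha: high TD-error
        # transitions are sampled more often than under uniform replay.
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        return idxs, [self.buffer[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        # Priorities are refreshed only for the sampled minibatch when
        # its loss is computed, not on every Q update, to limit overhead.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```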
State Representation Sampling
Using learned state representations to sample more effectively from the buffer, potentially prioritizing certain types of experiences that are more beneficial for learning.
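One hypothetical instantiation of this idea, assuming an encode function that maps a state to a learned embedding (e.g. a Q-network's penultimate layer); the novelty score and sampling rule below are illustrative choices, not a published method:

```python
import numpy as np

def representation_sample(buffer, encode, batch_size=32):
    # Embed the stored states with the learned representation.
    embs = np.stack([encode(s) for (s, *_rest) in buffer])
    # Score each transition by how far its embedding lies from the
    # buffer's mean embedding, as a crude novelty signal.
    novelty = np.linalg.norm(embs - embs.mean(axis=0), axis=1) + 1e-8
    probs = novelty / novelty.sum()
    idxs = np.random.choice(len(buffer), batch_size, p=probs)
    return [buffer[i] for i in idxs]
```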
