Q overestimation

Creator: Seonglae Cho
Created: 2024 Apr 13 3:24
Edited: 2024 Jun 16 8:31

Overestimation in Q-learning on OOD actions

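The Q-function is unreliable on out-of-distribution (OOD) actions because of the distribution shift between the behavior policy that collected the data and the policy being learned. In offline RL the data is fixed and limited, so this becomes a major problem.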
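A minimal sketch of why limited data makes OOD actions dangerous, assuming a hypothetical one-state, three-action problem with γ = 0 and a tabular stand-in for the Q-network (`fit_q`, `TRUE_REWARD`, and `init_value` are illustrative names, not from the source). Only actions that appear in the dataset ever get corrected; the OOD action keeps whatever value the approximator happened to output, and an unconstrained argmax picks it.

```python
# Hypothetical toy problem: a single state, three discrete actions, gamma = 0.
# The offline dataset only contains actions 0 and 1; action 2 is OOD.
TRUE_REWARD = {0: 1.0, 1: 0.5, 2: 0.0}  # action 2 is actually the worst


def fit_q(dataset, num_actions, init_value=2.0, lr=0.5, epochs=50):
    """Tabular regression of Q(a) toward observed rewards.

    `init_value` stands in for whatever a function approximator happens to
    output on inputs it never sees; only actions in the dataset get corrected.
    """
    q = [init_value] * num_actions
    for _ in range(epochs):
        for action, reward in dataset:
            q[action] += lr * (reward - q[action])
    return q


dataset = [(0, TRUE_REWARD[0]), (1, TRUE_REWARD[1])] * 10
q = fit_q(dataset, num_actions=3)

# Unconstrained greedy choice over all actions, including the OOD one.
greedy_all = max(range(3), key=lambda a: q[a])
# Greedy choice restricted to actions that actually appear in the data.
greedy_in_data = max({a for a, _ in dataset}, key=lambda a: q[a])
```

Here `q` ends up close to `[1.0, 0.5, 2.0]`: `greedy_all` is the OOD action 2 (true reward 0.0), while `greedy_in_data` is action 0. Restricting the choice to in-distribution actions is one simple way to avoid exploiting the uncorrected estimate.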
  • The Q-function is unreliable on out-of-distribution (OOD) actions.
  • The policy, through the max over actions, will seek out actions where the Q-function is over-optimistic (maximization bias).
  • As values propagate, the Q-values become substantially overestimated (the error spreads to other states through the Bellman update).
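The last two points can be illustrated with a small simulation, a sketch under stated assumptions rather than any specific algorithm: every action's true value is 0 and each Q-estimate carries independent zero-mean Gaussian error (the action count, noise level, and γ are arbitrary choices). Taking the max over noisy estimates is biased upward, and repeated Bellman backups on a self-looping, zero-reward state feed that bias back into the target at every step.

```python
import random
import statistics


def max_noisy_estimate(true_values, noise_std, rng):
    """max_a Q_hat(a), where Q_hat(a) = true value + zero-mean Gaussian error."""
    return max(v + rng.gauss(0.0, noise_std) for v in true_values)


def maximization_bias(num_actions=10, noise_std=1.0, trials=20000, seed=0):
    """True value of every action is 0, so the true max is 0; return the mean estimate."""
    rng = random.Random(seed)
    true_values = [0.0] * num_actions
    samples = [max_noisy_estimate(true_values, noise_std, rng) for _ in range(trials)]
    return statistics.fmean(samples)


def bootstrapped_value(num_iters=200, gamma=0.9, num_actions=10, noise_std=1.0, seed=0):
    """Repeated Bellman backups on one self-looping state with zero reward.

    The true value is 0, but every backup bootstraps from a max over noisy
    estimates, so the per-step bias is fed back and accumulates.
    """
    rng = random.Random(seed)
    value = 0.0
    for _ in range(num_iters):
        # Every action returns to the same state, so all actions share `value`.
        value = 0.0 + gamma * max_noisy_estimate([value] * num_actions, noise_std, rng)
    return value


bias = maximization_bias()
value = bootstrapped_value()
```

With 10 actions and unit noise, the one-step bias is about 1.54 (the expected maximum of 10 standard normals). Under repeated backups with γ = 0.9 the value settles around γ·1.54 / (1 - γ) ≈ 14, even though the true value is 0.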
Regularization, pessimism, and ensembles help address the overestimation issue.
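As a hedged illustration of the pessimism and ensemble ideas (not a method taken from the source), the sketch below compares three ways of forming a target on the same toy setting where every true action value is 0: the naive max, double Q-learning (select the action with one estimate, evaluate it with an independent one), and a max over the per-action minimum of an ensemble of K estimates. The ensemble-min form is the idea behind clipped double-Q in TD3 (K = 2) and larger min-over-ensemble critics; K = 5 and the noise model here are arbitrary assumptions.

```python
import random
import statistics

NUM_ACTIONS = 10
NOISE_STD = 1.0


def noisy_estimates(rng):
    """One Q-estimate per action; true values are all 0, so any nonzero mean is bias."""
    return [rng.gauss(0.0, NOISE_STD) for _ in range(NUM_ACTIONS)]


def naive_target(rng):
    # Standard Q-learning target: max over a single noisy estimate.
    return max(noisy_estimates(rng))


def double_q_target(rng):
    # Select the action with one estimate, evaluate it with an independent one.
    q_select, q_eval = noisy_estimates(rng), noisy_estimates(rng)
    best = max(range(NUM_ACTIONS), key=lambda a: q_select[a])
    return q_eval[best]


def ensemble_min_target(rng, ensemble_size=5):
    # Pessimistic per-action value: the minimum across ensemble members.
    ensemble = [noisy_estimates(rng) for _ in range(ensemble_size)]
    pessimistic = [min(member[a] for member in ensemble) for a in range(NUM_ACTIONS)]
    return max(pessimistic)


def mean_target(target_fn, trials=20000, seed=0):
    rng = random.Random(seed)
    return statistics.fmean([target_fn(rng) for _ in range(trials)])


naive = mean_target(naive_target)
double = mean_target(double_q_target)
pessimistic = mean_target(ensemble_min_target)
```

In this toy setting the naive target is biased upward by roughly 1.5, the double-Q target averages close to 0, and the ensemble-min target falls below the naive one (it can even undershoot), which trades overestimation for deliberate underestimation.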
Q overestimation methods
