TRPO

Creator
Seonglae Cho
Created
2023 Jul 15 17:9
Edited
2024 Apr 30 11:32

Trust Region Policy Optimization

$D_{KL}(\pi_\theta, \pi_{\theta_{old}}) \le \epsilon$
Because of the KL divergence constraint, TRPO is hard to implement and slow, so PPO is generally preferred.
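To make the constraint concrete, here is a minimal sketch of checking a KL trust region for a single categorical policy. It is not a full TRPO implementation: the update direction `proposed_step` is made up for illustration (real TRPO derives it from a natural-gradient computation), the helper names are hypothetical, and the KL is computed as KL(old || new), which is an assumption about the argument order.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def backtrack_to_trust_region(old_logits, step, epsilon, shrink=0.5, max_tries=10):
    """Shrink `step` geometrically until KL(old || new) <= epsilon.

    Returns (new_logits, kl). Hypothetical helper, for illustration only.
    """
    old_probs = softmax(old_logits)
    scale = 1.0
    for _ in range(max_tries):
        new_logits = [l + scale * s for l, s in zip(old_logits, step)]
        kl = kl_divergence(old_probs, softmax(new_logits))
        if kl <= epsilon:
            return new_logits, kl
        scale *= shrink
    # No acceptable step found: keep the old policy unchanged.
    return list(old_logits), 0.0


old_logits = [0.0, 0.0, 0.0]        # uniform policy over three actions
proposed_step = [2.0, -1.0, -1.0]   # made-up update direction (assumption)
new_logits, kl = backtrack_to_trust_region(old_logits, proposed_step, epsilon=0.01)
print(new_logits, kl)
```

Even in this toy form, each candidate update needs an extra KL evaluation and possibly several retries, which hints at the cost the note refers to. PPO avoids the explicit constraint by clipping the probability ratio inside the objective.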
 
 
 
 
 
 
 

 
