Actor Critic

Creator: Seonglae Cho
Created: 2023 Sep 10 7:54
Edited: 2024 Nov 18 22:7

Learns the policy and the state-value function simultaneously.

The policy (actor) acts and the critic (value function) evaluates, so the policy is improved based on the critic's assessment (how advantageous is an action compared to what the policy does on average?).
  1. Sample a trajectory point
  2. Update the actor and the critic using an advantage estimate such as GAE
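
The GAE step above can be sketched roughly as follows. This is a minimal pure-Python illustration, not a reference implementation: the function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are assumptions made for this example.

```python
def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Return GAE advantages and value targets for one trajectory segment.

    rewards:    r_0 .. r_{T-1}
    values:     V(s_0) .. V(s_{T-1}) predicted by the critic
    last_value: V(s_T), the bootstrap value (use 0.0 if s_T is terminal)
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    gae = 0.0
    # Walk backwards so each step can reuse the discounted sum from t + 1.
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    # Regression targets for the critic: A_t + V(s_t)
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0)
```

With `lam=0` each advantage reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the critic's value, so `lam` trades bias against variance.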

Properties

  • No need to collect full trajectories before an update, since updates can be made from sampled transitions
  • Actor-critic is more sample efficient than REINFORCE

Value-based actor critic

  • Unlike semi on-policy methods such as PPO, these perform value iteration while using off-policy data
  • If GAE is not used and a Q model is also learned, the method becomes value-based, as in DDPG or SAC
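
As a rough illustration of the value-iteration idea with off-policy data, the sketch below computes one-step bootstrapped targets for a Q critic from stored transitions. It assumes a DDPG-style deterministic target policy; `q_targets`, `target_q`, and `target_policy` are hypothetical names for this example, and SAC would additionally include an entropy term in the target.

```python
def q_targets(batch, target_q, target_policy, gamma=0.99):
    """Compute y = r + gamma * Q_target(s', pi_target(s')) for each transition.

    batch:         iterable of (state, action, reward, next_state, done) tuples,
                   typically sampled from a replay buffer (off-policy data)
    target_q:      callable (state, action) -> float, hypothetical target critic
    target_policy: callable (state) -> action, hypothetical target actor
    """
    targets = []
    for state, action, reward, next_state, done in batch:
        next_action = target_policy(next_state)
        # No bootstrapping past a terminal state.
        bootstrap = 0.0 if done else target_q(next_state, next_action)
        targets.append(reward + gamma * bootstrap)
    return targets


# Toy usage with scalar states and actions (stand-ins for real networks).
batch = [(0.0, 0.0, 1.0, 1.0, False), (0.0, 0.0, 1.0, 1.0, True)]
targets = q_targets(batch, lambda s, a: s + a, lambda s: 2.0 * s, gamma=0.5)
```

Because the target depends only on the stored transition and the current target networks, the transitions can come from any earlier policy, which is what makes the data reusable.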

How to update Actor

  • Train the parameters of the action distribution, such as the standard deviation of a shared normal distribution, or the parameters of a categorical distribution
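
A minimal sketch of such an update for a one-dimensional Gaussian policy is given below. It assumes that "shared" means a single state-independent standard deviation, stored as `log_std`; the function names and the learning rate are illustrative assumptions rather than part of any specific algorithm.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a 1-D normal distribution N(mean, exp(log_std)^2)."""
    z = (action - mean) / math.exp(log_std)
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)


def policy_gradient_step(mean, log_std, action, advantage, lr=0.01):
    """One gradient-ascent step on advantage * log pi(action)."""
    std = math.exp(log_std)
    z = (action - mean) / std
    grad_mean = z / std          # d log pi / d mean
    grad_log_std = z * z - 1.0   # d log pi / d log_std
    new_mean = mean + lr * advantage * grad_mean
    new_log_std = log_std + lr * advantage * grad_log_std
    return new_mean, new_log_std


# A positive advantage moves the mean toward the sampled action.
new_mean, new_log_std = policy_gradient_step(0.0, 0.0, action=2.0, advantage=1.0, lr=0.1)
```

Parameterizing the standard deviation through its logarithm keeps it positive without any explicit constraint, which is a common design choice.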
Actor Critic Algorithms