Actor Critic

Creator: Seonglae Cho
Created: 2023 Sep 10 7:54
Edited: 2025 Jul 6 0:36

Simultaneous learning of a Policy (Actor) and a State-value function (Critic)

The Policy (Actor) acts and the Critic (Value function) evaluates those actions, so the Policy is improved against a learned criterion (how advantageous is an action compared to the policy's average behavior?).
  1. Sample a trajectory point $(s_t, a_t, r_t, s_{t+1})$
  2. Update the actor parameters $\theta$ and the critic parameters $\phi$ by the policy gradient $\nabla_\theta \log \pi_\theta(a_t|s_t)\,\hat{A}_t$ and a value regression (a one-step version is sketched below), estimating the advantage $\hat{A}_t$ with, e.g.,
    GAE
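A minimal sketch of this loop, assuming a PyTorch setup with a categorical policy and a one-step TD(0) advantage in place of full GAE; network sizes, learning rate, and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, gamma = 4, 2, 0.99

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))  # policy logits
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))       # state value V(s)
optimizer = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

def update(s, a, r, s_next, done):
    """One actor-critic update from a single transition (s, a, r, s')."""
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)
    v = critic(s)                                  # V(s), keeps gradient for the critic loss
    with torch.no_grad():
        v_next = critic(s_next)                    # V(s'), no gradient
    # TD(0) advantage estimate: A ≈ r + γ(1 - done) V(s') - V(s); GAE generalizes this
    td_target = r + gamma * (1.0 - done) * v_next
    advantage = (td_target - v).detach()
    # Actor loss: log-probability of the taken action weighted by the advantage
    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(torch.as_tensor(a))
    actor_loss = -log_prob * advantage
    # Critic loss: regress V(s) toward the TD target
    critic_loss = (td_target - v).pow(2)
    optimizer.zero_grad()
    (actor_loss + critic_loss).sum().backward()
    optimizer.step()
```

In practice the update would be called once per environment step (or per mini-batch of transitions), and GAE over multi-step rollouts replaces the TD(0) advantage.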

Properties

  • No need to collect full trajectories for an update; single transitions or short rollouts can be sampled efficiently
  • Actor-critic is more sample efficient than REINFORCE

Value-based actor critic

  • Unlike semi on-policy methods like PPO, this approach uses off-policy data while performing value iteration
  • Instead of using GAE, it also learns a Q model, as in algorithms like DDPG or SAC (a minimal actor update of this kind is sketched after this list)
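A minimal sketch of such a value-based (DDPG-style) actor update, assuming a deterministic actor and a learned Q network; the replay buffer, critic update, and target networks are omitted, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

def actor_update(obs_batch):
    """Deterministic policy gradient: push the actor's actions toward higher Q-values."""
    actions = actor(obs_batch)                                  # a = pi(s)
    q_values = q_net(torch.cat([obs_batch, actions], dim=-1))   # Q(s, pi(s))
    loss = -q_values.mean()                                     # gradient ascent on Q
    actor_optimizer.zero_grad()
    loss.backward()
    actor_optimizer.step()

actor_update(torch.randn(32, obs_dim))  # obs_batch would come from a replay buffer in practice
```

Because the actor is trained through the learned Q function, the transitions do not need to come from the current policy, which is what makes this family off-policy.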

How to update Actor

  • Train the parameters of a shared action distribution, e.g. the mean and standard deviation of a Normal distribution (continuous actions) or the logits of a Categorical distribution (discrete actions); see the sketch below
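A minimal sketch of such a distribution head, assuming a Gaussian policy with a shared, learnable log standard deviation; class and attribute names are illustrative:

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Continuous-action actor: state-dependent mean plus a shared, learnable log-std."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # trained alongside the network weights

    def forward(self, obs):
        mean = self.mean_net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

actor = GaussianActor(obs_dim=3, act_dim=1)
dist = actor(torch.zeros(3))
action = dist.sample()
log_prob = dist.log_prob(action).sum(-1)  # enters the policy-gradient (actor) loss
```

For discrete actions the same structure applies with a Categorical distribution over logits instead of the Normal head.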
Actor Critic Algorithms

Recommendations