Hierarchical RL

Creator: Seonglae Cho
Created: 2024 May 29 1:13
Edited: 2024 Jun 21 13:10

How to use skills

HRL typically targets long-horizon tasks, and it is very hard to get working in practice.
A high-level policy makes high-level decisions, and a low-level policy makes low-level decisions. HRL helps solve long-horizon, complex tasks through temporally extended exploration and simplified credit assignment. Many different HRL approaches have been proposed, yet there is no go-to method.
  • It is hard to assign credit between the high-level and low-level policies, i.e. to tell how much each one contributed to the outcome.
  • Any update to the low-level policy requires updating all higher-level policies (complex learning dynamics).
  • Open question: does increasing the number of hierarchy levels lead to better results?
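The two-level decision loop described above can be sketched in Python. This is a minimal sketch under stated assumptions: a hypothetical 1-D toy environment, a high-level policy that outputs an absolute goal position every `c` steps, and a low-level policy that simply moves toward that goal. Each stored high-level transition sums the rewards collected over its `c` low-level steps, which is one way the hierarchy shortens the credit-assignment chain. All names here are illustrative, not taken from any library.

```python
from typing import Callable, List, Tuple

State = float
Goal = float


def run_hierarchical_episode(
    high_policy: Callable[[State], Goal],
    low_policy: Callable[[State, Goal], float],
    env_step: Callable[[State, float], Tuple[State, float]],
    s0: State,
    horizon: int,
    c: int,
) -> List[Tuple[State, Goal, float, State]]:
    """Roll out a two-level policy and return the high-level transitions.

    The high-level policy picks a goal every c steps (temporal abstraction);
    the low-level policy acts at every step to reach that goal. Each
    high-level transition stores the summed reward over its c steps.
    """
    s = s0
    high_transitions = []
    t = 0
    while t < horizon:
        g = high_policy(s)                     # high-level decision
        s_start, reward_sum = s, 0.0
        for _ in range(min(c, horizon - t)):
            a = low_policy(s, g)               # low-level decision
            s, r = env_step(s, a)
            reward_sum += r
            t += 1
        high_transitions.append((s_start, g, reward_sum, s))
    return high_transitions


# Hypothetical toy environment: move along a line toward position 10.
GOAL_POSITION = 10.0


def toy_env_step(s: State, a: float) -> Tuple[State, float]:
    s_next = s + a
    return s_next, -abs(GOAL_POSITION - s_next)


transitions = run_hierarchical_episode(
    high_policy=lambda s: min(s + 3.0, GOAL_POSITION),      # subgoal 3 units ahead
    low_policy=lambda s, g: max(-1.0, min(1.0, g - s)),     # clipped step toward subgoal
    env_step=toy_env_step,
    s0=0.0,
    horizon=12,
    c=3,
)
# transitions[0] == (0.0, 3.0, -24.0, 3.0)
```

With `horizon=12` and `c=3`, the high-level policy makes only four decisions, while the low-level policy acts twelve times.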

High level

A skill dynamics model works better than single-skill RL
  • Transitions between skills must be learned, since the skills are trained independently
  • The skill chaining problem arises because a good ending state for one skill can be a bad initial state for the next skill, and vice versa. A transition policy is needed to bring the agent from one skill's ending state to a good initial state for the next skill.
  • Task policy → Skill embedding → Skill policy
  • Having many skills makes RL harder
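The task policy → skill embedding → skill policy pipeline could look roughly like the sketch below. The class and field names (`HierarchicalAgent`, `task_policy`, `skill_embeddings`, `skill_policy`) are hypothetical, and the hand-written lambdas stand in for what would normally be learned networks. One skill policy is shared across all skills and conditioned on the embedding `z`, so the task policy only has to choose among embeddings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class HierarchicalAgent:
    # Hypothetical components; in practice each would be a learned model.
    task_policy: Callable[[Vector], str]              # state -> skill name
    skill_embeddings: Dict[str, Vector]               # skill name -> embedding z
    skill_policy: Callable[[Vector, Vector], Vector]  # (state, z) -> action

    def act(self, state: Vector) -> Vector:
        skill = self.task_policy(state)       # high level: which skill to run
        z = self.skill_embeddings[skill]      # look up that skill's embedding
        return self.skill_policy(state, z)    # low level: action from pi(a | s, z)


# Toy 2-D example: go right until x >= 1, then go up.
agent = HierarchicalAgent(
    task_policy=lambda s: "right" if s[0] < 1.0 else "up",
    skill_embeddings={"right": [1.0, 0.0], "up": [0.0, 1.0]},
    skill_policy=lambda s, z: [0.5 * zi for zi in z],
)

first_action = agent.act([0.0, 0.0])   # [0.5, 0.0]
later_action = agent.act([2.0, 0.0])   # [0.0, 0.5]
```

Adding a skill here means adding an entry to `skill_embeddings`, which also enlarges the set of choices the task policy must learn over.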

Implementation

HRL is a system, so there are many design choices:
  • End-to-end vs. pre-trained skills
  • The low-level policy is a goal-reaching policy
  • Stored high-level transitions become incorrect as the low-level policy changes
  • The dependency between levels is a problem for HRL
  • High-level policy
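One concrete way to combine a goal-reaching low-level policy with a fix for stale high-level transitions is the goal relabeling used in HIRO (Nachum et al., 2018). The note does not name this method; it is a swapped-in illustration. The sketch is simplified under assumptions: 1-D states, absolute (not relative) goals held fixed over the segment, a deterministic low-level policy, and a Gaussian-policy surrogate for the action log-likelihood. `goal_reaching_reward` and `relabel_goal` are hypothetical names. The idea is to keep whichever candidate goal best explains the stored low-level actions under the current low-level policy.

```python
import random
from typing import Callable, Optional, Sequence


def goal_reaching_reward(next_state: float, goal: float) -> float:
    """Hypothetical intrinsic reward for the low-level policy: negative distance to goal."""
    return -abs(goal - next_state)


def relabel_goal(
    states: Sequence[float],      # s_t .. s_{t+c}   (length c + 1)
    actions: Sequence[float],     # a_t .. a_{t+c-1} (length c)
    original_goal: float,
    low_policy: Callable[[float, float], float],  # current deterministic mu(s, g)
    num_samples: int = 8,
    sigma: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return the candidate goal that best explains the stored actions."""
    if rng is None:
        rng = random.Random(0)
    achieved = states[-1]
    # Candidates: the original goal, the achieved state, and Gaussian samples around it.
    candidates = [original_goal, achieved] + [
        rng.gauss(achieved, sigma) for _ in range(num_samples)
    ]

    def log_likelihood(goal: float) -> float:
        # Gaussian surrogate: log pi(a | s, g) = -0.5 * (a - mu(s, g))^2 up to constants.
        return -0.5 * sum(
            (a - low_policy(s, goal)) ** 2 for s, a in zip(states[:-1], actions)
        )

    # max() keeps the first candidate among ties.
    return max(candidates, key=log_likelihood)


# The current low-level policy steps toward its goal, clipped to [-1, 1].
current_low_policy = lambda s, g: max(-1.0, min(1.0, g - s))

# Stored segment: the agent moved right three times, but the stored goal was -5.
new_goal = relabel_goal(
    [0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0],
    original_goal=-5.0, low_policy=current_low_policy,
)
# new_goal == 3.0: the achieved state explains the actions, the old goal does not.
```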

Skill chaining (initiation set I and termination set β)

To chain more skills, we need to enlarge each skill's initiation set while keeping its termination set small.
  • T-STAR
  • Transition Policy
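Treating the initiation and termination sets as membership tests makes the chaining condition easy to measure: the fraction of skill k's ending states that fall inside skill k+1's initiation set. The interval-shaped sets, the sample end states, and the helper names below are hypothetical. Both levers from the text show up directly: widening the next skill's initiation set, or tightening where the current skill terminates, raises the success rate.

```python
from typing import Callable, Sequence


def chaining_success_rate(
    terminal_states: Sequence[float],          # end states reached by skill k
    next_initiation: Callable[[float], bool],  # membership test for I of skill k+1
) -> float:
    """Fraction of skill k's end states from which skill k+1 can start."""
    if not terminal_states:
        return 0.0
    return sum(next_initiation(s) for s in terminal_states) / len(terminal_states)


def interval(lo: float, hi: float) -> Callable[[float], bool]:
    """Hypothetical initiation set: a closed interval of 1-D states."""
    return lambda s: lo <= s <= hi


ends_of_skill_a = [4.8, 5.0, 5.3, 6.1]   # spread-out termination states
narrow = chaining_success_rate(ends_of_skill_a, interval(4.5, 5.5))  # 0.75
wide = chaining_success_rate(ends_of_skill_a, interval(4.0, 6.5))    # 1.0 (larger initiation set)
tight = chaining_success_rate([4.9, 5.0, 5.1, 5.2], interval(4.5, 5.5))  # 1.0 (smaller termination set)
```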

Skill dynamics model (learn skill prior)

A skill dynamics model improves sample efficiency on long-horizon tasks
  • SPiRL
  • SkiMo
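Planning with a skill dynamics model and a skill prior can be sketched as random shooting in skill space: sample sequences of skills from the prior, roll each one out through the model instead of the real environment, and keep the sequence whose predicted final state lands closest to the goal. This is a simplified illustration, not SPiRL's or SkiMo's actual algorithm. The toy additive dynamics `s + z`, the weighted prior over three skills, and the function name `plan_skills` are assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple


def plan_skills(
    s0: float,
    goal: float,
    skill_dynamics: Callable[[float, float], float],            # f(s, z) -> state after skill z
    skill_prior: Callable[[float, random.Random], float],       # sample z ~ p(z | s)
    num_skills: int,
    num_candidates: int = 64,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Random-shooting planner over skill sequences using a skill dynamics model."""
    rng = random.Random(seed)
    best_seq: List[float] = []
    best_cost = float("inf")
    for _ in range(num_candidates):
        s, seq = s0, []
        for _ in range(num_skills):
            z = skill_prior(s, rng)      # the prior focuses sampling on plausible skills
            s = skill_dynamics(s, z)     # imagined outcome; no real environment step
            seq.append(z)
        cost = abs(goal - s)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost


# Hypothetical toy setup: each skill shifts the state by z; the prior prefers forward skills.
seq, cost = plan_skills(
    s0=0.0,
    goal=6.0,
    skill_dynamics=lambda s, z: s + z,
    skill_prior=lambda s, rng: rng.choices([-1.0, 1.0, 2.0], weights=[1, 2, 3])[0],
    num_skills=3,
)
```

Because each candidate is evaluated in the model, many skill sequences can be compared without spending environment samples, which is where the sample-efficiency gain comes from.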