How to use skills
Skills are typically used to solve long-horizon tasks, which are very hard to get working.
The high-level policy makes high-level decisions (e.g., which skill to run) and the low-level policy makes low-level decisions (e.g., which action to take). HRL helps solve long-horizon, complex tasks through temporally extended exploration and simplified credit assignment. Many different HRL approaches have been proposed, yet there is no go-to method.
- It is hard to assign credit between the high-level and low-level policies, i.e., to tell how much each one contributed to the outcome.
- Updating the low-level policy requires updating all the high-level policies as well (complex learning dynamics).
- Could increasing the number of hierarchy levels give better results?
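The two-level loop described above can be sketched as follows. Everything here is assumed for illustration: a toy 1-D environment and hypothetical `high_level_policy` and `SKILLS` definitions, none of which come from a specific library. The high-level policy picks a skill only every `K` steps, which makes each of its decisions temporally extended. Each decision is stored as one transition whose reward is the sum of environment rewards over those `K` steps, and this shortens the credit-assignment horizon for the high level.

```python
from typing import Callable, List, Tuple

# Toy 1-D environment (assumption): state is an integer position, actions are
# -1/0/+1, and the reward is 1.0 on the step that enters GOAL.
GOAL = 10

def env_step(state: int, action: int) -> Tuple[int, float]:
    next_state = state + action
    reward = 1.0 if next_state == GOAL and state != GOAL else 0.0
    return next_state, reward

# Hypothetical skills: each one is a low-level policy mapping state -> action.
SKILLS: List[Callable[[int], int]] = [
    lambda s: +1,  # skill 0: move right
    lambda s: -1,  # skill 1: move left
    lambda s: 0,   # skill 2: stay
]

def high_level_policy(state: int) -> int:
    # Hypothetical high-level decision: choose a skill index from the state.
    if state < GOAL:
        return 0
    if state > GOAL:
        return 1
    return 2

def rollout(start: int, num_decisions: int, K: int) -> List[Tuple[int, int, float, int]]:
    """Run the hierarchy and return high-level transitions (s, skill, R, s')."""
    state = start
    transitions = []
    for _ in range(num_decisions):
        skill = high_level_policy(state)   # one high-level decision ...
        s0, total_reward = state, 0.0
        for _ in range(K):                  # ... executed for K low-level steps
            action = SKILLS[skill](state)
            state, r = env_step(state, action)
            total_reward += r
        transitions.append((s0, skill, total_reward, state))
    return transitions
```

For example, `rollout(0, 3, 5)` runs 15 environment steps but produces only 3 high-level transitions. The high-level policy therefore only has to assign credit across 3 decisions.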
High level
Skill dynamics model: better than single-skill RL
- Transitions between skills must be learned, since the skills are trained independently.
- The skill chaining problem appears because a good ending state for one skill can be a bad initial state for the next skill, and vice versa. A transition policy is needed to bring the agent from the ending state of one skill to a good initial state for the next.
- Task policy → skill embedding → skill policy
- Having many skills makes RL harder
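A toy sketch of the skill chaining problem and a transition policy follows, under assumed 1-D dynamics. Each hypothetical `Skill` succeeds only when started inside its own range of good initial states. A hypothetical `transition_policy` moves the agent from where the previous skill ended into the next skill's good initial states. This only illustrates the idea in the notes and is not a specific published algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Skill:
    name: str
    good_start_lo: int  # the skill only succeeds if started in [good_start_lo, good_start_hi]
    good_start_hi: int
    displacement: int   # how far the skill moves the agent when it succeeds

    def can_start(self, state: int) -> bool:
        return self.good_start_lo <= state <= self.good_start_hi

    def run(self, state: int) -> Optional[int]:
        # Returns the ending state, or None if started from a bad initial state.
        return state + self.displacement if self.can_start(state) else None

def transition_policy(state: int, nxt: Skill, max_steps: int = 10) -> Optional[int]:
    """Hypothetical transition policy: step by +/-1 toward the next skill's good start region."""
    for _ in range(max_steps):
        if nxt.can_start(state):
            return state
        state += 1 if state < nxt.good_start_lo else -1
    return state if nxt.can_start(state) else None

def chain(skills: List[Skill], start: int, use_transition: bool) -> Optional[int]:
    """Execute skills in order; return the final state or None if chaining fails."""
    state: Optional[int] = start
    for skill in skills:
        if use_transition and state is not None:
            state = transition_policy(state, skill)
        if state is None:
            return None
        # Without a transition policy, a bad ending state of the previous skill
        # is used directly as this skill's initial state and may make it fail.
        state = skill.run(state)
    return state
```

Take `walk = Skill("walk", 0, 2, 5)` and `jump = Skill("jump", 3, 4, 10)`. Walking from state 0 ends at 5, which is a bad initial state for `jump`, so `chain([walk, jump], 0, False)` fails. With the transition policy, the agent is first moved to 4, and the chain reaches 14.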
Implementation
HRL is a system with several interacting components, so there are many design choices:
- End-to-end vs. pre-trained skills
- Low-level policy as a goal-reaching policy
- High-level transitions () become incorrect as the low-level policy changes
- This dependency between levels is a problem for HRL
- High-level policy
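A minimal sketch of why high-level transitions go stale when the low-level policy is goal-reaching is given here. It assumes a 1-D state, a goal given as a target position, and a hypothetical low-level policy whose only learnable parameter is a step `gain`. The negative-distance intrinsic reward is also an assumed form. The same stored high-level action `g` leads to a different next state `s'` once the low-level policy changes, so an old `(s, g, s')` tuple no longer describes what the current hierarchy would do.

```python
from typing import Tuple

def low_level_reward(state: float, goal: float) -> float:
    # Goal-reaching intrinsic reward (assumed form): negative distance to the goal.
    return -abs(goal - state)

def run_low_level(state: float, goal: float, gain: float, steps: int) -> Tuple[float, float]:
    """Hypothetical goal-reaching low-level policy: each step covers a fraction
    `gain` of the remaining distance. Returns (final state, summed intrinsic reward)."""
    total = 0.0
    for _ in range(steps):
        state = state + gain * (goal - state)
        total += low_level_reward(state, goal)
    return state, total

# High-level transition collected with an early, weak low-level policy (gain=0.5).
s, g = 0.0, 8.0
s_next_old, _ = run_low_level(s, g, gain=0.5, steps=3)

# Later in training the low-level policy has improved (gain=1.0).
s_next_new, _ = run_low_level(s, g, gain=1.0, steps=3)
```

The stored transition `(s=0.0, g=8.0, s'=7.0)` is stale, because the current low-level policy reaches `8.0` with the same `g`. A high-level learner that keeps training on the old tuple is fitting behavior the hierarchy no longer produces.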
Skill chaining (initiation set I and termination set )
To chain more skills, we need to enlarge the initiation sets while keeping the termination sets small.
- T-STAR
- Transition Policy
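The chaining condition can be sketched with finite state sets. Skill `i` can be followed directly by skill `i+1` when every state in the termination set of skill `i` lies in the initiation set of skill `i+1`. Enlarging initiation sets and shrinking termination sets both make this subset check easier to satisfy. The `SkillSets` container and the example state names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class SkillSets:
    name: str
    initiation: FrozenSet[str]   # states the skill can start from (I)
    termination: FrozenSet[str]  # states the skill can end in

def chainable(prev: SkillSets, nxt: SkillSets) -> bool:
    # Chaining is safe only if every possible ending state of `prev`
    # is a valid starting state for `nxt` (subset check).
    return prev.termination <= nxt.initiation

def broken_links(skills: List[SkillSets]) -> List[str]:
    """Return the consecutive skill pairs that cannot be chained."""
    return [f"{a.name}->{b.name}" for a, b in zip(skills, skills[1:]) if not chainable(a, b)]
```

For example, a `grasp` skill that may end in `holding_a` or `holding_b` cannot be chained into a `place` skill that starts only from `holding_a`. Widening `place`'s initiation set to include `holding_b` makes the pair chainable. Shrinking `grasp`'s termination set to `{holding_a}` would also work.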
Skill dynamics model (learn skill prior)
A skill dynamics model improves sample efficiency for long-horizon tasks.
- SPiRL
- SkiMo
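The sketch below shows a skill dynamics model and how it can be used for planning. It assumes a linear model `s' ≈ A s + B z` that predicts the state after a whole skill (one prediction per skill execution, not per environment step), fitted by least squares on toy noise-free data. A simple random-shooting planner over sequences of skill embeddings `z` stands in for whatever planner a given method actually uses. The methods listed above use learned neural networks, and none of these names come from their code.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, SKILL_DIM = 2, 2

# Toy "true" skill-level dynamics, used only to generate data (assumption).
A_true = np.eye(STATE_DIM)
B_true = np.array([[2.0, 0.0], [0.0, 1.0]])

def true_skill_step(s: np.ndarray, z: np.ndarray) -> np.ndarray:
    return A_true @ s + B_true @ z

def fit_skill_dynamics(S: np.ndarray, Z: np.ndarray, S_next: np.ndarray):
    """Least-squares fit of s' ≈ A s + B z from (s, z, s') tuples."""
    X = np.hstack([S, Z])                            # (N, STATE_DIM + SKILL_DIM)
    W, *_ = np.linalg.lstsq(X, S_next, rcond=None)  # (STATE_DIM + SKILL_DIM, STATE_DIM)
    return W[:STATE_DIM].T, W[STATE_DIM:].T         # A_hat, B_hat

def plan_skills(s0, goal, A_hat, B_hat, horizon=3, num_candidates=1000):
    """Random shooting: sample skill sequences, roll them out in the model, and
    keep the sequence whose predicted final state is closest to `goal`."""
    best_seq, best_cost = None, np.inf
    for _ in range(num_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, SKILL_DIM))
        s = s0
        for z in seq:
            s = A_hat @ s + B_hat @ z  # one model step = one whole skill
        cost = np.linalg.norm(s - goal)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost

# Collect toy skill-level transitions and fit the model.
S = rng.normal(size=(200, STATE_DIM))
Z = rng.uniform(-1.0, 1.0, size=(200, SKILL_DIM))
S_next = np.array([true_skill_step(s, z) for s, z in zip(S, Z)])
A_hat, B_hat = fit_skill_dynamics(S, Z, S_next)

plan, cost = plan_skills(np.zeros(STATE_DIM), np.array([3.0, 1.0]), A_hat, B_hat)
```

Because each model step covers an entire skill, a 3-step plan here corresponds to a much longer sequence of low-level actions. This is where the gain in sample efficiency for long-horizon tasks is expected to come from.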
Hierarchical RL methods