Skill0

Creator
Seonglae Cho
Created
2026 Apr 15 16:11
Edited
2026 Apr 15 16:13
By embedding skills into the model parameters, the agent can perform zero-shot autonomous actions at inference time without any skill retrieval.
First, Relevance-Driven Skill Grouping maps the markdown skill files of the hierarchical SkillBank to validation sub-tasks. Second, In-Context Reinforcement Learning (ICRL) renders the skills and interaction history as a compact RGB image, encodes it with a vision encoder, and has the agent jointly generate an action and a compression ratio. Training uses a composite reward that encourages compression, with the compression term applied only upon success, and the training objective is PPO-based. Third, Dynamic Curriculum Learning splits training into $N_S$ stages with a linearly decreasing skill budget and retains only the skills that make a positive on-policy contribution, as measured by helpfulness.
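The curriculum and reward shaping described above can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the function names, the endpoints of the linear schedule (full budget at the first stage, zero at the last), the `weight` coefficient, and the binary task reward are all assumptions, since the note does not give the exact formulas.

```python
from typing import Dict, List


def skill_budget(stage: int, num_stages: int, initial_budget: int) -> int:
    """Linearly decreasing skill budget over the stages.

    Assumption: the budget starts at initial_budget (stage 0) and reaches 0
    at the last stage, matching the goal of needing no skills at inference.
    """
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    if not 0 <= stage < num_stages:
        raise ValueError("stage out of range")
    remaining = 1.0 - stage / (num_stages - 1)
    return round(initial_budget * remaining)


def select_skills(helpfulness: Dict[str, float], budget: int) -> List[str]:
    """Keep skills with strictly positive helpfulness, best first, up to budget.

    Assumption: helpfulness is a per-skill score where a positive value means
    the skill improved on-policy performance.
    """
    positive = [(name, score) for name, score in helpfulness.items() if score > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in positive[:budget]]


def composite_reward(success: bool, compression_ratio: float,
                     weight: float = 0.1) -> float:
    """Task reward plus a compression bonus granted only on success.

    Assumption: a binary task reward and a bonus linear in the compression
    ratio; the weight value is hypothetical.
    """
    task_reward = 1.0 if success else 0.0
    bonus = weight * compression_ratio if success else 0.0
    return task_reward + bonus
```

For example, with `num_stages=5` and `initial_budget=8`, this schedule yields budgets of 8, 6, 4, 2, and 0 across the stages, so the final stage trains with no skills in context. Gating the compression bonus on success keeps the agent from compressing aggressively at the cost of failing the task.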
arxiv.org
 
 
