Skill0

By embedding skills into the model parameters, the agent can perform zero-shot autonomous actions at inference time without any skill retrieval.

First, Relevance-Driven Skill Grouping maps the markdown skill files of the hierarchical SkillBank to validation sub-tasks. Second, In-Context Reinforcement Learning (ICRL) renders the skills and interaction history as a compact RGB image, encodes it with a vision encoder, and has the agent jointly generate an action and a compression ratio. Training uses a composite reward that encourages compression, where the compression term is applied only upon success, and the training objective is PPO-based. Third, Dynamic Curriculum Learning splits training into $N_S$ stages with a linearly decreasing skill budget and retains only the skills that make a positive on-policy contribution, as measured by helpfulness.
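The curriculum step can be illustrated with a minimal sketch. The exact schedule and the way helpfulness is computed are not given here, so the code makes assumptions: the budget falls linearly from a hypothetical `initial_budget` to zero at the final stage, and helpfulness scores arrive as precomputed numbers, with only strictly positive ones kept. The function names `linear_skill_budget` and `select_skills` are illustrative and not taken from the paper.

```python
from typing import Dict, List


def linear_skill_budget(stage: int, num_stages: int, initial_budget: int) -> int:
    """Return the skill budget for a 0-indexed stage.

    Assumption: the budget decreases linearly from `initial_budget` at the
    first stage to 0 at the last stage (the paper's schedule may differ).
    """
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    if not 0 <= stage < num_stages:
        raise ValueError("stage must be in [0, num_stages)")
    remaining_stages = num_stages - 1 - stage
    # Integer arithmetic avoids floating-point rounding surprises.
    return (initial_budget * remaining_stages) // (num_stages - 1)


def select_skills(helpfulness: Dict[str, float], budget: int) -> List[str]:
    """Keep at most `budget` skills whose helpfulness is strictly positive.

    Assumption: `helpfulness` maps a skill name to a precomputed on-policy
    score; how that score is measured is outside this sketch.
    """
    positive = [(name, score) for name, score in helpfulness.items() if score > 0]
    # Highest score first; ties are broken by name to keep the result deterministic.
    positive.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in positive[:budget]]


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    scores = {"navigate": 0.12, "search": 0.05, "click": -0.03, "scroll": 0.0}
    for stage in range(3):
        budget = linear_skill_budget(stage, num_stages=3, initial_budget=2)
        print(stage, budget, select_skills(scores, budget))
```

Under these assumptions the last stage trains with no skills in context, which matches the stated goal of acting at inference time without skill retrieval.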

arxiv.org

https://arxiv.org/pdf/2604.02268
