Trainable Memory Graph

  • implicit memory (learned weights / RL): generalizes well, but it is a black box that suffers from catastrophic forgetting and is hard to interpret
  • explicit memory (prompts / stored memory): transparent, but static and weak at adaptation and generalization
This paper bridges the gap between the two. Instead of merely storing experiences, it standardizes trajectories into canonical paths with a finite state machine (FSM), extracts meta-strategies (meta-cognition) from them, and builds a trainable graph memory that learns the "usefulness" of these strategies from RL reward signals.
  1. Graph Memory Construction (hierarchical): query nodes → FSM canonical-path nodes → meta-cognition nodes (human-readable strategy sentences); see the first sketch below.
  2. Weight Learning (REINFORCE): based on the reward difference ΔR = R_with - R_w/o between rollouts with and without a meta-strategy, strengthen or weaken the graph edge weights leading to that strategy (utility-based selection); see the second sketch below.
  3. Integration into RL Training: for each training query, retrieve the top-k meta-strategies and prepend them to the prompt as a policy prior, then train with GRPO; see the third sketch below.
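A minimal sketch of step 1, assuming a dictionary-of-edges layout. GraphMemory, add_trajectory, and the node-id scheme are illustrative names, not the paper's actual data structures; the point is the three-level query → FSM path → meta-strategy hierarchy:

```python
from dataclasses import dataclass, field

@dataclass
class GraphMemory:
    """Three-level graph: query nodes -> FSM canonical-path nodes ->
    meta-cognition nodes (human-readable strategy sentences).
    Edge weights start neutral and are later trained from RL rewards."""
    query_to_path: dict = field(default_factory=dict)  # (query_id, path_id) -> weight
    path_to_meta: dict = field(default_factory=dict)   # (path_id, meta_id) -> weight
    meta_text: dict = field(default_factory=dict)      # meta_id -> strategy sentence

    def add_trajectory(self, query_id: str, fsm_path: tuple, strategy: str) -> str:
        """Store one standardized trajectory: its FSM canonical path plus
        the meta-strategy sentence distilled from it."""
        path_id = "->".join(fsm_path)              # the canonical path keys the node
        meta_id = f"meta:{abs(hash(strategy)):x}"  # illustrative id scheme
        self.meta_text[meta_id] = strategy
        self.query_to_path.setdefault((query_id, path_id), 0.0)
        self.path_to_meta.setdefault((path_id, meta_id), 0.0)
        return meta_id
```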
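For step 2, a sketch of the REINFORCE update under one common parameterization: treat a node's outgoing edge weights as logits of a softmax selection policy and scale the score-function gradient of log π(chosen) by ΔR. The softmax policy and the learning rate are assumptions; the paper's exact estimator may differ.

```python
import math

def reinforce_update(edge_weights: dict, chosen: str,
                     delta_r: float, lr: float = 0.1) -> None:
    """One REINFORCE step on a softmax edge-selection policy (sketch).
    delta_r = R_with - R_w/o: edges leading to a helpful strategy are
    strengthened, edges leading to a harmful one are weakened."""
    zmax = max(edge_weights.values())  # subtract max for numerical stability
    exps = {k: math.exp(v - zmax) for k, v in edge_weights.items()}
    total = sum(exps.values())
    for k, e in exps.items():
        # d log pi(chosen) / d logit_k = 1[k == chosen] - pi(k)
        grad = (1.0 if k == chosen else 0.0) - e / total
        edge_weights[k] += lr * delta_r * grad
```

For example, reinforce_update({"meta:a": 0.0, "meta:b": 0.0}, "meta:a", delta_r=0.5) raises the weight of meta:a and lowers meta:b, so strategies that produced a positive ΔR become more likely to be selected next time.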
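And for step 3, a sketch of retrieval plus prompt assembly, reusing the GraphMemory sketch above. Scoring a meta-strategy by summing edge weights along query → path → meta and the prompt template are both assumptions:

```python
def top_k_strategies(mem: GraphMemory, query_id: str, k: int = 3) -> list:
    """Rank meta-strategies reachable from the query along
    query -> path -> meta edges; keep the k best (sketch)."""
    scores = {}
    for (q, path_id), w_qp in mem.query_to_path.items():
        if q != query_id:
            continue
        for (p, meta_id), w_pm in mem.path_to_meta.items():
            if p == path_id:
                score = w_qp + w_pm  # combine edge logits along the route
                scores[meta_id] = max(scores.get(meta_id, float("-inf")), score)
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return [mem.meta_text[m] for m in best]

def build_prompt(question: str, strategies: list) -> str:
    """Prepend retrieved meta-strategies as a policy prior; the wrapped
    prompt then feeds ordinary GRPO rollouts."""
    prior = "\n".join(f"- {s}" for s in strategies)
    return f"Strategies that worked on similar problems:\n{prior}\n\n{question}"
```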
This improves inference performance and RL training convergence across 7 QA benchmarks.
arxiv.org