Chunked reward model

Creator
Seonglae Cho
Created
2025 Mar 21 16:52
Edited
2025 Mar 21 16:53
Rewards are assigned to chunks of the output, not to the whole output as a single-horizon result
  • token-level reward model
    • hidden state
  • window-based
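
The two variants above can be sketched together as follows. This is only an illustrative sketch under stated assumptions, not a reference implementation: the linear reward head over per-token hidden states, the mean aggregation within each window, and the names `token_rewards_from_hidden`, `window_rewards`, `weights`, and `window` are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def token_rewards_from_hidden(
    hidden_states: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
) -> List[float]:
    """Token-level reward: one scalar per token from its hidden state.

    Assumption: a plain linear head (dot product plus bias) stands in
    for whatever reward head a real model would use.
    """
    return [
        sum(h_i * w_i for h_i, w_i in zip(h, weights)) + bias
        for h in hidden_states
    ]


def window_rewards(token_rewards: Sequence[float], window: int) -> List[float]:
    """Window-based reward: mean token reward per non-overlapping window.

    Assumption: mean aggregation; the last window may be shorter.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    chunks = (
        token_rewards[i:i + window]
        for i in range(0, len(token_rewards), window)
    )
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Toy hidden states for a 5-token output (hidden size 2).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
head = [0.5, -0.5]  # hypothetical reward-head weights

per_token = token_rewards_from_hidden(hidden, head)
per_chunk = window_rewards(per_token, window=2)
print(per_token)  # [0.5, -0.5, 0.0, 1.0, -1.0]
print(per_chunk)  # [0.0, 0.5, -1.0]
```

Each chunk gets its own score, so feedback is available partway through the output instead of only once at the end.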