PRIME RL

Creator: Seonglae Cho
Created: 2025 Jan 15 15:36
Editor: Seonglae Cho
Edited: 2025 Jan 20 12:5
Refs
On-policy
PPO
PRIME
PRIME-RL • Updated 2025 Jan 19 16:32
Reward model

Process Reinforcement Through Implicit Rewards

Implicit PRM

process reward modeling
dense reward per token
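
The "dense reward per token" idea can be illustrated with a minimal sketch. It assumes the implicit PRM formulation in which each token's process reward is a scaled log-probability ratio between the implicit reward model and a frozen reference model, roughly r_t = beta * (log pi(y_t | y_<t) - log pi_ref(y_t | y_<t)). The function name `implicit_token_rewards`, the `beta` value, and the log-probabilities below are hypothetical and chosen only for illustration; they are not taken from the PRIME code base.

```python
from typing import List


def implicit_token_rewards(
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.05,  # assumed scaling coefficient, not a value from the source
) -> List[float]:
    """Return one reward per token as beta * (log pi - log pi_ref).

    Both inputs are the log-probabilities that each model assigns to the
    same sampled tokens, in the same order.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    return [beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]


# Toy example with made-up log-probabilities for a 3-token response.
policy_lp = [-0.5, -1.0, -0.25]
ref_lp = [-1.0, -1.0, -0.75]

rewards = implicit_token_rewards(policy_lp, ref_lp, beta=0.5)
response_reward = sum(rewards)  # token rewards add up to a response-level score
```

Under this assumption every token receives its own reward signal, and summing the per-token values recovers a response-level score, so no step-level labels are needed to obtain dense feedback.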

Copyright Seonglae Cho