Texonom / Engineering / Data Engineering / Artificial Intelligence / AI Object / AI Invent / Math AI / PRM800K

PRM800K

Creator: Seonglae Cho
Created: 2022 Feb 21 13:51
Editor: Seonglae Cho
Edited: 2023 Oct 16 12:28
Refs: GSM8K, RLHF

OpenAI: Let’s Verify Step by Step

Improving mathematical reasoning with process supervision
We've trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”). In addition to boosting performance relative to outcome supervision, process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans.
https://openai.com/research/improving-mathematical-reasoning-with-process-supervision
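The contrast between the two supervision signals described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Solution` class, the boolean per-step labels, and both reward functions are hypothetical names invented here, not the PRM800K schema or OpenAI's training code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    """Hypothetical container for one model-written solution."""
    steps: List[str]          # reasoning steps, in order
    step_correct: List[bool]  # assumed human judgment for each step
    final_answer: str


def outcome_reward(solution: Solution, reference_answer: str) -> float:
    """Outcome supervision: a single signal based only on the final answer."""
    return 1.0 if solution.final_answer == reference_answer else 0.0


def process_rewards(solution: Solution) -> List[float]:
    """Process supervision: one signal per reasoning step."""
    return [1.0 if ok else 0.0 for ok in solution.step_correct]


# A solution whose middle step is flawed but whose final answer is still right.
example = Solution(
    steps=["step 1 (sound)", "step 2 (flawed)", "step 3 (states the answer)"],
    step_correct=[True, False, True],
    final_answer="4",
)

print(outcome_reward(example, "4"))
print(process_rewards(example))
```

In this toy case the outcome signal cannot tell the flawed solution apart from a fully correct one, while the per-step signal points at the step that went wrong.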

Backlinks

Process Supervision

Copyright Seonglae Cho