PRM800K

Creator

Seonglae Cho

Created

2022 Feb 21 13:51

Editor

Seonglae Cho

Edited

2023 Oct 16 12:28

Refs

OpenAI: Let's Verify Step by Step

Improving mathematical reasoning with process supervision

We've trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”). In addition to boosting performance relative to outcome supervision, process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans.

https://openai.com/research/improving-mathematical-reasoning-with-process-supervision
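
The difference between the two supervision signals described above can be illustrated with a minimal sketch. This is not OpenAI's implementation or the PRM800K data format; the function names, the exact-match answer check, and the boolean per-step labels are all assumptions made for illustration only.

```python
from typing import List


def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Outcome supervision (illustrative): a single reward that depends
    only on whether the final answer matches the reference."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0


def process_rewards(step_labels: List[bool]) -> List[float]:
    """Process supervision (illustrative): one reward per reasoning step,
    based on a hypothetical human judgment of that step's correctness."""
    return [1.0 if is_correct else 0.0 for is_correct in step_labels]


# Hypothetical solution whose second step is wrong but whose final
# answer still happens to match the reference.
step_labels = [True, False, True]
r_outcome = outcome_reward("42", "42")
r_process = process_rewards(step_labels)
```

In this toy case the outcome signal rewards the whole solution, while the per-step signal flags the flawed second step, which is the kind of feedback process supervision is meant to provide.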
