Gemma
Character reward
character level reward hacking 이 쉬웠다.
baseline
- 12.31%
0th
16.39%
1th
16.33
2th
17.96
20th
14.41%
universal mmlu 54.89%
./checkpoints/gemma2b_simpleqa_20_ppo_1e-05_0721_192159_30.0
universal critic 잘안맞음 select answer 차인듯
24th
11.61%
New metric token reward or prompt changing
baseline
27.94%
2th
- generated 27.99% 27.97%
- all 27.77% 27.75%
- naive 28.06% 28.01%
8th
- generation 28.33% -200
token 기반 매칭 without system prompt
16.32% → 16.86% 2th
Corrsteer
27.76%
LLama
- 6.72% basleine
- 6.76% 5th
Model-based reward
- 3.63%
corrsteer
- 3.80% global
Seonglae Cho