CRL SimpleQA

Creator
Seonglae Cho
Created
2025 Jul 21 17:33
Edited
2025 Aug 12 22:58

Gemma

Character reward

Character-level reward hacking was easy.
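A minimal sketch of why a character-level reward can be easy to exploit. The notes do not record the actual reward function, so `char_reward` below is a hypothetical stand-in that uses the standard-library `difflib.SequenceMatcher` similarity; `exact_reward` is likewise only for contrast. A wrong answer that shares characters with the gold answer still collects a large partial reward.

```python
from difflib import SequenceMatcher


def char_reward(prediction: str, gold: str) -> float:
    """Hypothetical character-level reward: similarity ratio in [0, 1]."""
    a = prediction.strip().lower()
    b = gold.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def exact_reward(prediction: str, gold: str) -> float:
    """Strict reward for contrast: 1.0 only on an exact (case-insensitive) match."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


# A correct answer and a wrong answer that merely reuses the same digits.
correct = char_reward("1994", "1994")
wrong = char_reward("1949", "1994")

print(f"correct answer, char reward:  {correct:.2f}")
print(f"wrong answer, char reward:    {wrong:.2f}")
print(f"wrong answer, exact reward:   {exact_reward('1949', '1994'):.2f}")
```

Under this assumed reward, a policy can raise its score by emitting strings that look like the answer without being correct, which is one plausible reading of the note above.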

baseline

  • 12.31%

0th

16.39%

1st

16.33%

2nd

17.96%

20th

14.41%
universal mmlu 54.89%
./checkpoints/gemma2b_simpleqa_20_ppo_1e-05_0721_192159_30.0
The universal critic does not fit well; this seems to come from a difference in select answer.

24th

11.61%
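The trend in the character-reward run can be summarised as a change against the baseline. The sketch below only re-computes numbers listed above; it assumes that the labels 0, 1, 2, 20 and 24 are checkpoint indices and that every score is a percentage (two entries were logged without the percent sign).

```python
# Scores copied from the character-reward run above (assumed to be percentages).
baseline = 12.31
scores = {0: 16.39, 1: 16.33, 2: 17.96, 20: 14.41, 24: 11.61}

# Difference to the baseline in percentage points, rounded for display.
deltas = {step: round(score - baseline, 2) for step, score in scores.items()}

# Checkpoint with the highest score.
best_step = max(scores, key=scores.get)

for step, delta in deltas.items():
    print(f"checkpoint {step:>2}: {scores[step]:.2f}% ({delta:+.2f} points)")
print(f"best checkpoint: {best_step}")
```

Read this way, the score peaks early and falls below the baseline by the last logged checkpoint.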

New metric: token reward or prompt change

baseline

27.94%

2nd

  • generated 27.99% 27.97%
  • all 27.77% 27.75%
  • naive 28.06% 28.01%

8th

  • generation 28.33% -200

Token-based matching without system prompt

16.32% → 16.86% (2nd)
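A minimal sketch of what token-based matching could look like, as opposed to character-level matching. The notes do not say which tokenizer or matching rule was used, so the word-level `tokenize` and the contiguous-span rule in `token_match_reward` are assumptions made for illustration only.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: lowercase word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def token_match_reward(prediction: str, gold: str) -> float:
    """Return 1.0 if the gold tokens occur as a contiguous span in the prediction."""
    pred_tokens = tokenize(prediction)
    gold_tokens = tokenize(gold)
    if not gold_tokens:
        return 0.0
    n = len(gold_tokens)
    for i in range(len(pred_tokens) - n + 1):
        if pred_tokens[i:i + n] == gold_tokens:
            return 1.0
    return 0.0


print(token_match_reward("The answer is Paris.", "Paris"))  # whole-token match
print(token_match_reward("Parisian", "Paris"))              # substring only, no token match
```

Matching on whole tokens rejects answers that only contain the gold string as a substring, which a character-level comparison would still reward.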

Corrsteer

27.76%

Llama

  • 6.72% baseline
  • 6.76% 5th

Model-based reward

  • 3.63%
Corrsteer
  • 3.80% global