CRL MMLU

Creator

Seonglae Cho

Created

2025 May 2 0:51

Editor

Seonglae Cho

Edited

2025 Nov 26 13:18

Refs

gemma2b_mmlu_20_ppo_1e-05_0621_074052_30.0_sparse

gemma2b_mmlu_24_ppo_1e-05_0622_151551_30.0

Without selection

The code shares the policy across multiple layers, so this probably needs to be rerun (only last k).

The effect may come simply from the increased norm, and the select_action function may not actually be training properly.

Double-layer manipulation scored highest, but with more than 3 layers the score was lower than single-layer manipulation and decreased.

graph?

Using an SAE with a larger sparse dictionary made no meaningful difference.

Control RL MMLU Models

Control RL MMLU Gemma

Llama

Baseline

White paper: 66.7% (5-shot)

Base, without selection: 61.41% (0-shot)

Base, with selection: 61.42% (0-shot, almost no hallucination)

Single layer

30th 61.64% 10min

30th 61.69% 5min

30th 62.01% 5min decode

30th 61.62% 1min

30th 61.12% 20min

24th 57.66% 20min

24th 61.91% 5 minimum

28th 61.86% 1

20th 62.33%
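To make the single-layer results easier to compare, here is a minimal sketch that computes each run's difference from the 0-shot base-with-selection score (61.42%). The accuracies are copied from the notes above. The names `runs`, `deltas`, and `best_run` are illustrative only and not part of the training code, and the per-run setting suffixes (such as "10min" or "decode") are left out.

```python
# Zero-shot baseline with selection, as recorded in the Baseline notes.
BASELINE = 61.42

# (layer, accuracy %) pairs copied from the single-layer runs in these notes.
# Run settings (e.g. "10min", "decode") are intentionally omitted.
runs = [
    ("30th", 61.64),
    ("30th", 61.69),
    ("30th", 62.01),
    ("30th", 61.62),
    ("30th", 61.12),
    ("24th", 57.66),
    ("24th", 61.91),
    ("28th", 61.86),
    ("20th", 62.33),
]


def deltas(runs, baseline=BASELINE):
    """Return (layer, accuracy, accuracy - baseline) rounded to 2 decimals."""
    return [(layer, acc, round(acc - baseline, 2)) for layer, acc in runs]


def best_run(runs):
    """Return the (layer, accuracy) pair with the highest accuracy."""
    return max(runs, key=lambda r: r[1])


if __name__ == "__main__":
    for layer, acc, diff in deltas(runs):
        print(f"{layer:>4}  {acc:.2f}%  {diff:+.2f}")
    print("best:", best_run(runs))
```

Differences this small (mostly under one percentage point) may be within run-to-run noise, so the ordering should be read with caution.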

Corrsteer

61.71%

meta-llama/Llama-3.1-8B · Hugging Face

https://huggingface.co/meta-llama/Llama-3.1-8B
