CRL MMLU

Creator
Seonglae Cho
Created
2025 May 2 0:51
Edited
2025 Nov 26 13:18
Refs
  • gemma2b_mmlu_20_ppo_1e-05_0621_074052_30.0_sparse
  • gemma2b_mmlu_24_ppo_1e-05_0622_151551_30.0

Without selection

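This used the multi-layer shared code, so it probably needs to be redone (only the last k).
It may be that the increased norm alone is causing the effect, and the select_action function is not actually being trained properly.
Double-layer manipulation scored highest, but with more than 3 layers the score was lower than with single-layer manipulation and kept decreasing.
graph?
  • Using an SAE with a larger sparse dictionary made no meaningful difference.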
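One way to probe the norm hypothesis noted above is a norm-matched control. The sketch below is a minimal illustration in plain Python, not the training code: `steer`, `random_like`, and `rescale_to_norm` are hypothetical helper names, and the vectors are toy values chosen only for the example.

```python
import math
import random

def norm(v):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def steer(h, delta):
    """Add a steering vector to a hidden state (both plain lists)."""
    return [a + b for a, b in zip(h, delta)]

def random_like(delta, rng):
    """Random direction with the same L2 norm as `delta` (control A)."""
    r = [rng.gauss(0.0, 1.0) for _ in delta]
    scale = norm(delta) / norm(r)
    return [x * scale for x in r]

def rescale_to_norm(v, target):
    """Rescale `v` so that its L2 norm equals `target` (control B)."""
    n = norm(v)
    return [x * target / n for x in v]

rng = random.Random(0)
h = [0.5, -1.0, 2.0, 0.25]      # toy hidden state (assumed values)
delta = [0.1, 0.0, -0.3, 0.2]   # toy learned steering vector (assumed values)

steered = steer(h, delta)                       # learned direction, norm changes
control_a = steer(h, random_like(delta, rng))   # random direction, same perturbation norm
control_b = rescale_to_norm(steered, norm(h))   # learned direction, original norm
```

If an evaluation with control A scores about the same as the steered run, the gain probably comes from the norm change alone. If control B keeps the gain, the direction picked by select_action is likely doing real work.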
Control RL MMLU Models
 
 

Llama

Baseline

  • white paper 66.7% (5-shot)
  • base non-select 61.41% (0-shot)
  • base select 61.42% (0-shot, almost no hallucination)

Single layer

  • 30th 61.64% 10min
  • 30th 61.69% 5min
  • 30th 62.01% 5min decode
  • 30th 61.62% 1min
  • 30th 61.12% 20min
  • 24th 57.66% 20min
  • 24th 61.91% 5 minimum
  • 28th 61.86% 1
  • 20th 62.33%

CorrSteer

  • 61.71%
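To make the comparison easier to read, the scores listed above can be expressed as differences from the 0-shot base non-select score (61.41%). A small sketch in Python; the tuple layout and the `delta_vs_baseline` name are illustrative choices, and the labels simply copy the list entries.

```python
BASELINE = 61.41  # base non-select, 0-shot (from the Baseline list)

# (label, accuracy in %) copied from the lists above
results = [
    ("30th 10min", 61.64),
    ("30th 5min", 61.69),
    ("30th 5min decode", 62.01),
    ("30th 1min", 61.62),
    ("30th 20min", 61.12),
    ("24th 20min", 57.66),
    ("24th 5 minimum", 61.91),
    ("28th 1", 61.86),
    ("20th", 62.33),
    ("CorrSteer", 61.71),
]

def delta_vs_baseline(score, baseline=BASELINE):
    """Difference from the baseline in percentage points, rounded to 2 decimals."""
    return round(score - baseline, 2)

deltas = {label: delta_vs_baseline(score) for label, score in results}
best = max(deltas, key=deltas.get)    # label with the largest gain
worst = min(deltas, key=deltas.get)   # label with the largest drop

for label, d in deltas.items():
    print(f"{label:<18} {d:+.2f} pp")
print("best:", best, "| worst:", worst)
```

On these numbers the largest gain is the 20th-layer run (+0.92 pp) and the largest drop is the 24th-layer 20min run (-3.75 pp). Most other runs sit within about half a point of the baseline.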
 
 
 
meta-llama/Llama-3.1-8B · Hugging Face
