Personal UCL Thesis Plan

Creator
Seonglae Cho
Created
2025 Aug 25 17:29
Edited
2025 Sep 11 17:47

Motivation

  • brain analogy
  • emergent misalignment

Research Objective

  • Task Circuit Discovery
  • RLVR
  • Practical Interpretability

Scientific Contributions

  • Extending Steering
  • Training Method

Method

  • RL amplification (with citation)

Results

Future Works

  • Pretraining based on follow activation, making non-selected
  • Local CorrSteer: GRPO-like normalization for correlation estimation
  • Token entropy reward
