
DCT

Creator
Seonglae Cho
Created
2025 Apr 6 17:36
Editor
Seonglae Cho
Edited
2025 Apr 6 17:40

Deep Causal Transcoding

Like a transcoder, but instead of reconstructing activations it extracts latent behavior vectors that can be used to steer the model
  • Example: jailbreak vector
It also accounts for non-linear relations
  • Jacobian
  • Hessian
  • Exponential DCT
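
The Jacobian item above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-layer tanh network stands in for the layers between a source and a target layer of a real language model, and the names `target_activation`, `causal_map`, and `jacobian_at_zero` are hypothetical. The sketch only shows the first-order idea (take the Jacobian of the map from a source-layer perturbation to the resulting target-layer change, then use its top singular direction as a candidate steering vector); it is not the linked post's implementation and does not cover the Hessian or exponential variants.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16

# Hypothetical stand-in for the layers between a source and a target layer.
W1 = rng.normal(size=(d_hidden, d_model)) / np.sqrt(d_model)
W2 = rng.normal(size=(d_model, d_hidden)) / np.sqrt(d_hidden)


def target_activation(source_activation):
    """Toy model slice: maps a source-layer activation to a target-layer one."""
    return W2 @ np.tanh(W1 @ source_activation)


def causal_map(base, theta):
    """Change in the target activation caused by adding theta at the source."""
    return target_activation(base + theta) - target_activation(base)


def jacobian_at_zero(base, eps=1e-5):
    """Central finite-difference Jacobian of theta -> causal_map(base, theta) at theta = 0."""
    d = base.shape[0]
    cols = []
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        cols.append((causal_map(base, e) - causal_map(base, -e)) / (2 * eps))
    return np.stack(cols, axis=1)


base = rng.normal(size=d_model)          # unsteered source activation (toy data)
J = jacobian_at_zero(base)
U, S, Vt = np.linalg.svd(J)

# Unit-norm source direction with the largest first-order effect on the target.
steering_vector = Vt[0]

# Compare against a random unit direction at the same small scale.
random_dir = rng.normal(size=d_model)
random_dir /= np.linalg.norm(random_dir)
scale = 1e-3
effect_top = np.linalg.norm(causal_map(base, scale * steering_vector))
effect_rand = np.linalg.norm(causal_map(base, scale * random_dir))
print("top singular direction effect:", effect_top)
print("random direction effect:      ", effect_rand)
```

For small perturbations the top singular direction moves the target activation at least as much as any other direction of the same norm. At larger steering scales the map is no longer well described by its Jacobian, which is where higher-order information such as the Hessian becomes relevant.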
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models — LessWrong
Based off research performed in the MATS 5.1 extension program, under the mentorship of Alex Turner (TurnTrout). Research supported by a grant from t…
https://www.lesswrong.com/posts/fSRg5qs9TPbNy3sm5/deep-causal-transcoding-a-framework-for-mechanistically
Copyright Seonglae Cho