Deep Causal Transcoding (DCT)

Like a transcoder, but the goal is to steer the model rather than to reconstruct activations: DCT extracts latent behavior vectors.
Example: a jailbreak vector.
It also considers non-linear relations.
Keywords: Jacobian, Hessian, Exponential DCT.

Source: Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models (LessWrong)
Based off research performed in the MATS 5.1 extension program, under the mentorship of Alex Turner (TurnTrout). Research supported by a grant from t…
https://www.lesswrong.com/posts/fSRg5qs9TPbNy3sm5/deep-causal-transcoding-a-framework-for-mechanistically
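
The Jacobian keyword in these notes can be illustrated with a small sketch. This is not the method from the linked post; it is a toy, first-order illustration under stated assumptions: `toy_block` is a hypothetical stand-in for the slice of a network between a source and a target layer, and the "steering direction" is simply the top right singular vector of that block's Jacobian, computed by finite differences. All names and sizes below are made up for illustration.

```python
import numpy as np

def toy_block(x, W1, W2):
    """Hypothetical stand-in for a network slice: source activations -> target activations."""
    return W2 @ np.tanh(W1 @ x)

def numerical_jacobian(f, x, eps=1e-5):
    """Central-difference Jacobian of f at x, with shape (out_dim, in_dim)."""
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

# Assumed toy dimensions and random weights (illustration only).
rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 8
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_hidden)
x0 = rng.normal(size=d_in)  # activation at which the block is linearized

f = lambda x: toy_block(x, W1, W2)
J = numerical_jacobian(f, x0)

# Right singular vectors of J are input directions, ordered by how strongly
# a small step along them changes the output (to first order).
U, S, Vt = np.linalg.svd(J)
steer = Vt[0]  # unit-norm candidate steering direction

# Compare a small step along the top direction with a random unit direction.
scale = 1e-3
random_dir = rng.normal(size=d_in)
random_dir /= np.linalg.norm(random_dir)
effect_top = np.linalg.norm(f(x0 + scale * steer) - f(x0))
effect_rand = np.linalg.norm(f(x0 + scale * random_dir) - f(x0))
print("top singular value:", S[0])
print("effect / scale along top direction:", effect_top / scale)
print("effect / scale along random direction:", effect_rand / scale)
```

For a small step, the measured effect along `steer` should be close to `scale * S[0]`. A Hessian-based variant would add second-order terms to this picture; the sketch stops at first order.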