Causal Scrubbing

Creator: Seonglae Cho
Created: 2025 Dec 16 11:50
Editor: Seonglae Cho
Edited: 2025 Dec 16 11:52
Refs

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] (AI Alignment Forum)
Causal scrubbing is a new tool for evaluating mechanistic interpretability hypotheses. The algorithm tries to replace all model activations that shou…
https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a-method-for-rigorously-testing?utm_source=chatgpt.com

Copyright Seonglae Cho