Causal abstraction

Creator: Seonglae Cho
Created: 2025 Nov 10 18:17
Edited: 2025 Dec 31 1:37
Given a task and a model, mechanistic interpretability (MI) aims to discover a succinct algorithmic process, called an interpretation, that explains the model’s decision process on that task.
Causal abstraction is a theoretical framework that examines whether a homomorphism exists from low-level variables (neurons, attention heads) to high-level conceptual variables.
Causal abstractions
 
 
 
 

Deterministic Causal Model

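A (deterministic) causal model with $n$ components is a quadruple $(\mathbf{V}, x, \mathcal{F}, \prec)$, where $\mathbf{V}$ is a set of hidden variables such that $|\mathbf{V}| = n$, $x$ is an input variable, $\mathcal{F}$ is a set of functions with one function per hidden variable, and $\prec$ defines a partial ordering over $\mathbf{V}$.
Hidden variables are intermediate computation results or activations that indicate the model state; they are the nodes of the computational graph. The functions act as the edges, and the partial order guarantees that connecting them yields a directed acyclic graph (DAG).
For each $v \in \mathbf{V}$, $v = f_v(\mathrm{pa}(v))$, where $f_v \in \mathcal{F}$ and $\mathrm{pa}(v) \subseteq \{x\} \cup \{u \in \mathbf{V} : u \prec v\}$ are the parents of $v$.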
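The definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `CausalModel`, the variable names `s`, `c`, `y`, the string `"x"` for the input variable, and the `interventions` argument are all hypothetical choices made for this example.

```python
from graphlib import TopologicalSorter


class CausalModel:
    """Toy deterministic causal model: one function per hidden variable."""

    def __init__(self, parents, functions):
        # parents:   hidden variable -> tuple of parent names ("x" is the input)
        # functions: hidden variable -> callable taking the parent values in order
        self.parents = parents
        self.functions = functions
        # Only hidden variables are graph nodes; the input "x" has no parents.
        graph = {v: [p for p in ps if p != "x"] for v, ps in parents.items()}
        # static_order() raises CycleError if the graph is not a DAG.
        self.order = list(TopologicalSorter(graph).static_order())

    def run(self, x, interventions=None):
        """Evaluate every hidden variable, optionally overriding some of them."""
        interventions = interventions or {}
        values = {"x": x}
        for v in self.order:
            if v in interventions:
                # An intervention fixes the variable and ignores its parents.
                values[v] = interventions[v]
            else:
                args = (values[p] for p in self.parents[v])
                values[v] = self.functions[v](*args)
        return values


# Hypothetical three-variable model: y = (x[0] + x[1]) * x[2]
model = CausalModel(
    parents={"s": ("x",), "c": ("x",), "y": ("s", "c")},
    functions={
        "s": lambda x: x[0] + x[1],
        "c": lambda x: x[2],
        "y": lambda s, c: s * c,
    },
)

print(model.run((1, 2, 3))["y"])             # ordinary forward pass
print(model.run((1, 2, 3), {"s": 10})["y"])  # forward pass with s forced to 10
```

Evaluating the variables in topological order is what the partial order buys: every parent already has a value by the time its child is computed.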

Non‑linear Representation Dilemma

Causal abstraction uses an alignment map to connect the model's hidden states with the intermediate variables of an algorithm. However, the definition itself does not restrict this map to be linear. It has been shown, through both an existence argument and learning experiments, that if the map is made sufficiently powerful, almost any model can be aligned with almost any algorithm under 'intervention-consistency'. Left unrestricted, the framework therefore cannot single out which algorithm a model actually implements.
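A toy illustration of the dilemma, assuming a two-bit XOR task and a lookup-table alignment map. Everything here is hypothetical and chosen for the example (`ENCODE`, `READOUT`, `build_alignment`, `is_intervention_consistent`, and the two candidate algorithms); it is not the construction used in the original work. The low-level "network" stores each input under an arbitrary scrambled hidden code, yet an unrestricted map aligns it with two different algorithms.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))

# Low-level model (assumed): an injective but scrambled hidden code,
# followed by a readout that happens to compute XOR of the two input bits.
ENCODE = {(0, 0): 5, (0, 1): 2, (1, 0): 7, (1, 1): 3}
READOUT = {5: 0, 2: 1, 7: 1, 3: 0}


def build_alignment(z_of, r_of):
    """Lookup-table alignment map: hidden state -> (z, residual), plus inverse."""
    phi = {ENCODE[x]: (z_of(x), r_of(x)) for x in INPUTS}
    phi_inv = {zr: h for h, zr in phi.items()}
    assert len(phi_inv) == len(phi), "(z, residual) must identify the input"
    return phi, phi_inv


def is_intervention_consistent(z_of, r_of, high_level_out):
    """Check all base/source interchange interventions on the variable Z."""
    phi, phi_inv = build_alignment(z_of, r_of)
    for base, source in product(INPUTS, repeat=2):
        # Low level: swap the z coordinate in aligned space, map back, read out.
        _, r_base = phi[ENCODE[base]]
        z_src, _ = phi[ENCODE[source]]
        low = READOUT[phi_inv[(z_src, r_base)]]
        # High level: run the algorithm on base with Z taken from source.
        high = high_level_out(z_src, base)
        if low != high:
            return False
    return True


# Algorithm 1: Z copies the first bit, output is Z XOR second bit.
alg1 = is_intervention_consistent(
    z_of=lambda x: x[0],
    r_of=lambda x: x[1],
    high_level_out=lambda z, base: z ^ base[1],
)

# Algorithm 2: Z already holds the XOR, output is Z itself.
alg2 = is_intervention_consistent(
    z_of=lambda x: x[0] ^ x[1],
    r_of=lambda x: x[0],
    high_level_out=lambda z, base: z,
)

print(alg1, alg2)
```

Both checks pass for the same hidden states, because the lookup table is free to decode the input and re-encode it in whatever coordinates the chosen algorithm needs. Restricting the map (for example, to linear functions) removes that freedom.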
 
 
 
