Given a task and a model, mechanistic interpretability (MI) aims to discover a succinct algorithmic process, an interpretation, that explains the model’s decision process on that task.
Causal abstraction is a theoretical framework that examines whether a homomorphism exists from low-level variables (neurons, attention heads) to high-level conceptual variables.
Causal abstractions
Deterministic Causal Model
A (deterministic) causal model is a quadruple consisting of a set of hidden variables, an input variable, a set of functions (one per hidden variable), and a partial ordering over the hidden variables.
Hidden variables are intermediate computation results, or activations, that capture the model's state; they are the nodes of the computational graph. The functions act as edges, and the partial order ensures that these edges form a directed acyclic graph (DAG).
For each hidden variable, the corresponding function computes its value from the values of its parents in the graph.
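The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the class name `CausalModel`, the variable names `h1`, `h2`, `y`, and the convention that the input variable is called `"x"` are all assumptions made for this example. A topological sort is used as one total order consistent with the partial ordering, and an intervention simply overrides a hidden variable's value before its children are computed.

```python
from graphlib import TopologicalSorter


class CausalModel:
    """Toy deterministic causal model (hypothetical helper, not from the source)."""

    def __init__(self, parents, functions):
        # parents: hidden variable -> tuple of parent names ("x" is the input variable)
        # functions: hidden variable -> function of its parents' values
        self.parents = parents
        self.functions = functions
        # Any topological order is a total order consistent with the partial order.
        # TopologicalSorter raises CycleError if the graph is not a DAG.
        graph = {v: [p for p in ps if p != "x"] for v, ps in parents.items()}
        self.order = list(TopologicalSorter(graph).static_order())

    def run(self, x, interventions=None):
        """Compute every hidden variable; `interventions` fixes chosen variables."""
        interventions = interventions or {}
        values = {"x": x}
        for v in self.order:
            if v in interventions:
                values[v] = interventions[v]  # overwrite instead of computing
            else:
                args = (values[p] for p in self.parents[v])
                values[v] = self.functions[v](*args)
        return values


# Assumed toy graph: x -> h1 -> h2, and (h1, h2) -> y
model = CausalModel(
    parents={"h1": ("x",), "h2": ("h1",), "y": ("h1", "h2")},
    functions={
        "h1": lambda x: x[0] + x[1],
        "h2": lambda h1: 2 * h1,
        "y": lambda h1, h2: h1 + h2,
    },
)

clean = model.run((1, 2))
patched = model.run((1, 2), interventions={"h1": 10})
```

Here `clean["y"]` is computed from the input alone, while `patched["y"]` reflects the forced value of `h1` propagating through `h2` to `y`.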
Non‑linear Representation Dilemma
Causal abstraction uses an alignment map to connect a model's hidden states with the intermediate variables of an algorithm. However, the definition itself does not restrict the map to be linear. It has been shown, through both existence arguments and learning experiments, that if the map is made sufficiently powerful, almost any model can be aligned with almost any algorithm under intervention consistency.
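A toy example may help make the role of the alignment map concrete. The sketch below is an assumption-laden illustration, not the construction used in the literature: the two-bit model, the lookup-table map `PHI`, and all function names are hypothetical. The low-level model stores both input bits in one integer hidden state, the high-level algorithm has a single intermediate variable Z = a XOR b, and the map sends each hidden state to a pair (Z, remainder). Interchange interventions through this map are consistent with the algorithm, even though no affine function of the hidden state recovers Z.

```python
from itertools import product


def model_hidden(a, b):
    # Low-level model: one integer hidden state encoding both input bits.
    return 2 * a + b


def model_output(h):
    # Low-level readout: XOR of the two encoded bits.
    return 1 if h in (1, 2) else 0


def algo(a, b, z_override=None):
    # High-level algorithm: Z = a XOR b, output = Z. Z can be intervened on.
    z = (a ^ b) if z_override is None else z_override
    return z


# Hypothetical non-linear alignment map: hidden state -> (Z, remainder).
# The remainder here is the first input bit, which makes the map a bijection.
PHI = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
PHI_INV = {pair: h for h, pair in PHI.items()}


def interchange(base, source):
    """Run the model on `base`, but patch in the Z-coordinate from `source`."""
    _, r_base = PHI[model_hidden(*base)]
    z_src, _ = PHI[model_hidden(*source)]
    return model_output(PHI_INV[(z_src, r_base)])


def intervention_consistent():
    # Compare the patched low-level output with the intervened algorithm
    # for every (base, source) pair of inputs.
    inputs = list(product((0, 1), repeat=2))
    return all(
        interchange(base, src) == algo(*base, z_override=algo(*src))
        for base in inputs
        for src in inputs
    )


def affine_readout_exists():
    # An affine map alpha*h + beta is fixed by its values at h=0 and h=1;
    # check whether that map also matches Z on the remaining hidden states.
    beta = PHI[0][0]
    alpha = PHI[1][0] - beta
    return all(alpha * h + beta == PHI[h][0] for h in PHI)
```

In this toy setting `intervention_consistent()` holds while `affine_readout_exists()` does not, which is the small-scale version of the concern: an unrestricted map can certify an alignment that a linear map would not find.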

Seonglae Cho