Minimality Interpretability

Creator

Seonglae Cho

Created

2025 Jan 28 14:2

Editor

Seonglae Cho

Edited

2025 Aug 24 23:0

Refs

Principle of Charity

The decomposition should use as few components as possible to replicate the network’s behavior on its training distribution.
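To make the criterion concrete, here is a minimal toy sketch. It is not the method from the referenced paper: it assumes a linear "network" whose weight vector is the sum of a few hand-written component vectors, and it brute-forces the smallest subset of components that still reproduces the full output on every training input. The names `output`, `minimal_subset`, `components`, and `train_inputs` are hypothetical and exist only for illustration.

```python
from itertools import combinations


def output(components, x):
    # Toy linear "network": its weights are the sum of the component vectors,
    # so the output is the sum of each component's dot product with x.
    return sum(sum(w * xi for w, xi in zip(c, x)) for c in components)


def minimal_subset(components, inputs, tol=1e-9):
    """Return the smallest tuple of component indices whose summed output
    matches the full network on every input (brute force, toy-sized only)."""
    targets = [output(components, x) for x in inputs]
    # Try subset sizes in increasing order so the first match is minimal.
    for k in range(len(components) + 1):
        for idx in combinations(range(len(components)), k):
            chosen = [components[i] for i in idx]
            if all(abs(output(chosen, x) - t) <= tol
                   for x, t in zip(inputs, targets)):
                return idx
    return tuple(range(len(components)))


# Assumed toy decomposition: three components, one per input feature.
components = [
    (1.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (0.0, 0.0, 3.0),
]

# Assumed "training distribution": the third feature is never active.
train_inputs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]

result = minimal_subset(components, train_inputs)
print(result)
```

In this toy setup the third component never affects the output on the training inputs, so a minimal decomposition can drop it. The exhaustive search grows exponentially with the number of components, so it is meant only to illustrate the criterion, not as a practical optimization procedure.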

Interpretability in Parameter Space: Minimizing Mechanistic...

Mechanistic interpretability aims to understand the internal mechanisms learned by neural networks. Despite recent progress toward this goal, it remains unclear how best to decompose neural...

https://publications.apolloresearch.ai/apd
