Simplicity Interpretability

Creator

Seonglae Cho

Created

2025 Jan 28 14:3

Editor

Seonglae Cho

Edited

2025 Jan 28 14:3

Refs

Components should each involve as little computational machinery as possible.
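One way to make "as little computational machinery as possible" concrete is to score each component by how many layers it touches and how high the rank of its weights is. The sketch below is an illustration under assumptions, not a definition taken from this note: it treats a component as a list of per-layer weight matrices and sums a Schatten-p style penalty (singular values raised to a power p < 1) over them. The names `schatten_p` and `simplicity_penalty` and the choice p = 0.5 are hypothetical.

```python
import numpy as np


def schatten_p(weight, p=0.5):
    """Sum of singular values raised to the power p.

    For p < 1 this favors matrices with few nonzero singular values,
    i.e. low-rank matrices. (Assumed penalty, chosen for illustration.)
    """
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return float(np.sum(singular_values ** p))


def simplicity_penalty(component, p=0.5):
    """Total penalty for one component, given as a list of per-layer matrices."""
    return sum(schatten_p(w, p) for w in component)


# Two components with the same Frobenius norm (2.0) but different rank.
rank_one = 2.0 * np.outer([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
full_rank = (2.0 / np.sqrt(3.0)) * np.eye(3)

simple_component = [rank_one]               # one layer, rank 1
complex_component = [full_rank, full_rank]  # two layers, full rank

print(simplicity_penalty(simple_component))
print(simplicity_penalty(complex_component))
```

Under this scoring, the rank-one, single-layer component receives the smaller penalty, which matches the intuition that it uses less machinery than a full-rank component spread over two layers.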

Interpretability in Parameter Space: Minimizing Mechanistic...

Mechanistic interpretability aims to understand the internal mechanisms learned by neural networks. Despite recent progress toward this goal, it remains unclear how best to decompose neural...

https://publications.apolloresearch.ai/apd
