Minimality Interpretability

Creator
Seonglae Cho
Created
2025 Jan 28 14:2
Editor
Seonglae Cho
Edited
2025 Jan 28 14:3
Minimality: the decomposition should use as few components as possible to replicate the network's behavior on its training distribution.
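As a rough illustration of the criterion, the sketch below counts how many parameter components are needed to reproduce a single linear layer's output on one input. It is a toy under stated assumptions, not the procedure of the referenced paper: the function name `min_components_needed`, the output-norm ranking used as an attribution score, and the hand-built rank-one components are all hypothetical choices made for this example.

```python
import numpy as np

def min_components_needed(components, x, tol=1e-6):
    """Count how many components must be summed to match the full layer on x.

    `components` is a list of matrices whose sum is the layer's weight matrix.
    Ranking by output norm is an assumed stand-in for a real attribution method.
    """
    full_out = sum(components) @ x  # output of the undecomposed layer
    scores = np.array([np.linalg.norm(P @ x) for P in components])
    order = np.argsort(scores)[::-1]  # most influential component first
    partial = np.zeros_like(full_out)
    for k, idx in enumerate(order, start=1):
        partial = partial + components[idx] @ x
        if np.linalg.norm(partial - full_out) <= tol:
            return k
    return len(components)

# Toy decomposition: the 3x3 identity split into three rank-one components.
basis = np.eye(3)
components = [np.outer(basis[i], basis[i]) for i in range(3)]

one_feature = np.array([1.0, 0.0, 0.0])
two_features = np.array([1.0, 1.0, 0.0])
print(min_components_needed(components, one_feature))
print(min_components_needed(components, two_features))
```

In this toy, an input that touches a single feature direction needs only one component, while an input spanning two directions needs two. A decomposition is closer to minimal when such counts stay small across the training distribution.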
Interpretability in Parameter Space: Minimizing Mechanistic...
Mechanistic interpretability aims to understand the internal mechanisms learned by neural networks. Despite recent progress toward this goal, it remains unclear how best to decompose neural...
https://publications.apolloresearch.ai/apd
Copyright Seonglae Cho