Interpretable AI

Creator
Seonglae Cho
Created
2024 May 1 1:17
Edited
2025 Mar 4 15:26

Interpretability

Degree to which a model can be understood in human terms
Model inspection only provides information about the model, and the model might not accurately reflect the data.
$\text{Explaining the model} \neq \text{Explaining the data}$
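A minimal sketch of why inspecting a model can say something different from what the data would say. Everything here is a hypothetical toy setup, not taken from the source: the target depends only on feature `x1`, feature `x2` is an exact copy of `x1`, and `fit_ridge_2d` is an illustrative helper implementing closed-form ridge regression for two features without an intercept.

```python
import random


def fit_ridge_2d(xs1, xs2, ys, lam=1.0):
    """Closed-form ridge regression for two features, no intercept.

    Solves (X^T X + lam * I) w = X^T y for a 2x2 system by hand.
    """
    a = sum(x * x for x in xs1) + lam
    b = sum(u * v for u, v in zip(xs1, xs2))
    d = sum(x * x for x in xs2) + lam
    r1 = sum(x * y for x, y in zip(xs1, ys))
    r2 = sum(x * y for x, y in zip(xs2, ys))
    det = a * d - b * b
    w1 = (d * r1 - b * r2) / det
    w2 = (a * r2 - b * r1) / det
    return w1, w2


rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x2 = list(x1)   # assumed redundant feature: an exact copy of x1
y = list(x1)    # assumed data-generating process: y depends on x1 only

w1, w2 = fit_ridge_2d(x1, x2, y)
print(f"w1={w1:.3f}, w2={w2:.3f}")
```

In this toy case the fitted weights come out nearly equal, so reading the weights suggests both features matter equally, even though the data were generated from `x1` alone. The weights describe how this particular model uses its inputs, not how the data were produced.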
Interpretability paradigms offer distinct lenses for understanding neural networks: Behavioral analyzes input-output relations; Attributional quantifies individual input feature influences; Concept-based identifies high-level representations governing behavior; Mechanistic uncovers precise causal mechanisms from inputs to outputs.
https://arxiv.org/pdf/2404.14082
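To make the attributional paradigm concrete, here is a minimal sketch using occlusion: each input feature is replaced by a baseline value and the change in output is recorded as that feature's influence. This is only one of many attribution methods and is chosen here for simplicity. `toy_model`, `occlusion_attribution`, and the all-zero baseline are hypothetical names and choices for illustration.

```python
def toy_model(x):
    """Hypothetical model: a fixed linear function of three inputs."""
    return 2.0 * x[0] - 3.0 * x[1] + 0.0 * x[2]


def occlusion_attribution(model, x, baseline):
    """Score each feature by how much the output changes when that
    feature alone is replaced with its baseline value."""
    full_output = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]
        scores.append(full_output - model(occluded))
    return scores


x = [1.0, 2.0, 5.0]
attributions = occlusion_attribution(toy_model, x, baseline=[0.0, 0.0, 0.0])
print(attributions)  # [2.0, -6.0, 0.0]
```

A behavioral analysis would look only at `toy_model(x)` for different inputs, while the attribution scores assign a share of one output to each input feature. The third feature has the largest value but receives zero attribution because the model ignores it.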
Interpretable AI Notion
 
 
 
Explainable AI Methods
 
 
 
 

Dream

 
 
