Interpretability
Degree to which a model can be understood in human terms
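Model inspection only provides information about the model, not about the underlying data. If the model does not accurately reflect the data, conclusions drawn from inspecting it may not hold for the data either.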
Interpretability paradigms offer distinct lenses for understanding neural networks: behavioral methods analyze input-output relations; attributional methods quantify the influence of individual input features; concept-based methods identify the high-level representations that govern behavior; mechanistic methods uncover the precise causal mechanisms that lead from inputs to outputs.
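To make the first two paradigms concrete, here is a minimal sketch on a toy scorer. Everything in it is an assumption for illustration: `toy_model`, its weights, and the zero baseline are hypothetical, and occlusion is only one of several possible attribution techniques. The behavioral view just records how the output moves as one input varies; the attributional view assigns each feature a score by replacing it with a baseline and measuring the change in output.

```python
from typing import Callable, List, Sequence, Tuple


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [2.0, -1.0, 0.0]  # illustrative weights, not from any real model
    return sum(w * xi for w, xi in zip(weights, x))


def occlusion_attribution(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: float = 0.0,  # assumed "feature absent" value
) -> List[float]:
    """Attributional view: output change when each feature is set to the baseline."""
    full_output = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(full_output - model(occluded))
    return scores


x = [1.0, 3.0, 5.0]

# Behavioral view: vary the first input and record (input value, output) pairs.
behavioral: List[Tuple[float, float]] = [
    (v, toy_model([v, x[1], x[2]])) for v in (0.0, 1.0, 2.0)
]

# Attributional view: one influence score per input feature.
attributions = occlusion_attribution(toy_model, x)

print(behavioral)
print(attributions)
```

In this sketch the third feature receives a score of zero because its weight is zero, so the attribution describes what the toy model uses, not whether that feature matters in the data.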
Interpretable AI Notion
Explainable AI Methods