Interpretable AI

Creator
Seonglae Cho
Created
2024 May 1 1:17
Editor
Edited
2024 Nov 27 16:45
Refs
AI Safety

Interpretability

Degree to which a model can be understood in human terms
https://arxiv.org/pdf/2404.14082
Interpretability paradigms offer distinct lenses for understanding neural networks: behavioral methods analyze input-output relations; attributional methods quantify the influence of individual input features; concept-based methods identify the high-level representations that govern behavior; mechanistic methods uncover the precise causal mechanisms that lead from inputs to outputs.
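As a rough illustration of how the first two paradigms differ in practice, the sketch below probes a toy model behaviorally (input-output pairs only) and then attributionally (per-feature occlusion). Everything here is an assumption for illustration: `toy_model`, its weights, `behavioral_probe`, and `occlusion_attribution` are hypothetical names, and occlusion is only one of many attribution techniques, not one prescribed by the source.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[float]], float]


def toy_model(x: List[float]) -> float:
    """Hypothetical stand-in for a trained network: a fixed linear scorer."""
    weights = [2.0, 0.0, -1.0]  # assumed weights, chosen only for illustration
    return sum(w * xi for w, xi in zip(weights, x))


def behavioral_probe(model: Model, inputs: List[List[float]]) -> List[Tuple[List[float], float]]:
    """Behavioral view: record input-output pairs without looking inside the model."""
    return [(x, model(x)) for x in inputs]


def occlusion_attribution(model: Model, x: List[float], baseline: float = 0.0) -> List[float]:
    """Attributional view: score each feature by how much the output changes
    when that feature alone is replaced with a baseline value."""
    full_output = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(full_output - model(occluded))
    return scores


x = [1.0, 5.0, 3.0]
pairs = behavioral_probe(toy_model, [x])
scores = occlusion_attribution(toy_model, x)
print(pairs)   # what the model outputs for this input
print(scores)  # how much each input feature contributed to that output
```

In this toy case the second feature receives a score of zero even though its value is the largest, which is the kind of fact a purely behavioral probe does not expose directly. Concept-based and mechanistic methods would go further and inspect internal representations, which this sketch does not attempt.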
Interpretable AI Notion
Explainable AI Methods
