
Anthropic Research

Creator: Seonglae Cho
Created: 2025 Dec 13 18:15
Editor: Seonglae Cho
Edited: 2025 Dec 13 18:18
Refs
69 activates neural network
Interpretability: Understanding how AI models think
What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models just "glorified autocompletes", or is something more complicated going on? How do we even study these questions scientifically? Join Anthropic's Josh Batson, Emmanuel Ameisen, and Jack Lindsey as they discuss the latest research on AI interpretability.

Read more about Anthropic's interpretability research: https://www.anthropic.com/news/tracing-thoughts-language-model

Sections:
- Introduction [00:00]
- The biology of AI models [01:37]
- Scientific methods to open the black box [06:43]
- Some surprising features inside Claude's mind [10:35]
- Can we trust what a model claims it's thinking? [20:39]
- Why do AI models hallucinate? [25:17]
- AI models planning ahead [34:15]
- Why interpretability matters [38:30]
- The future of interpretability [53:35]
https://www.youtube.com/watch?v=fGKNUvivvnc
Copyright Seonglae Cho