AI Monitoring
AI Auditing Methods
Internet Comment Automatic bot detection and blocking cases
Detecting and Countering Malicious Uses of Claude
https://www.anthropic.com/news/detecting-and-countering-malicious-uses-of-claude-march-2025

Position Paper
Not only Deliberative Alignment cot think too. CoT monitoring is significantly more effective than monitoring outputs/actions alone. Current-scale RL training does not significantly harm CoT monitoring. CoT monitoring is not a replacement but a complement to mechanistic interpretability.

Seonglae Cho