Self-improving without labels
SL (Supervised Learning) phase: the model generates responses, critiques them against the constitutional principles, and revises them; the revised responses are then used for supervised fine-tuning.

RL (Reinforcement Learning) phase: the model compares pairs of responses against the constitution, and the resulting AI preference labels train a preference model for RL from AI Feedback (RLAIF), replacing human harm labels. Both phases are sketched below.
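
A minimal Python sketch of the two phases, assuming a hypothetical `model(prompt) -> str` completion function; the `CONSTITUTION` principles and prompt templates are illustrative paraphrases, not the paper's exact prompts.

```python
# Sketch of Constitutional AI's two phases (not Anthropic's implementation).
# `model` is a hypothetical completion function: prompt in, text out.
from typing import Callable, List, Tuple

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest and helpful.",
]

def sl_phase(model: Callable[[str], str],
             prompts: List[str]) -> List[Tuple[str, str]]:
    """SL phase: self-critique and revision against the constitution,
    producing (prompt, revised response) pairs for supervised fine-tuning."""
    data = []
    for prompt in prompts:
        response = model(prompt)
        for principle in CONSTITUTION:
            critique = model(
                f"Critique this response by the principle "
                f"'{principle}':\n{response}"
            )
            response = model(
                f"Revise the response to address the critique.\n"
                f"Response: {response}\nCritique: {critique}\nRevision:"
            )
        data.append((prompt, response))
    return data  # fine-tune the model on these revisions

def rl_phase_label(model: Callable[[str], str], prompt: str,
                   a: str, b: str) -> int:
    """RL phase (RLAIF): the model itself picks the more constitutional
    response; these AI preference labels train a preference model,
    which then provides the reward signal for RL."""
    verdict = model(
        f"By the principle '{CONSTITUTION[0]}', which response to "
        f"'{prompt}' is better?\n(A) {a}\n(B) {b}\nAnswer A or B:"
    )
    return 0 if "A" in verdict else 1

# Usage with a stub model (swap in a real LM completion function):
pairs = sl_phase(lambda p: f"[completion for: {p[:30]}]",
                 ["How do I pick a lock?"])
```

The key point the sketch illustrates: no human writes harm labels anywhere in the loop; the constitution plus the model's own critiques and preferences supply all supervision.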
Anthropic on Twitter / X:
“In our paper, we describe how we’ve used Constitutional AI to train better and more harmless AI assistants without any human feedback labels for harms. This approach leads to models that are safer and also more helpful.” (Anthropic, @AnthropicAI, December 16, 2022)
https://twitter.com/AnthropicAI/status/1603791168495489030
Constitutional AI: Harmlessness from AI Feedback
We show that language models can learn to follow a set of simple, natural language principles via self-improvement, and we use this new method to train a more harmless assistant.
https://www.anthropic.com/index/constitutional-ai-harmlessness-from-ai-feedback

Constitutional Classifiers from Anthropic
Heuristic rules
