Enhancing Model Safety through Pretraining Data Filtering
AI systems trained on internet-scale data can provide users with comprehensive knowledge on an immense range of topics. However, this wealth of knowledge also includes sensitive information that could be dangerous if misused. For example, information related to chemical, biological, radiological, and nuclear (CBRN) weapons could, in the wrong hands, enable bad actors with only basic technical backgrounds to develop weapons of mass destruction. Our Responsible Scaling Policy (RSP) commits us to mitigating risks from such threat models and to limiting the spread of harmful information by our models.
https://alignment.anthropic.com/2025/pretraining-data-filtering