Pretraining Dataset Filtering

Creator

Seonglae Cho

Created

2025 Aug 13 14:49

Editor

Seonglae Cho

Edited

2025 Aug 25 21:25

Refs

Machine Unlearning

Pretraining Data Filtering for Open-Weight AI Safety

Announcing Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

https://blog.eleuther.ai/deep-ignorance/

Constitutional Classifier

Enhancing Model Safety through Pretraining Data Filtering

AI systems trained on internet-scale data can provide users with comprehensive knowledge on an immense range of topics. However, this wealth of information also includes sensitive information that could be dangerous if misused. For example, information related to chemical, biological, radiological and nuclear (CBRN) weapons could, in the wrong hands, enable bad actors with basic technical backgrounds to develop weapons of mass destruction. Our Responsible Scaling Policy (RSP) commits us to mitigating risks from such threat models and limiting the spread of harmful information by our models.

https://alignment.anthropic.com/2025/pretraining-data-filtering

Backlinks

Defense Jailbreaking
