Pretraining Dataset Filtering

Creator
Seonglae Cho
Created
2025 Aug 13 14:49
Editor
Seonglae Cho
Edited
2025 Aug 25 21:25
Refs
Machine Unlearning

Pretraining Data Filtering for Open-Weight AI Safety
Announcing Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
https://blog.eleuther.ai/deep-ignorance/
Constitutional Classifier
Enhancing Model Safety through Pretraining Data Filtering
AI systems trained on internet-scale data can provide users with comprehensive knowledge on an immense range of topics. However, this wealth of information also includes sensitive information that could be dangerous if misused. For example, information related to chemical, biological, radiological and nuclear (CBRN) weapons could, in the wrong hands, enable bad actors with basic technical backgrounds to develop weapons of mass destruction. Our Responsible Scaling Policy (RSP) commits us to mitigating risks from such threat models and limiting the spread of harmful information by our models.
https://alignment.anthropic.com/2025/pretraining-data-filtering
 
 

Backlinks

Defense Jailbreaking

Copyright Seonglae Cho