Pretraining Dataset Extraction Methods
LLMs linearly encode not only what they learn but also when they learned it: the temporal provenance of training data is recoverable from internal representations with a simple linear probe.
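A minimal sketch of how such a claim is typically tested, assuming an open model, an arbitrarily chosen intermediate layer, and a hypothetical labeled set of (snippet, year) pairs, none of which come from the original source:

```python
# Linear probe for temporal encoding: if "when" is linearly represented,
# a linear regressor on hidden states should predict the year a snippet
# entered the training distribution.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge

MODEL = "gpt2"  # hypothetical stand-in for the probed LLM
LAYER = 6       # assumed intermediate layer

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

def mean_hidden(text: str) -> torch.Tensor:
    """Mean-pooled hidden state of one snippet at the chosen layer."""
    ids = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0].mean(dim=0)

# Hypothetical labels: the year each snippet entered the training data.
snippets = ["news snippet first seen in 2019 ...",
            "news snippet first seen in 2022 ..."]
years = [2019.0, 2022.0]

X = torch.stack([mean_hidden(s) for s in snippets]).numpy()
probe = Ridge().fit(X, years)  # a purely linear map is the point of the claim
print("fit score:", probe.score(X, years))  # real use needs held-out data
```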
Models can regurgitate pretraining data, fine-tuning data, and even RL data, and this extraction probability increases after distillation: knowledge distillation indirectly acts as dataset distillation.
Divergence attack (2023): prompting a model to repeat a single token forever exploits the repeated-token phenomenon, where generation eventually diverges from the loop and emits memorized pretraining data verbatim.
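A sketch of the attack below; the published version targeted production chat models such as ChatGPT, so the open model, the repeated word, and the sampling settings here are illustrative stand-ins:

```python
# Repeated-token divergence attack: prompt the model with a long run of
# one token and look at what it emits after it breaks out of the loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumed open model for a local reproduction
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

word = " poem"
prompt = word * 200  # long run of the same repeated token

ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(
        ids,
        max_new_tokens=256,
        do_sample=True,  # sampling encourages divergence from the loop
        top_k=40,
        pad_token_id=tok.eos_token_id,
    )

continuation = tok.decode(out[0, ids.shape[1]:])

# Crude divergence check: strip the leading run of the repeated word;
# whatever remains is the post-divergence text, which the attack showed
# can contain verbatim training data.
repeated = word.strip()
tail = continuation
while tail.lstrip().startswith(repeated):
    tail = tail.lstrip()[len(repeated):]
print("post-divergence text:", tail[:500])
```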
Even modern large LLMs like ChatGPT allow extraction of training data (including PII) through simple prompts, and current alignment and safety techniques fundamentally fail to solve the memorization problem.
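Confirming that emitted text is actually extracted training data usually means matching it against a known corpus for long verbatim overlaps. A minimal sketch, where the corpus, whitespace tokenization, and the 50-token threshold are assumptions (the threshold echoes common practice in extraction work):

```python
# Flag a generation as memorized if it shares a sufficiently long
# verbatim n-gram with any document in a reference corpus.

def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_memorized(generated: str, corpus: list[str], n: int = 50) -> bool:
    gen_grams = ngrams(generated.split(), n)  # whitespace tokens for simplicity
    for doc in corpus:
        if gen_grams & ngrams(doc.split(), n):
            return True  # verbatim n-token overlap found
    return False

corpus = ["... known pretraining documents ..."]  # placeholder corpus
print(is_memorized("some model output ...", corpus))
```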
