Notes on synthetic data for pretraining:

- Use synthetic data for only a portion of pretraining.
- How to avoid model collapse: ToEdit (token-level editing of human data), https://arxiv.org/pdf/2412.14689
- Phi and Cosmopedia for SmolLM: small LLMs useful for on-device AI.
- Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models, https://huggingface.co/blog/cosmopedia
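A minimal sketch of the token-level editing idea behind ToEdit: instead of generating fully synthetic text, keep human-written tokens and resample only the positions where a pre-trained LM is highly confident, yielding semi-synthetic data. The `token_prob` and `resample` functions here are hypothetical stand-ins for a real language model, and the threshold value is illustrative, not the paper's setting.

```python
import random

def token_level_edit(tokens, token_prob, resample, p=0.99, rng=random.Random(0)):
    """Semi-synthetic data via token-level editing (sketch).

    For each position, ask a stand-in LM how predictable the token is.
    Highly predictable tokens (prob > p) are resampled; all other
    human-written tokens are kept verbatim.
    """
    out = []
    for i, tok in enumerate(tokens):
        if token_prob(tokens, i) > p:
            out.append(resample(tokens, i, rng))  # replace over-confident token
        else:
            out.append(tok)  # keep the original human token
    return out

# Toy stand-ins (hypothetical, not the paper's model):
vocab = ["the", "a", "cat", "dog", "sat"]
prob = lambda toks, i: 1.0 if toks[i] == "the" else 0.1   # fake LM confidence
pick = lambda toks, i, rng: rng.choice(vocab)             # fake LM sampler

edited = token_level_edit(["the", "cat", "sat"], prob, pick)
```

Only the over-confident first token is resampled; "cat" and "sat" survive unchanged, which is the point: most of the human distribution is preserved, so repeated training rounds are less prone to collapse.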