RedPajama-Data (togethercomputer) • Updated 2023 Jun 13 5:20

togethercomputer/RedPajama-Data-1T · Datasets at Hugging Face
https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T

RedPajama, a project to create leading open-source models, starts by reproducing the LLaMA training dataset of over 1.2 trillion tokens (Together)
RedPajama is a project to create a set of leading, fully open-source models. Today, we are excited to announce the completion of the first step of this project: the reproduction of the LLaMA training dataset of over 1.2 trillion tokens.
https://www.together.xyz/blog/redpajama