Andrej Karpathy on Twitter / X
Awesome and highly useful: FineWeb-Edu 📚👏 High quality LLM dataset filtering the original 15 trillion FineWeb tokens to 1.3 trillion of the highest (educational) quality, as judged by a Llama 3 70B. +A highly detailed paper. Turns out that LLMs learn a lot better and faster… https://t.co/f3wqPbNkJ5 pic.twitter.com/9nXaet5tmG
Andrej Karpathy (@karpathy), June 2, 2024
https://x.com/karpathy/status/1797313173449764933