FineWeb

Creator: Seonglae Cho
Created: 2024 Jun 6 17:8
Editor: Seonglae Cho
Edited: 2025 Mar 2 15:33
Multilingual

Hugging Face

A curated dataset; among web datasets, its quality is high.
https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
FineWeb-Edu
Andrej Karpathy on Twitter / X
"Awesome and highly useful: FineWeb-Edu 📚👏 High quality LLM dataset filtering the original 15 trillion FineWeb tokens to 1.3 trillion of the highest (educational) quality, as judged by a Llama 3 70B. + A highly detailed paper. Turns out that LLMs learn a lot better and faster…" Andrej Karpathy (@karpathy), June 2, 2024
https://x.com/karpathy/status/1797313173449764933

AI Education
Classifier

HuggingFaceFW/fineweb-edu-classifier · Hugging Face
https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier
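
The tweet above describes FineWeb-Edu as FineWeb filtered by educational quality. A minimal sketch of that kind of filtering step follows. This is not the authors' pipeline: the 0 to 5 score scale, the keep threshold of 3, and the helper names `to_int_score`, `filter_educational`, and `load_hf_scorer` are all assumptions made for illustration. The optional loader assumes the linked model can be used as a single-output sequence classifier through `transformers`.

```python
from typing import Callable, Iterable, Iterator

# Assumption: scores live on a 0-5 scale and documents scoring >= 3 are kept.
EDU_THRESHOLD = 3


def to_int_score(raw: float) -> int:
    """Clamp a raw score to [0, 5] and round it to an integer."""
    return int(round(max(0.0, min(raw, 5.0))))


def filter_educational(
    docs: Iterable[str],
    scorer: Callable[[str], float],
    threshold: int = EDU_THRESHOLD,
) -> Iterator[str]:
    """Yield only the documents whose integer score reaches the threshold."""
    for text in docs:
        if to_int_score(scorer(text)) >= threshold:
            yield text


def load_hf_scorer(
    model_id: str = "HuggingFaceFW/fineweb-edu-classifier",
) -> Callable[[str], float]:
    """Hypothetical helper: wrap the linked classifier as a scorer.

    Needs `transformers` and `torch` plus network access, so the imports
    are deferred until the function is actually called.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)

    def score(text: str) -> float:
        inputs = tokenizer(
            text, return_tensors="pt", padding="longest", truncation=True
        )
        # Assumes a single regression output per input text.
        return float(model(**inputs).logits.squeeze(-1).item())

    return score


if __name__ == "__main__":
    # Stand-in scores so the sketch runs offline; swap in load_hf_scorer()
    # to score with the real model.
    fake_scores = {"intro to calculus": 4.2, "buy cheap watches": 0.7}
    kept = list(filter_educational(fake_scores, fake_scores.__getitem__))
    print(kept)
```

Passing the scorer in as a function keeps the filtering logic independent of the model, so the threshold can be tuned or the classifier swapped without touching the loop.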
 
 

Copyright Seonglae Cho