
OpenWebText

Creator
Seonglae Cho
Created
2025 Feb 3 15:1
Editor
Seonglae Cho
Edited
2025 Mar 2 15:27
OpenWebText is a dataset created by crawling the external web pages linked from high-karma (highly upvoted) Reddit posts.
It is an imitation of WebText; because it is Reddit-based, the quality is not that good.
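
The link-selection idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual OpenWebText pipeline: the `Post` record, the `select_outbound_urls` helper, and the `min_karma=3` default are all hypothetical names and values chosen for the example, and real crawling, text extraction, and content deduplication are omitted.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Post:
    """Hypothetical, simplified view of a Reddit submission."""
    url: str    # link the post points to
    karma: int  # upvote-based score of the post


def select_outbound_urls(posts, min_karma=3):
    """Return unique external URLs from posts whose karma meets the threshold.

    min_karma=3 is an assumed, illustrative cutoff.
    """
    seen = set()
    selected = []
    for post in posts:
        host = urlparse(post.url).netloc.lower()
        # Treat anything hosted on reddit.com itself as a non-external link.
        is_reddit = host == "reddit.com" or host.endswith(".reddit.com")
        is_external = bool(host) and not is_reddit
        if post.karma >= min_karma and is_external and post.url not in seen:
            seen.add(post.url)
            selected.append(post.url)
    return selected


posts = [
    Post("https://example.com/article", 12),
    Post("https://example.com/article", 40),   # duplicate link, kept once
    Post("https://example.org/low", 1),        # below the karma threshold
    Post("https://www.reddit.com/r/x/1", 99),  # internal link, skipped
]
urls = select_outbound_urls(posts)
```

Filtering on karma uses community votes as a cheap proxy for page quality, which is also why the resulting corpus inherits Reddit's biases.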
 
 
 
 
Skylion007/openwebtext · Datasets at Hugging Face
https://huggingface.co/datasets/openwebtext
stas/openwebtext-10k · Datasets at Hugging Face
https://huggingface.co/datasets/stas/openwebtext-10k
 
 

Copyright Seonglae Cho