CC-100
Created: 2024 Nov 2 0:0
Editor: Seonglae Cho
Creator: Seonglae Cho
Edited: 2024 Nov 2 0:1
Refs
C4 is comparable in size to The Pile, while mC4 and CC-100 are larger, multilingual datasets.