The Pile

The Pile is an 825 GiB diverse, open-source language modelling dataset that combines 22 smaller, high-quality datasets. The Pile is hosted by The Eye. Have a model that uses or evaluates on the Pile? Let us know!

https://pile.eleuther.ai/