How to load parquet to datasets without caching?
Hi! datasets.set_caching_enabled(False) only affects the Arrow files created via .map, not the ones created by load_dataset. Also, Parquet files cannot be zero-copied/memory-mapped efficiently (see "Reading and Writing the Apache Parquet Format" in the Apache Arrow v8.0.0 docs), so converting to Arrow is the only option for datasets that do not fit in RAM. Still, if you have enough RAM and want to skip this step to avoid generating a cache file, you can create an in-memory dataset directly from Parquet as follows: from datasets import Dataset ...
https://discuss.huggingface.co/t/how-to-load-parquet-to-datasets-without-caching/19564/2