Pyarrow

Created
2023 Oct 23 6:48
Editor
Seonglae Cho
Creator
Seonglae Cho
Edited
2024 Oct 16 14:57
Refs
Apache Arrow

Huggingface Datasets

How to load parquet to datasets without caching?
Hi! datasets.set_caching_enabled(False) only affects the arrow files created via .map, not load_dataset. Also, parquet files cannot be zero-copied/memory-mapped efficiently (see Reading and Writing the Apache Parquet Format — Apache Arrow v8.0.0), so the arrow conversion is the only option for big datasets. Still, if you have enough RAM and want to skip this step to avoid generating a cache file, you can create an in-memory dataset directly from parquet as follows: from datasets import dataset ...
https://discuss.huggingface.co/t/how-to-load-parquet-to-datasets-without-caching/19564/2
Copyright Seonglae Cho