Parquet

Creator
Seonglae Cho
Created
2023 Oct 19 17:22
Edited
2025 May 26 15:43
Refs
Hadoop
zstd

Parquet (pronounced "PAA-kay") is a columnar data storage format, unlike a row-based RDBMS

So the data compression ratio is high, because the values within a column share a data type and tend to be similar
When some engines write Parquet files (Apache Spark, for example), all columns are automatically converted to nullable for compatibility
Since columns store the same data type, each column can use encoding methods optimized for its specific data type
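The points above can be illustrated with a toy sketch in plain Python. It is not Parquet's actual binary format: the helper names (`to_columns`, `dictionary_encode`, `run_length_encode`) and the sample rows are hypothetical, chosen only to show why grouping values by column makes simple encodings effective. Real Parquet writers use more involved variants of the same ideas, such as dictionary encoding and a run-length / bit-packing hybrid.

```python
from itertools import groupby


def to_columns(rows):
    """Transpose a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical sample data, stored row by row as an RDBMS would.
rows = [
    {"country": "KR", "active": True},
    {"country": "KR", "active": True},
    {"country": "US", "active": True},
    {"country": "KR", "active": False},
    {"country": "US", "active": False},
    {"country": "US", "active": False},
]

columns = to_columns(rows)

# A low-cardinality string column suits dictionary encoding.
dictionary, codes = dictionary_encode(columns["country"])
print(dictionary, codes)  # ['KR', 'US'] [0, 0, 1, 0, 1, 1]

# A column with long runs of equal values suits run-length encoding.
runs = run_length_encode(columns["active"])
print(runs)  # [(True, 3), (False, 3)]
```

Each column gets the encoding that fits its own values, which is not possible when values of different types are interleaved row by row.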
Parquet Notion
 
 
 
Parquet Tools
 
 
 
 
A 42 kB Parquet file can contain over 4 PB of data.
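To put that figure in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 kB = 10^3 bytes, 1 PB = 10^15 bytes) and treats 4 PB as a lower bound. The note does not say how such a file is constructed, so the run-length part is only an assumed illustration of how an encoded size can stay fixed while the logical size grows. The 8-byte value width and the `logical_size` helper are hypothetical.

```python
KB = 10**3   # assumption: decimal kilobyte
PB = 10**15  # assumption: decimal petabyte

file_bytes = 42 * KB
logical_bytes = 4 * PB  # "over 4 PB", so this is a lower bound

ratio = logical_bytes // file_bytes
print(f"expansion factor of at least {ratio:,}x")


def logical_size(runs, bytes_per_value):
    """Size in bytes of the data once every run is expanded."""
    return sum(count for _, count in runs) * bytes_per_value


# With run-length encoding, a run is stored as one (value, count) pair,
# so the encoded form has one entry regardless of the run length.
small_run = [(0, 1_000)]
huge_run = [(0, 500_000_000_000_000)]

print(len(small_run), logical_size(small_run, 8))
print(len(huge_run), logical_size(huge_run, 8))
```

The practical takeaway is that the size of a Parquet file on disk says little about how much memory reading it may require, which matters when accepting files from untrusted sources.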
 
 
