The Dataset.map() method in the Hugging Face datasets library provides several useful parameters for data processing:
num_proc - Enables parallel processing with Python multiprocessing. The dataset is split into shards that are mapped in separate worker processes, which can significantly speed up CPU-bound mapping functions on multi-core machines.
remove_columns - Accepts a column name or a list of column names to drop from the mapped dataset. The mapping function still receives these columns as input; they are removed only from the output, which reduces memory usage and keeps only the relevant columns.

Seonglae Cho