Is Dask proper to read large csv files in parallel and split them into multiple smaller files?
Yes, Dask can read large CSV files. It will split them into chunks (partitions):

import dask.dataframe as dd

df = dd.read_csv("/path/to/myfile.csv")
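If you want to control how big each chunk is, read_csv accepts a blocksize parameter. A minimal sketch (the 64 MB value is arbitrary, and the path is illustrative):

# Read the CSV in ~64 MB chunks; each chunk becomes one partition
df = dd.read_csv("/path/to/myfile.csv", blocksize="64MB")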
Then, when saving, Dask writes CSV data to multiple files by default, one file per partition:
df.to_csv("/output/path/*.csv")
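The * in the path is replaced by the partition number, so the number of output files equals the number of partitions. If you want a specific number of smaller files, you can repartition before writing; a minimal sketch (the target of 10 partitions is arbitrary):

# Rewrite the data as exactly 10 output files
df = df.repartition(npartitions=10)
df.to_csv("/output/path/*.csv")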
See the read_csv and to_csv docstrings for much more information.