concurrent.futures not parallelizing write

2024/10/14 11:17:35

I have a list dataframe_chunk which contains chunks of a very large pandas dataframe. I would like to write every single chunk into a different csv, and to do so in parallel. However, I see the files being written sequentially and I'm not sure why this is the case. Here's the code:

import concurrent.futures as cfu

def write_chunk_to_file(chunk, fpath):
    chunk.to_csv(fpath, sep=',', header=False, index=False)

pool = cfu.ThreadPoolExecutor(N_CORES)
futures = []
for i in range(N_CORES):
    fpath = '/path_to_files_'+str(i)+'.csv'
    futures.append(pool.submit( write_chunk_to_file(dataframe_chunk[i], fpath)))
for f in cfu.as_completed(futures):
    print("finished at ",time.time())

Any clues?

Answer

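One thing the threading docs point out (as a CPython implementation detail) is that Python threads cannot achieve true parallelism for Python code: because of the global interpreter lock (GIL), only one thread executes Python bytecode at a time. Threads can still overlap while they wait on blocking I/O, but CPU-bound work such as formatting a dataframe as CSV text gains little from them.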
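There is also a second thing worth checking in the question's code: pool.submit(write_chunk_to_file(dataframe_chunk[i], fpath)) calls the function first, in the main thread, and only hands its return value to submit(). That alone would make the writes run one after another. A minimal sketch of the difference follows; where_am_i and run_demo are hypothetical names used only for illustration.

```python
import concurrent.futures as cfu
import threading


def where_am_i(tag):
    # Report which thread actually executed this call.
    return tag, threading.current_thread().name


def run_demo():
    main_name = threading.current_thread().name
    with cfu.ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as pool:
        # Function reference plus arguments: the pool calls it on a worker thread.
        deferred = pool.submit(where_am_i, "deferred").result()
        # Calling the function inline runs it right here, in the caller's thread.
        # Writing pool.submit(where_am_i("eager")) would do exactly this call
        # first and then submit only the returned value.
        eager = where_am_i("eager")
    return main_name, deferred, eager


if __name__ == "__main__":
    main_name, deferred, eager = run_demo()
    print("main thread:", main_name)
    print("submit(f, arg) ran in:", deferred[1])
    print("f(arg) ran in:", eager[1])
```

So the submit call should pass the function and its arguments separately, as in pool.submit(write_chunk_to_file, dataframe_chunk[i], fpath).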

You should try using concurrent.futures with the ProcessPoolExecutor, which runs jobs in a pool of separate worker processes and can therefore achieve true parallelism on a multi-core CPU.
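Here is a rough sketch of what that could look like. It is not the original program: to stay runnable without pandas it uses the csv module as a stand-in for chunk.to_csv, and write_all, executor_cls and the chunk_<i>.csv file names are assumptions made up for the example. Note that the function and its arguments are passed to submit() separately, so the call happens inside the worker.

```python
import concurrent.futures as cfu
import csv
import os
import tempfile


def write_chunk_to_file(chunk, fpath):
    # Stand-in for chunk.to_csv(fpath, header=False, index=False);
    # here a chunk is just a list of row tuples.
    with open(fpath, "w", newline="") as f:
        csv.writer(f).writerows(chunk)
    return fpath


def write_all(chunks, out_dir, max_workers=4, executor_cls=cfu.ProcessPoolExecutor):
    # executor_cls is a hypothetical knob so the same code can be tried
    # with ThreadPoolExecutor for comparison.
    with executor_cls(max_workers=max_workers) as pool:
        futures = [
            # Pass the callable and its arguments; do not call it here.
            pool.submit(write_chunk_to_file, chunk,
                        os.path.join(out_dir, "chunk_%d.csv" % i))
            for i, chunk in enumerate(chunks)
        ]
        # result() re-raises any exception that occurred in a worker.
        return sorted(f.result() for f in cfu.as_completed(futures))


if __name__ == "__main__":
    # The guard matters: on platforms that spawn worker processes,
    # each worker re-imports this module.
    chunks = [[(i, j) for j in range(3)] for i in range(4)]
    with tempfile.TemporaryDirectory() as out_dir:
        for path in write_all(chunks, out_dir):
            print("finished", path)
```

Keep in mind that each chunk has to be pickled and sent to a worker process, so for very large chunks that transfer cost can eat into the gain from parallel writing.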

Update

Here is your program adapted to use the multiprocessing library instead:

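#!/usr/bin/env python3
from multiprocessing import Process
import os
import time

N_CORES = 8

def write_chunk_to_file(chunk, fpath):
    # Dummy workload standing in for the real chunk: write ten million integers.
    with open(fpath, "w") as f:
        for x in range(10000000):
            f.write(str(x))

if __name__ == "__main__":
    # The guard keeps child processes from re-running this part on platforms
    # that start them by re-importing the script.
    os.makedirs("./tmp", exist_ok=True)  # the output directory must exist
    futures = []
    print("my pid:", os.getpid())
    input("Hit return to start:")
    start = time.time()
    print("Started at:", start)
    for i in range(N_CORES):
        fpath = './tmp/file-' + str(i) + '.csv'
        p = Process(target=write_chunk_to_file, args=(i, fpath))
        futures.append(p)
    for p in futures:
        p.start()
    print("All jobs started.")
    for p in futures:
        p.join()
    print("All jobs finished at ", time.time())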

You can monitor the jobs with this shell command in another window:

while true; do clear; pstree 12345; ls -l tmp; sleep 1; done

(Replace 12345 with the pid emitted by the script.)

