I have a large number of tasks that I want to execute and make the results available via a generator. However, using a ProcessPoolExecutor with as_completed will evaluate the results greedily and store them all in memory. Is there a way to block after a certain number of results are stored in the generator?
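Roughly, the pattern I mean is the following (a minimal sketch; slow_task stands in for my real work function):

import concurrent.futures

def slow_task(n):
    # Stand-in for the real, CPU-bound work.
    return n * n

def all_results():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Every Future is created up front and holds its result until
        # consumed, so memory grows with the task count.
        futures = [executor.submit(slow_task, n) for n in range(10_000)]
        for future in concurrent.futures.as_completed(futures):
            yield future.result()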
The idea is to split what you want to process into chunks. I'll be using almost the same example as in the ProcessPoolExecutor documentation:
import concurrent.futures
import math
import itertools as it

PRIMES = [
    293, 171, 293, 773, 99, 5419,
    293, 171, 293, 773, 99, 5419,
    293, 171, 293, 773, 99, 5419,
]

def is_prime(n):
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

def main_lazy():
    chunks = map(lambda x: it.islice(PRIMES, x, x + 4), range(0, len(PRIMES), 4))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = zip(PRIMES, it.chain.from_iterable(
            map(lambda x: executor.map(is_prime, x), chunks)))
        for number, prime in (next(results) for _ in range(4)):
            print('%d is prime: %s' % (number, prime))

if __name__ == "__main__":
    main_lazy()
Notice the differences between main and main_lazy; let me explain this a bit:
Instead of handing the executor the whole list at once, I split it into chunks of size 4 (itertools.islice is useful for this), so we map the chunks rather than the full list. Then, using Python 3's lazy map, we can apply the executor call lazily to each chunk. We know that executor.map is not lazy, so a chunk is evaluated immediately when we request it; but until we request the other chunks, executor.map is not called for them, as the sketch below shows.
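To make the submission timing visible, here is a minimal, self-contained sketch (my own illustration, not part of the example above; make_chunk and demo are names I made up). The chunk is built in the parent process, so the print shows exactly when executor.map is invoked for it:

import concurrent.futures
import itertools as it
import math

PRIMES = [293, 171, 293, 773, 99, 5419, 293, 171, 293, 773, 99, 5419]

def is_prime(n):
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def make_chunk(start):
    # Runs in the parent process when the lazy map pulls this chunk.
    print('building chunk starting at index %d' % start)
    return it.islice(PRIMES, start, start + 4)

def demo():
    chunks = map(make_chunk, range(0, len(PRIMES), 4))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = it.chain.from_iterable(
            map(lambda c: executor.map(is_prime, c), chunks))
        # Only the first chunk is built and submitted; requesting a
        # fifth result would trigger the second chunk.
        for _ in range(4):
            print(next(results))

if __name__ == '__main__':
    demo()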
As you can see, I'm only requesting the first 4 elements from the whole list of results, but since I also used itertools.chain, it will only consume the results from the first chunk, without computing the rest of the iterable.
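The same consumption behaviour can be seen without any executor at all. This tiny sketch (noisy_chunk is an illustrative name of mine) shows that chain.from_iterable only pulls a new inner iterable once the previous one is exhausted:

import itertools as it

def noisy_chunk(i):
    # Runs only when chain.from_iterable actually needs this chunk.
    print('chunk %d requested' % i)
    return [i * 10 + j for j in range(4)]

flat = it.chain.from_iterable(map(noisy_chunk, range(3)))
print([next(flat) for _ in range(4)])
# Prints "chunk 0 requested" exactly once; chunk 1 would only be
# requested on the fifth element.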
So, since you wanted to return a generator, it is as easy as returning the results from the main_lazy function. One caveat: the executor is shut down when the with block exits, so the function should yield the results from inside the block rather than return the lazy iterator directly. You can even abstract the chunk size (you would probably want a proper function to build the chunks, but that is out of scope):
def main_lazy(chunk_size):
    chunks = map(lambda x: it.islice(PRIMES, x, x + chunk_size),
                 range(0, len(PRIMES), chunk_size))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = zip(PRIMES, it.chain.from_iterable(
            map(lambda x: executor.map(is_prime, x), chunks)))
        # Yield from inside the with block so the executor stays
        # alive while the caller consumes the results.
        yield from results
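As a usage sketch (my addition, not part of the original example), the caller pulls results on demand and closes the generator to shut the executor down:

gen = main_lazy(4)
for _ in range(4):
    number, prime = next(gen)
    print('%d is prime: %s' % (number, prime))
gen.close()  # exits the with block, shutting the executor down

On platforms that spawn worker processes, this should run under the usual if __name__ == "__main__" guard.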