I want to chunk an input stream for batch processing. Given an input list or generator,
x_in = [1, 2, 3, 4, 5, 6 ...]
I want a function that will return chunks of that input. Say, if chunk_size=4, then,
x_chunked = [[1, 2, 3, 4], [5, 6, ...], ...]
This is something I do over and over, and I was wondering whether there is a more standard way than writing it myself. Am I missing something in itertools? (One could solve the problem with enumerate and groupby, but that feels clunky.) In case anyone wants to see an implementation, here it is,
def chunk_input_stream(input_stream, chunk_size):
    """partition a generator in a streaming fashion"""
    assert chunk_size >= 1
    accumulator = []
    for x in input_stream:
        accumulator.append(x)
        if len(accumulator) == chunk_size:
            yield accumulator
            accumulator = []
    # emit the final, possibly shorter, chunk
    if accumulator:
        yield accumulator
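For comparison, the enumerate and groupby approach mentioned above could look roughly like the sketch below. The name chunk_with_groupby is hypothetical, and this is only one way to combine the two: each item is keyed by its index integer-divided by chunk_size, so consecutive items fall into the same group.

```python
from itertools import groupby

def chunk_with_groupby(input_stream, chunk_size):
    """Hypothetical helper: chunk a stream via enumerate + groupby."""
    # Items whose index falls in the same block of chunk_size share a key.
    keyed = enumerate(input_stream)
    for _, group in groupby(keyed, key=lambda pair: pair[0] // chunk_size):
        # Strip the index that enumerate attached to each item.
        yield [item for _, item in group]

print(list(chunk_with_groupby(range(1, 7), 4)))  # [[1, 2, 3, 4], [5, 6]]
```

It works on generators as well as lists, but the index bookkeeping is exactly the clunkiness in question.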
Edit
Inspired by kreativitea's answer, here's a solution with islice, which is straightforward & doesn't require post-filtering,
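from itertools import islice

def chunk_input_stream(input_stream, chunk_size):
    # iter() lets plain lists work too; islice on a list would restart
    # from the beginning on every pass and loop forever
    input_stream = iter(input_stream)
    while True:
        chunk = list(islice(input_stream, chunk_size))
        if chunk:
            yield chunk
        else:
            return

# test it with list(chunk_input_stream(iter([1, 2, 3, 4]), 3))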
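As a quick check that the islice version really is streaming, here is a small sketch that feeds it an unbounded input. Using itertools.count as the stand-in stream is an assumption for illustration, and the function is repeated (with an iter() call so lists are safe too) only to keep the snippet self-contained.

```python
from itertools import count, islice

def chunk_input_stream(input_stream, chunk_size):
    it = iter(input_stream)  # one shared iterator, so each islice continues where the last stopped
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# count(1) never ends, so this only terminates if chunks are produced lazily.
chunks = chunk_input_stream(count(1), 4)
first_two = list(islice(chunks, 2))
print(first_two)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Only the first eight items of the stream are consumed here; nothing beyond the requested chunks is read.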