python: is there a library function for chunking an input stream?

2024/10/18 6:23:47

I want to chunk an input stream for batch processing. Given an input list or generator:

    x_in = [1, 2, 3, 4, 5, 6, ...]

I want a function that returns chunks of that input. For example, if chunk_size=4, then:

    x_chunked = [[1, 2, 3, 4], [5, 6, ...], ...]

This is something I do over and over, and I was wondering whether there is a more standard way than writing it myself. Am I missing something in itertools? (One could solve the problem with enumerate and groupby, but that feels clunky.) In case anyone wants to see an implementation, here it is:

    def chunk_input_stream(input_stream, chunk_size):
        """partition a generator in a streaming fashion"""
        assert chunk_size >= 1
        accumulator = []
        for x in input_stream:
            accumulator.append(x)
            if len(accumulator) == chunk_size:
                yield accumulator
                accumulator = []
        if accumulator:
            yield accumulator
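For comparison, here is a sketch of the enumerate-and-groupby approach the question calls clunky. It groups items by index // chunk_size; the helper name chunk_with_groupby is hypothetical. It still streams, because groupby consumes its input lazily, but it has to carry the index along and strip it back off every item.

```python
from itertools import groupby


def chunk_with_groupby(iterable, chunk_size):
    """Hypothetical helper: chunk via enumerate + groupby."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    # Pair every item with its position, then group consecutive pairs
    # whose position falls in the same block of chunk_size.
    indexed = enumerate(iterable)
    for _, group in groupby(indexed, key=lambda pair: pair[0] // chunk_size):
        # Drop the index again, keeping only the original items.
        yield [item for _, item in group]


print(list(chunk_with_groupby([1, 2, 3, 4, 5, 6, 7], 4)))
# [[1, 2, 3, 4], [5, 6, 7]]
```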

Edit

Inspired by kreativitea's answer, here's a solution with islice, which is straightforward and doesn't require post-filtering:

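    from itertools import islice

    def chunk_input_stream(input_stream, chunk_size):
        # Get a single iterator up front so a plain list also works;
        # islice on a list would restart from the beginning every time.
        it = iter(input_stream)
        while True:
            chunk = list(islice(it, chunk_size))
            if chunk:
                yield chunk
            else:
                return

    # test it with list(chunk_input_stream([1, 2, 3, 4], 3))  ->  [[1, 2, 3], [4]]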
Answer

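The grouper recipe from the itertools documentation, updated for Python 3 (where izip_longest was renamed zip_longest):

    from itertools import zip_longest

    def grouper(n, iterable, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
        # The same iterator is repeated n times, so zip_longest pulls
        # n consecutive items into each tuple; the last tuple is
        # padded with fillvalue.
        args = [iter(iterable)] * n
        return zip_longest(*args, fillvalue=fillvalue)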
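Unlike the question's function, grouper pads the last chunk with fillvalue, which is why the edit mentions post-filtering. A minimal sketch of that post-filtering step, assuming a private sentinel object as the fill value so that legitimate None items in the input are not stripped; chunk_via_grouper and _PAD are hypothetical names:

```python
from itertools import zip_longest

# A unique object that can never appear in the caller's data.
_PAD = object()


def chunk_via_grouper(iterable, chunk_size):
    """Hypothetical helper: grouper-style chunking with padding removed."""
    # Same trick as the grouper recipe: n references to one iterator.
    args = [iter(iterable)] * chunk_size
    for group in zip_longest(*args, fillvalue=_PAD):
        # Only the final group can contain padding; strip it off.
        yield [item for item in group if item is not _PAD]


print(list(chunk_via_grouper([1, 2, None, 4, 5], 2)))
# [[1, 2], [None, 4], [5]]
```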
https://en.xdnf.cn/q/72731.html
