Why does codecs.iterdecode() eat empty strings?

2024/9/20 19:22:03

Why do the following two decoding methods return different results?

>>> import codecs
>>>
>>> data = ['', '', 'a', '']
>>> list(codecs.iterdecode(data, 'utf-8'))
[u'a']
>>> [codecs.decode(i, 'utf-8') for i in data]
[u'', u'', u'a', u'']

Is this a bug or expected behavior? My Python version is 2.7.13.

Answer

This is normal. iterdecode takes an iterator over encoded chunks and returns an iterator over decoded chunks, but it doesn't promise a one-to-one correspondence. All it guarantees is that the concatenation of all output chunks is a valid decoding of the concatenation of all input chunks.

If you look at the source code, you'll see it's explicitly discarding empty output chunks:

def iterdecode(iterator, encoding, errors='strict', **kwargs):
    """
    Decoding iterator.

    Decodes the input strings from the iterator using an IncrementalDecoder.

    errors and kwargs are passed through to the IncrementalDecoder
    constructor.
    """
    decoder = getincrementaldecoder(encoding)(errors, **kwargs)
    for input in iterator:
        output = decoder.decode(input)
        if output:
            yield output
    output = decoder.decode("", True)
    if output:
        yield output
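If you do need one output chunk per input chunk, empty ones included, one option is to drive the incremental decoder yourself and skip the `if output` check. The following is only a sketch: `iterdecode_keep_empty` is a hypothetical name, not part of the `codecs` module, and it uses bytes literals so that it also runs on Python 3.

```python
import codecs


def iterdecode_keep_empty(chunks, encoding, errors='strict'):
    """Hypothetical helper: yield one decoded chunk per input chunk."""
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    for chunk in chunks:
        # Unlike codecs.iterdecode, yield the result even when it is empty.
        yield decoder.decode(chunk)
    # Flush the decoder; this raises (or substitutes, depending on `errors`)
    # if the input ended in the middle of a multi-byte sequence.
    tail = decoder.decode(b'', True)
    if tail:
        yield tail


data = [b'', b'', b'a', b'']
result = list(iterdecode_keep_empty(data, 'utf-8'))
```

For the data in the question, `result` has four elements: two empty strings, `a`, and another empty string. Note that an output chunk can still differ from its input chunk when a character is split across chunks, so the correspondence is positional only.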

Be aware that iterdecode exists, and that you shouldn't just call decode on each chunk yourself, because the decoding process is stateful. The UTF-8 encoded form of one character might be split over multiple chunks. Other codecs might have really weird stateful behavior: imagine, for example, a byte sequence that inverts the case of all following characters until that byte sequence appears again.
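To make the statefulness concrete, here is a small sketch with made-up input: the two UTF-8 bytes of "é" (0xC3 0xA9) are deliberately split across chunks. The chunk boundaries are an assumption for illustration, and bytes literals are used so the snippet also runs on Python 3.

```python
import codecs

# "caf\u00e9!" encoded as UTF-8, with the two bytes of "\u00e9" split
# between the first and second chunk, plus an empty chunk for good measure.
chunks = [b'caf\xc3', b'\xa9', b'', b'!']

# iterdecode buffers the dangling 0xC3 and completes it with the next chunk.
decoded = list(codecs.iterdecode(chunks, 'utf-8'))

# Decoding each chunk independently fails on the incomplete sequence.
try:
    [codecs.decode(chunk, 'utf-8') for chunk in chunks]
    per_chunk_ok = True
except UnicodeDecodeError:
    per_chunk_ok = False
```

Here `decoded` contains three chunks (`caf`, `é`, `!`) whose concatenation is the correct decoding of the whole input, while the per-chunk approach raises `UnicodeDecodeError` on the first chunk.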
