Python groupby doesn't work as expected [duplicate]

2024/10/6 4:09:39

I am trying to read an Excel spreadsheet that contains some columns in the following format:

column1__
column1__AccountName
column1__SomeOtherFeature
column2__blabla
column2__SecondFeat

I've already saved the values of one row as a list of tuples, where each tuple is (column_name, column_value), in the variable x.

Now I would like to group it like this:

result = {
    'column1': [list of (k, v) tuples whose keys start with 'column1'],
    'column2': [list of (k, v) tuples whose keys start with 'column2'],
}

But it doesn't give the expected result:

>>> from itertools import groupby
>>> x
[(u'My field one__AccountName', u'Lorem ipsum bla bla bla'),
 (u'My field one__AccountNumber', u'1111111222255555'),
 (u'My field two__Num', u'Num: 612312345'),
 (u'My field two', u'asdasdafassda'),
 (u'My field three__Beneficiary International Bank Account Number IBAN',
  u'IE111111111111111111111'),
 (u'My field one__BIC', u'BLEAHBLA1'),
 (u'My field three__company name', u'Company XYZ'),
 (u'My field two__BIC', u'ASDF333')]
>>> groups = groupby(x, lambda (field, val): field.split('__')[0])
>>> grouped_fields = {key: list(val) for key, val in groups}
>>> grouped_fields
{u'My field one': [(u'My field one__BIC', u'BLEAHBLA1')],
 u'My field three': [(u'My field three__company name', u'Company XYZ')],
 u'My field two': [(u'My field two__BIC', u'ASDF333')]}
>>> x[0]
(u'My field one__AccountName', u'Lorem ipsum bla bla bla')
>>> x[1]
(u'My field one__AccountNumber', u'1111111222255555')
>>> x[0][0].split('__')[0] == x[1][0].split('__')[0]
True

However, it seems to work with another instance of the initial list:

>>> y = [(u'x__2', 3), (u'x__', 1), (u'x__1', 2), (u'y__1', 1), (u'y__2', 2)]
>>> y
[(u'x__2', 3), (u'x__', 1), (u'x__1', 2), (u'y__1', 1), (u'y__2', 2)]
>>> groupes_y = groupby(y, lambda (k,v): k.split('__')[0])
>>> grouped_y = {key:list(val) for key, val in groupes_y}
>>> grouped_y
{u'x': [(u'x__2', 3), (u'x__', 1), (u'x__1', 2)],
 u'y': [(u'y__1', 1), (u'y__2', 2)]}

No idea what I am doing wrong.

Answer

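As the docs say, groupby only groups consecutive items that share a key, so you are supposed to apply it to a list that is already sorted with the same key function as groupby itself. With unsorted input, the same key produces several separate groups, and your dict comprehension keeps only the last group for each key. Sort first: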

key = lambda fv: fv[0].split('__')[0]
groups = groupby(sorted(x, key=key), key=key)

Then grouped_fields is:

{u'My field one': [(u'My field one__AccountName', u'Lorem ipsum bla bla bla'),
                   (u'My field one__AccountNumber', u'1111111222255555'),
                   (u'My field one__BIC', u'BLEAHBLA1')],
 u'My field three': [(u'My field three__Beneficiary International Bank Account Number IBAN',
                      u'IE111111111111111111111'),
                     (u'My field three__company name', u'Company XYZ')],
 u'My field two': [(u'My field two__Num', u'Num: 612312345'),
                   (u'My field two', u'asdasdafassda'),
                   (u'My field two__BIC', u'ASDF333')]}
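
To see why the unsorted list loses entries, here is a minimal sketch for Python 3. The rows are shortened, made-up stand-ins for the data in the question, and the helper name prefix is only illustrative. It shows groupby emitting the same key more than once, a dict keeping only the last run per key, and the sorted version collecting everything.

```python
from itertools import groupby

# Shortened, hypothetical sample rows, deliberately not sorted by prefix.
rows = [
    ("one__AccountName", "Lorem"),
    ("two__Num", "612"),
    ("one__BIC", "BLEAH"),
    ("two__BIC", "ASDF"),
]

def prefix(item):
    # Part of the column name before the first "__".
    return item[0].split("__")[0]

# On unsorted input, groupby starts a new group every time the key changes,
# so the same key can appear several times.
runs = [(key, list(group)) for key, group in groupby(rows, key=prefix)]
run_keys = [key for key, _ in runs]

# Building a dict from those runs keeps only the last run for each key.
lossy = dict(runs)

# Sorting by the same key first makes equal keys adjacent.
ordered = sorted(rows, key=prefix)
grouped = {key: list(group) for key, group in groupby(ordered, key=prefix)}

print(run_keys)
print(lossy)
print(grouped)
```

Because sorted is stable, items within each group keep their original relative order.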

In your second example, y happens to be already sorted by that key:

>>> y == sorted(y, key=key)
True
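
If sorting is not wanted, a different technique (not groupby) is to collect the items into a dict of lists in a single pass. The sketch below assumes Python 3. The function name group_by_prefix and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def group_by_prefix(pairs, sep="__"):
    """Group (name, value) pairs by the part of the name before `sep`."""
    groups = defaultdict(list)
    for name, value in pairs:
        # Input order does not matter: each item is appended to its own bucket.
        groups[name.split(sep)[0]].append((name, value))
    return dict(groups)

# Shortened, hypothetical sample rows in arbitrary order.
rows = [
    ("one__AccountName", "Lorem"),
    ("two__Num", "612"),
    ("two", "asd"),
    ("one__BIC", "BLEAH"),
]

result = group_by_prefix(rows)
print(result)
```

This visits each item once and keeps the items of each group in their original order, whereas the groupby approach needs the extra sort.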
