Question 1

I am trying to read an Excel spreadsheet that contains some columns in the following format:

column1__
column1__AccountName
column1__SomeOtherFeature
column2__blabla
column2_SecondFeat

I've already saved the values of one row as a list of tuples in the variable x, where each tuple is (column_name, column_value).

Now I would like to group them by the part of the column name before '__', like this:

result = {'column1': [list of (k, v) tuples whose keys start with 'column1__'],
          'column2': [list of (k, v) tuples whose keys start with 'column2__']}

I tried itertools.groupby for this, but it doesn't give the expected result:

>>> from itertools import groupby
>>> x
[(u'My field one__AccountName', u'Lorem ipsum bla bla bla'), (u'My field one__AccountNumber', u'1111111222255555'), (u'My field two__Num', u'Num: 612312345'), (u'My field two', u'asdasdafassda'), (u'My field three__Beneficiary International Bank Account Number IBAN', u'IE111111111111111111111'), (u'My field one__BIC', u'BLEAHBLA1'), (u'My field three__company name', u'Company XYZ'), (u'My field two__BIC', u'ASDF333')]
>>> groups = groupby(x, lambda (field, val): field.split('__')[0])
>>> grouped_fields = {key: list(val) for key, val in groups}
>>> grouped_fields
{u'My field one': [(u'My field one__BIC', u'BLEAHBLA1')], u'My field three': [(u'My field three__company name', u'Company XYZ')], u'My field two': [(u'My field two__BIC', u'ASDF333')]}
>>> x[0]
(u'My field one__AccountName', u'Lorem ipsum bla bla bla')
>>> x[1]
(u'My field one__AccountNumber', u'1111111222255555')
>>> x[0][0].split('__')[0] == x[1][0].split('__')[0]
True

However, it seems to work with a different initial list:

>>> y = [(u'x__2', 3), (u'x__', 1), (u'x__1', 2), (u'y__1', 1), (u'y__2', 2)]
>>> y
[(u'x__2', 3), (u'x__', 1), (u'x__1', 2), (u'y__1', 1), (u'y__2', 2)]
>>> groupes_y = groupby(y, lambda (k, v): k.split('__')[0])
>>> grouped_y = {key: list(val) for key, val in groupes_y}
>>> grouped_y
{u'x': [(u'x__2', 3), (u'x__', 1), (u'x__1', 2)], u'y': [(u'y__1', 1), (u'y__2', 2)]}

I have no idea what I am doing wrong.

Answer

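groupby only groups consecutive elements that share a key: it starts a new group every time the key value changes. In your x, the 'My field one' entries sit at positions 0, 1 and 5, so groupby yields two separate 'My field one' groups, and the dict comprehension keeps only the last group it sees for each key. That is why the itertools docs say the iterable generally needs to be sorted already on the same key function that groupby uses: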

from itertools import groupby

key = lambda fv: fv[0].split('__')[0]  # column-name prefix before '__'; valid in Python 2 and 3
groups = groupby(sorted(x, key=key), key=key)

Building grouped_fields from these groups with the same dict comprehension then gives:

{u'My field one': [(u'My field one__AccountName', u'Lorem ipsum bla bla bla'),
                   (u'My field one__AccountNumber', u'1111111222255555'),
                   (u'My field one__BIC', u'BLEAHBLA1')],
 u'My field three': [(u'My field three__Beneficiary International Bank Account Number IBAN', u'IE111111111111111111111'),
                     (u'My field three__company name', u'Company XYZ')],
 u'My field two': [(u'My field two__Num', u'Num: 612312345'),
                   (u'My field two', u'asdasdafassda'),
                   (u'My field two__BIC', u'ASDF333')]}
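To see concretely why the unsorted call loses entries, the Python 3 sketch below lists the runs that groupby produces for the original x (values copied from the question, without the Python 2 u'' prefixes) and then compares the resulting dicts with and without sorting. prefix is a hypothetical helper equivalent to the key lambda above; it indexes into the tuple because tuple-parameter lambdas such as lambda (field, val): ... are Python 2 only.

```python
from itertools import groupby

# The (column_name, value) pairs from the question, in their original order.
x = [
    ('My field one__AccountName', 'Lorem ipsum bla bla bla'),
    ('My field one__AccountNumber', '1111111222255555'),
    ('My field two__Num', 'Num: 612312345'),
    ('My field two', 'asdasdafassda'),
    ('My field three__Beneficiary International Bank Account Number IBAN', 'IE111111111111111111111'),
    ('My field one__BIC', 'BLEAHBLA1'),
    ('My field three__company name', 'Company XYZ'),
    ('My field two__BIC', 'ASDF333'),
]

def prefix(pair):
    """Hypothetical key helper: the column name up to the first '__'."""
    return pair[0].split('__')[0]

# Each (key, size) entry is one run of consecutive equal keys.
# A prefix that reappears later in the list starts a new, separate run.
runs = [(k, len(list(g))) for k, g in groupby(x, key=prefix)]
print(runs)

# A dict built from those runs keeps only the last run for each key.
unsorted_result = {k: list(g) for k, g in groupby(x, key=prefix)}
print({k: len(v) for k, v in unsorted_result.items()})
# {'My field one': 1, 'My field two': 1, 'My field three': 1}

# Sorting by the same key first turns each prefix into one contiguous run.
sorted_result = {k: list(g) for k, g in groupby(sorted(x, key=prefix), key=prefix)}
print({k: len(v) for k, v in sorted_result.items()})
# {'My field one': 3, 'My field three': 2, 'My field two': 3}
```

Because sorted() is stable, entries within each group keep their original relative order.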

In your second example, y just happens to be sorted by that key already, so each prefix forms a single run:

>>> y == sorted(y, key=key)
True
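If you don't need groupby at all, a different technique avoids the ordering requirement: a single pass into a dict of lists built with collections.defaultdict. This is a minimal sketch, assuming the input is a list of (column_name, value) tuples as in the question; group_by_prefix and the shuffled list are hypothetical names used only for illustration. No sort is needed, and groups come out in first-seen order rather than sorted by key.

```python
from collections import defaultdict

def group_by_prefix(pairs, sep='__'):
    """Hypothetical helper: group (column_name, value) pairs by the name part before `sep`.

    Works regardless of input order, unlike groupby on unsorted data.
    """
    groups = defaultdict(list)
    for name, value in pairs:
        # A name without the separator (e.g. 'My field two') stays whole after split().
        groups[name.split(sep)[0]].append((name, value))
    return dict(groups)

# The y values as echoed in the question, plus the same pairs in a shuffled order.
y = [('x__2', 3), ('x__', 1), ('x__1', 2), ('y__1', 1), ('y__2', 2)]
shuffled = [('y__1', 1), ('x__2', 3), ('y__2', 2), ('x__', 1), ('x__1', 2)]

print(group_by_prefix(y))
# {'x': [('x__2', 3), ('x__', 1), ('x__1', 2)], 'y': [('y__1', 1), ('y__2', 2)]}
print(group_by_prefix(shuffled) == group_by_prefix(y))
# True
```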

Python groupby doesn't work as expected [duplicate]
