Convert pandas DataFrame to dict and preserve duplicated indexes

2024/11/16 14:50:39
vagrant@ubuntu-xenial:~/lb/f5/v12$ python
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
>>> df = pd.DataFrame(data)
>>> df.set_index(keys='name', drop=False, inplace=True)
>>> df
      age name
name
bob    20  bob
jim    25  jim
bob    30  bob
>>> df.to_dict(orient='index')
{'bob': {'age': 30, 'name': 'bob'}, 'jim': {'age': 25, 'name': 'jim'}}
>>>
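You can quickly confirm that the index built above really contains repeated labels, which is what makes orient='index' lossy here. The following sketch uses Python 3 syntax and rebuilds the same frame. dup_mask is just an illustrative variable name.

```python
import pandas as pd

# Same data as in the question, indexed by name while keeping the column
data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data)
df.set_index(keys='name', drop=False, inplace=True)

# False, because 'bob' appears twice in the index
print(df.index.is_unique)

# keep=False flags every copy of a repeated label, not just the later ones
dup_mask = df.index.duplicated(keep=False)
print(dup_mask.tolist())  # [True, False, True]
```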

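If we convert the DataFrame to a dictionary with orient='index', the duplicate entry (bob, age 20) is lost. Each index label can appear only once as a dictionary key, so the later row overwrites the earlier one. (Newer pandas versions raise a ValueError for a non-unique index with orient='index' instead of silently dropping rows.) Is there a way to produce a dictionary whose values are lists of dictionaries, something like this?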

{'bob': [{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}], 'jim': [{'age': 25, 'name': 'jim'}]}
Answer

This can be done by grouping on the index.

groupby Comprehension

{k: g.to_dict(orient='records') for k, g in df.groupby(level=0)}
# {'bob': [{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}],
#  'jim': [{'age': 25, 'name': 'jim'}]}
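The comprehension assumes df from the question is already defined. Below is a self-contained Python 3 sketch of the same approach, with the expected structure checked at the end. The variable names are illustrative.

```python
import pandas as pd

data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
# drop=False keeps 'name' as a column, so it also appears inside each record
df = pd.DataFrame(data).set_index('name', drop=False)

# groupby(level=0) yields one (label, sub-DataFrame) pair per unique index label;
# to_dict(orient='records') turns each sub-DataFrame into a list of row dicts
result = {k: g.to_dict(orient='records') for k, g in df.groupby(level=0)}

expected = {
    'bob': [{'name': 'bob', 'age': 20}, {'name': 'bob', 'age': 30}],
    'jim': [{'name': 'jim', 'age': 25}],
}
assert result == expected
print(result)
```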

Details
groupby allows us to partition the data based on unique keys:

for k, g in df.groupby(level=0):
    print(g, end='\n\n')

      age name
name
bob    20  bob
bob    30  bob

      age name
name
jim    25  jim
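You can also inspect the partitioning without printing every group. The following sketch runs GroupBy.size and GroupBy.get_group, both standard pandas GroupBy methods, on the question's data. The variable names are illustrative.

```python
import pandas as pd

data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data).set_index('name', drop=False)

grouped = df.groupby(level=0)

# Number of rows in each partition, keyed by index label
sizes = grouped.size().to_dict()
print(sizes)

# A single partition, returned as an ordinary DataFrame
bob_rows = grouped.get_group('bob')
print(bob_rows)
```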

For each group, convert it into a list of dictionaries (one per row) using the "records" orient:

for k, g in df.groupby(level=0):
    print(g.to_dict('records'))

[{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}]
[{'age': 25, 'name': 'jim'}]
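For comparison, calling to_dict(orient='records') on the whole frame keeps every row, duplicates included, but loses the grouping. It also ignores the index entirely. That is why drop=False in the question matters: the name survives inside each record only because it is still a column. Here is a sketch of both cases:

```python
import pandas as pd

data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data).set_index('name', drop=False)

# One dict per row, in row order; the index labels are not included
records = df.to_dict(orient='records')
print(records)

# With drop=True (the default) the name lives only in the index, so it vanishes here
records_no_name = pd.DataFrame(data).set_index('name').to_dict(orient='records')
print(records_no_name)
```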

Finally, store each list under its group key, which is what the dictionary comprehension above does.
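Once the lists are stored under their group keys, lookups are ordinary dictionary operations. Here is a minimal sketch. The name 'alice' is a hypothetical missing key, used only to show the dict.get fallback.

```python
import pandas as pd

data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data).set_index('name', drop=False)
result = {k: g.to_dict(orient='records') for k, g in df.groupby(level=0)}

# Ages from every row that had 'bob' as its index label
bob_ages = [row['age'] for row in result['bob']]
print(bob_ages)

# 'alice' is not in the data; .get returns an empty list instead of raising KeyError
missing = result.get('alice', [])
print(missing)  # []
```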


GroupBy.apply + to_dict

df.groupby(level=0).apply(lambda x: x.to_dict('records')).to_dict()
# {'bob': [{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}],
#  'jim': [{'age': 25, 'name': 'jim'}]}

apply does the same thing as the dictionary comprehension: it iterates over each group. The only difference is that apply collects the per-group lists into a pandas Series indexed by group key. One final to_dict call is therefore needed to turn that Series into a dictionary.
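The following self-contained sketch of the apply variant checks that it produces the same dictionary as the comprehension. The intermediate object is a pandas Series whose index holds the group keys and whose values are the per-group lists. The variable names are illustrative.

```python
import pandas as pd

data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data).set_index('name', drop=False)

# apply returns one list of records per group, collected into a Series
per_group = df.groupby(level=0).apply(lambda x: x.to_dict('records'))
print(type(per_group).__name__)  # Series

# Series.to_dict maps each index label (group key) to its value (the list)
via_apply = per_group.to_dict()
via_comprehension = {k: g.to_dict(orient='records') for k, g in df.groupby(level=0)}
assert via_apply == via_comprehension
```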

https://en.xdnf.cn/q/71661.html
