Pandas Dataframe to dict grouping by column

2024/9/8 8:56:31

I have a dataframe like this:

Subject_id    Subject    Score    
Subject_1        Math        5                 
Subject_1    Language        4                 
Subject_1       Music        8
Subject_2        Math        8                 
Subject_2    Language        3                 
Subject_2       Music        9

And I want to convert it into a dictionary, grouped by Subject_id:

{'Subject_1': {'Math': 5, 'Language': 4, 'Music': 8},
 'Subject_2': {'Math': 8, 'Language': 3, 'Music': 9}}

If I had only one Subject_id, I could do:

my_dict['Subject_1'] = dict(zip(df['Subject'],df['Score']))

But since there are several Subject_id values, the subject names repeat, so I cannot use zip directly.
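To see why a plain zip over the whole frame is not enough, here is a minimal sketch. The DataFrame construction is an assumption that simply rebuilds the table above. Because the subject names repeat, each later row silently overwrites the earlier one with the same key.

```python
import pandas as pd

# Rebuild the sample data from the table above (construction is assumed)
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 3 + ["Subject_2"] * 3,
    "Subject": ["Math", "Language", "Music"] * 2,
    "Score": [5, 4, 8, 8, 3, 9],
})

# Zipping the full columns: repeated keys are overwritten by later rows,
# so only Subject_2's scores survive
flat = dict(zip(df["Subject"], df["Score"]))
print(flat)  # → {'Math': 8, 'Language': 3, 'Music': 9}
```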

DataFrames have a .to_dict('index') method, but I need to be able to group by a certain column when creating the dictionary.
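For reference, a minimal sketch of what to_dict('index') returns on the raw frame (again rebuilding the table above, which is an assumption): the outer keys are the row labels 0 to 5 and each inner dict is one whole row, so nothing is grouped by Subject_id.

```python
import pandas as pd

# Rebuild the sample data from the table above (construction is assumed)
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 3 + ["Subject_2"] * 3,
    "Subject": ["Math", "Language", "Music"] * 2,
    "Score": [5, 4, 8, 8, 3, 9],
})

# 'index' orient: one entry per row, keyed by the row label, with every
# column (including Subject_id) repeated inside each inner dict
rows = df.to_dict("index")
print(rows[0])
```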

How could I achieve that?

Thanks.

Answer

Use groupby with a custom lambda function, then convert the resulting Series to a dict with to_dict:

# Build one {Subject: Score} dict per Subject_id, then turn the Series into a dict
d = (df.groupby('Subject_id')
       .apply(lambda x: dict(zip(x['Subject'], x['Score'])))
       .to_dict())
print(d)
{'Subject_2': {'Math': 8, 'Music': 9, 'Language': 3}, 'Subject_1': {'Math': 5, 'Music': 8, 'Language': 4}}

Detail:

print(df.groupby('Subject_id').apply(lambda x: dict(zip(x['Subject'], x['Score']))))
Subject_id
Subject_1    {'Math': 5, 'Music': 8, 'Language': 4}
Subject_2    {'Math': 8, 'Music': 9, 'Language': 3}
dtype: object
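The same per-group zip can also be written without apply by iterating over the groupby object, which yields (key, sub-DataFrame) pairs. This is a minimal sketch that rebuilds the sample frame (the construction is an assumption). One possible advantage is that recent pandas releases may emit a deprecation warning when apply operates on the grouping columns, which this loop does not trigger.

```python
import pandas as pd

# Rebuild the sample data from the question (construction is assumed)
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 3 + ["Subject_2"] * 3,
    "Subject": ["Math", "Language", "Music"] * 2,
    "Score": [5, 4, 8, 8, 3, 9],
})

# Each group only contains its own rows, so zip no longer sees repeated keys
d = {
    subject_id: dict(zip(group["Subject"], group["Score"]))
    for subject_id, group in df.groupby("Subject_id")
}
print(d)
# → {'Subject_1': {'Math': 5, 'Language': 4, 'Music': 8}, 'Subject_2': {'Math': 8, 'Language': 3, 'Music': 9}}
```

groupby sorts the keys by default, and rows keep their original order within each group, so the inner dicts follow the order of the table.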
https://en.xdnf.cn/q/72949.html
