Pandas - Create dynamic column(s) from a single columns values

2024/11/10 13:00:09

I have JSON data which I am planning after converting it to desired dataframe, will concat with another dataframe.

Participant


**row 1** [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}]**row 2** [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}]**row 3** [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}]

However, I want it like this where dynamically the column will be generated and "N" of Participant#N Role or Participant#N Name should be the maximum number of participant present in the rows of the dataframe overall :

Desired Dataframe

So far I have tried: Try 01:

participants = pd.concat([pd.DataFrame(pd.json_normalize(x)) for x in responses['participants']])
print(participants.transpose())

I could not find any relevant post to move forward to the desired dataframe.

Try 02:

responses['Role of Participants'] = [x[0]['roles'] for x in participants['roles']]
responses['Participant Name'] = [x[0]['life'] for x in participants['participants']]

But it just returned only the first type object in role and first name object of life for each data where there can be multiple.

Answer

You could run one apply() which will use for-loop to convert list to Series with headers - it can use enumerate to put correct number in headers.

Because some rows have less participants so it puts NaN which you can later fill with empty strings.

And next you can use join() to add all as new columns. Because headers are create in apply() so you don't have to create them in join()

import pandas as pddata = {'participants': 
[[{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],[{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],[{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]
}df = pd.DataFrame(data)def get_names(cell):all_names = pd.Series(dtype=object)for number, item in enumerate(cell, 1):name = item['life']['name']all_names[f'Participant #{number} Name'] = namereturn all_namesdef get_roles(cell):all_roles = pd.Series(dtype=object)for number, item in enumerate(cell, 1):roles = [role['type'] for role in item['roles']]all_roles[f'Participant #{number} Role'] = ",".join(roles)return all_rolesroles = df['participants'].apply(get_roles)
roles = roles.fillna('')  # put empty string in place of NaNnames = df['participants'].apply(get_names)
names = names.fillna('')  # put empty string in place of NaNdf = df.join(roles)
df = df.join(names)df = df.drop(columns=['participants'])  # remove old columnpd.options.display.max_colwidth = 100
print(df.to_string())

Result:

                 Participant #1 Role        Participant #2 Role Participant #3 Role     Participant #1 Name    Participant #2 Name Participant #3 Name
0  director,founder,owner,real_owner                                                              Lichun Du                                           
1                              board  director,board,real_owner    board,real_owner           Erik Mølgaard  Mikael Bodholdt Linde  Dorte Bøcker Linde
2                director,real_owner                      owner                      Kristian Løth Hougaard  WORLD JET HOLDING ApS  

I used two function to get first only columns with roles and next columns only with names - but if you would need role1, name1, role2, name2, role3, name3 then it could be done with one function.

import pandas as pddata = {'participants': 
[[{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],[{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],[{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]
}df = pd.DataFrame(data)def get_columns(cell):results = pd.Series(dtype=object)for number, item in enumerate(cell, 1):name = item['life']['name']results[f'Participant #{number} Name'] = nameroles = [role['type'] for role in item['roles']]results[f'Participant #{number} Role'] = ",".join(roles)return resultscolumns = df['participants'].apply(get_columns)
names = columns.fillna('')  # put empty string in place of NaNdf = df.join(columns)
#print(df.columns)df = df.drop(columns=['participants'])pd.options.display.max_colwidth = 100
print(df.to_string())

Result:

      Participant #1 Name                Participant #1 Role    Participant #2 Name        Participant #2 Role Participant #3 Name Participant #3 Role
0               Lichun Du  director,founder,owner,real_owner                    NaN                        NaN                 NaN                 NaN
1           Erik Mølgaard                              board  Mikael Bodholdt Linde  director,board,real_owner  Dorte Bøcker Linde    board,real_owner
2  Kristian Løth Hougaard                director,real_owner  WORLD JET HOLDING ApS                      owner                 NaN                 NaN
https://en.xdnf.cn/q/119778.html

Related Q&A

How to automatically remove certain preprocessors directives and comments from a C header-file?

Whats a good way to remove all text from a file which lies between /* */ and #if 0 and corresponding #endif? I want to strip these parts from C headers. This is the code I have so far:For line in file…

Get all pairs from elements in sublists

I have a list of sublists. I need all possible pairs between the elements in the sublists. For example, for a list like this: a=[[1,2,3],[4,5],[6]]The result should be: result=[[1,4], [1,5], [1,6], [2,…

Extracting variables from Javascript inside HTML

I need all the lines which contains the text .mp4. The Html file has no tag!My code:import urllib.request import demjson url = (https://myurl) content = urllib.request.urlopen(url).read()<script typ…

Pygame, self is not defined [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.This question was caused by a typo or a problem that can no longer be reproduced. While similar q…

Python 3- assigns grades [duplicate]

This question already has answers here:Python 3- Assign grade(2 answers)Closed 8 years ago.• Define a function to prompt the user to enter valid scores until they enter a sentinel value -999. Have …

how to read video data from memory use pyqt5

i have an encrypted video file, i want to decrypt this file into memory and then use this data play video. but qt mediaplayer class is to pass a file name in, i need to have any good way?this is my co…

Pandas apply custom function to DF

I would like to create a brand new data frame by replacing values of a DF using a custom function. I keep getting the following error "ValueError: The truth value of a Series is ambiguous. Use a.e…

Economy Bot Daily Streak

I have a Discord.py economy bot that includes a daily command It gives everyone each day $50, but there is also a streak system. The first time they claim their daily, the bot gives them $50, day 2 is …

Normalise JSON with Python

Prompt me, please, how to normalize this JSON file using Python? This question is related to the previous The current JSON contains: {"total_stats": [{"domain": "domain.com&qu…

How to change decimal separator?

I have an Excel spreadsheet (an extract from SAP). I turn this into a DataFrame, do calculations and save it to an SQLite database. The Excel spreadsheet has comma as decimal separator. The SQLite data…