How to apply json_normalize on entire pandas column

2024/9/22 5:30:42

I have a DataFrame whose column values are lists of dicts. I want to normalize the entire column (all rows). I found a way to normalize a single row, but I'm unable to apply the same function to the entire DataFrame or column.

import pandas as pd

data = {'COLUMN': [
    [{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,
      'volumes': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'},
                  {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae'}]}],
    [{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,
      'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'},
                  {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae'}]}],
    [{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,
      'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'},
                  {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae'}]}],
]}
source_df = pd.DataFrame(data)

source_df has a single column, COLUMN, with three rows; each value is a list containing one dict.

As per https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html I managed to get output for one row.

Code to apply for one row:

Target_df = pd.json_normalize(source_df['COLUMN'][0], 'volumes', ['name','id','state','nodes'], record_prefix='volume_')

Output for the above code: two rows, one per volume of 'WAG 01', with the columns volume_state, volume_id, volume_name, name, id, state and nodes.

I would like to know how to get the same output for the entire column.

Expected output: the same columns, with one row per volume across all rows of source_df.

EDIT: @lostCode, the input can also contain NaN and empty lists in COLUMN.

Answer

You can do:

Target_df = pd.concat([
    pd.json_normalize(source_df['COLUMN'][key], 'volumes',
                      ['name', 'id', 'state', 'nodes'], record_prefix='volume_')
    for key in source_df.index
]).reset_index(drop=True)

Output:

   volume_state  volume_id  volume_name   name    id    state      nodes
0  available     330172     q_-4144d4e    WAG 01  105F  available  3
1  available     275192     p_3089d821ae  WAG 01  105F  available  3
2  unavailable   830172     w_-4144d4e    FEC 01  382E  available  4
3  unavailable   223192     g_3089d821ae  FEC 01  382E  available  4
4  unavailable   930172     e_-4144d4e    ASD 01  303F  available  6
5  unavailable   245192     h_3089d821ae  ASD 01  303F  available  6

pd.concat concatenates a list of DataFrames. Here the list comprehension calls json_normalize once for each row of source_df, and concat stacks the resulting DataFrames into one. reset_index(drop=True) then renumbers the rows from 0.
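As an alternative to concatenating one DataFrame per row, the lists can be flattened first so that json_normalize is called only once. This is a minimal sketch, not part of the original answer; it assumes every value in COLUMN is a list of dicts (no NaN), and the names records and flat_df are illustrative. The data is a shortened stand-in for the question's source_df.

```python
import pandas as pd

# Shortened stand-in for the question's source_df (same shape, fewer rows).
source_df = pd.DataFrame({
    "COLUMN": [
        [{"name": "WAG 01", "id": "105F", "state": "available", "nodes": 3,
          "volumes": [{"state": "available", "id": "330172", "name": "q_-4144d4e"},
                      {"state": "available", "id": "275192", "name": "p_3089d821ae"}]}],
        [{"name": "FEC 01", "id": "382E", "state": "available", "nodes": 4,
          "volumes": [{"state": "unavailable", "id": "830172", "name": "w_-4144d4e"}]}],
    ]
})

# Every cell is a list of dicts, so chain all cells into one flat list of dicts.
records = [item for cell in source_df["COLUMN"] for item in cell]

# A single json_normalize call over all records instead of one call per row.
flat_df = pd.json_normalize(
    records,
    record_path="volumes",
    meta=["name", "id", "state", "nodes"],
    record_prefix="volume_",
)
print(flat_df)
```

The result has the same columns as the concat version; which one is preferable depends mostly on whether the column can contain non-list values.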

To skip values that are not lists (such as NaN), check the type of each value in source_df['COLUMN'] with isinstance:

Target_df = pd.concat([
    pd.json_normalize(source_df['COLUMN'][key], 'volumes',
                      ['name', 'id', 'state', 'nodes'], record_prefix='volume_')
    for key in source_df.index
    if isinstance(source_df['COLUMN'][key], list)
]).reset_index(drop=True)
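The type check can be wrapped in a small helper that also skips empty lists and copes with a column that has no usable rows at all. This is a sketch under assumptions: normalize_column and META are hypothetical names introduced here, and the sample data is a shortened stand-in for the NaN and empty-list input mentioned in the question's edit.

```python
import numpy as np
import pandas as pd

META = ["name", "id", "state", "nodes"]  # parent-level fields to repeat on each volume row


def normalize_column(column):
    """Normalize every list-valued cell of `column`, skipping NaN and empty lists."""
    frames = [
        pd.json_normalize(cell, record_path="volumes", meta=META, record_prefix="volume_")
        for cell in column
        if isinstance(cell, list) and cell  # filters out NaN/None and []
    ]
    if not frames:
        # pd.concat raises ValueError when given an empty list of frames.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Stand-in input: one valid row, one NaN, one empty list.
source_df = pd.DataFrame({"COLUMN": [
    [{"name": "WAG 01", "id": "105F", "state": "available", "nodes": 3,
      "volumes": [{"state": "available", "id": "330172", "name": "q_-4144d4e"}]}],
    np.nan,
    [],
]})

target_df = normalize_column(source_df["COLUMN"])
print(target_df)
```

Passing ignore_index=True to pd.concat has the same effect here as calling reset_index(drop=True) afterwards.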