Question 1

I have a dataframe with LISTS(with dicts) as column values . My intention is to normalize entire column(all rows). I found way to normalize a single row . However, I'm unable to apply the same function for the entire dataframe or column.

data = {'COLUMN': [ [{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,'volumes': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'}, {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae', }]}], [{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'}, {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae', }]}], [{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'}, {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae', }]}] ] }source_df = pd.DataFrame(data)

source_df looks like below :

enter image description here

As per https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html I managed to get output for one row.

Code to apply for one row:

Target_df = json_normalize(source_df['COLUMN'][0], 'volumes', ['name','id','state','nodes'], record_prefix='volume_')

Output for above code :

enter image description here

I would like to know how we can achieve desired output for the entire column

Expected output:

enter image description here

EDIT: @lostCode , below is the input with nan and empty list

enter image description here

Question 2

You can do:

Target_df=pd.concat([json_normalize(source_df['COLUMN'][key], 'volumes', ['name','id','state','nodes'], record_prefix='volume_') for key in source_df.index]).reset_index(drop=True)

Output:

    volume_state    volume_id   volume_name  name   id     state     nodes
0   available       330172      q_-4144d4e   WAG 01 105F    available   3
1   available       275192      p_3089d821ae WAG 01 105F    available   3
2   unavailable     830172      w_-4144d4e   FEC 01 382E    available   4
3   unavailable     223192      g_3089d821ae FEC 01 382E    available   4
4   unavailable     930172      e_-4144d4e   ASD 01 303F    available   6
5   unavailable     245192      h_3089d821ae ASD 01 303F    available   6

concat, is used to concatenate a dataframe list, in this case the list that is generated using json_normalize is concatenated on all rows of source_df

You can use to check type of source_df:

Target_df=pd.concat([json_normalize(source_df['COLUMN'][key], 'volumes', ['name','id','state','nodes'], record_prefix='volume_') for key in source_df.index if isinstance(source_df['COLUMN'][key],list)]).reset_index(drop=True)

How to apply json_normalize on entire pandas column

Related Q&A

Configure Vs code version 2.0.0 Build Task for python

Generate N-Grams from strings with pandas

Merge dataframes on multiple columns with fuzzy match in Python

Prevent Celery Beat from running the same task

Tastypie with application/x-www-form-urlencoded

Check for areas that are too thin in an image

Sort Python Dictionary by Absolute Value of Values

impyla hangs when connecting to HiveServer2

django prevent delete of model instance

suppress/redirect stderr when calling python webrowser