pandas: expand a list of dictionaries into separate columns

2024/9/20 0:38:58

I have a data set like below:

name    status    number   message
matt    active    12345    [job:  , money: none, wife: none]
james   active    23456    [group: band, wife: yes, money: 10000]
adam    inactive  34567    [job: none, money: none, wife:  , kids: one, group: jail]

How can I extract the key-value pairs and expand them into separate columns of the dataframe?

Expected output:

name    status   number    job    money    wife    group   kids 
matt    active   12345     none   none     none    none    none
james   active   23456     none   10000    none    band    none
adam    inactive 34567     none   none     none    none    one

The message contains multiple different key types.

Any help would be greatly appreciated.

Answer

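This is not trivial, because the message values are strings that only look like dictionaries.

First rewrite each string into a valid dict literal with replace (\s+ matches one or more whitespace characters), then parse it with ast.literal_eval.

After that, build a new DataFrame from the parsed dicts with the DataFrame constructor and join it to the original with concat. pop removes the message column from df and returns it: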

import ast
import pandas as pd

# rewrite each message string as a dict literal; empty values become "none"
df.message = df.message.replace([r':\s+,', r'\[', r'\]', r':\s+', r',\s+'], ['":"none","', '{"', '"}', '":"', '","'], regex=True)
# parse the rewritten strings into real dicts
df.message = df.message.apply(ast.literal_eval)

# one row per dict, one column per key
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)

   kids  money group   job  money  wife
0   NaN   none   NaN  none    NaN  none
1   NaN    NaN  band   NaN  10000   yes
2   one    NaN  jail  none   none  none

df = pd.concat([df, df1], axis=1)
print (df)

    name    status  number  kids  money group   job  money  wife
0   matt    active   12345   NaN   none   NaN  none    NaN  none
1  james    active   23456   NaN    NaN  band   NaN  10000   yes
2   adam  inactive   34567   one    NaN  jail  none   none  none
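
Note that the output above contains two money columns. A likely cause is that the first pattern, :\s+, does not consume the whitespace after the comma, so the key that follows an empty value keeps a leading space (for example ' money'). The sketch below is one way to avoid that. It assumes the message values are plain strings in exactly the format shown in the question; the sample frame is rebuilt by hand, and the extra \s* in the first pattern is an adjustment to the answer's code, not part of it. The replacements are applied one at a time with Series.str.replace so the order of the steps is explicit.

```python
import ast
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain strings).
df = pd.DataFrame({
    "name": ["matt", "james", "adam"],
    "status": ["active", "active", "inactive"],
    "number": [12345, 23456, 34567],
    "message": [
        "[job:  , money: none, wife: none]",
        "[group: band, wife: yes, money: 10000]",
        "[job: none, money: none, wife:  , kids: one, group: jail]",
    ],
})

# Same patterns as the answer, except the first one also swallows the
# whitespace after the comma so the next key has no leading space.
patterns = [r":\s+,\s*", r"\[", r"\]", r":\s+", r",\s+"]
replacements = ['":"none","', '{"', '"}', '":"', '","']

parsed = df["message"]
for pat, repl in zip(patterns, replacements):
    parsed = parsed.str.replace(pat, repl, regex=True)

# Each string is now a valid dict literal.
parsed = parsed.apply(ast.literal_eval)

df1 = pd.DataFrame(parsed.tolist(), index=df.index)
out = pd.concat([df.drop(columns="message"), df1], axis=1)
print(out)
```

With this change every key should appear once, so money ends up in a single column. All parsed values are strings, including "10000".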

EDIT:

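Another solution with yaml. After replacing the square brackets with braces, each message is a valid YAML flow mapping:

import yaml

# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)

df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)

  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None

df = pd.concat([df, df1], axis=1)
print (df)

    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None

As the output shows, YAML parses empty values as None and yes as True.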
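
If neither rewriting the string into a dict literal nor adding a YAML dependency is appealing, the pairs can also be split by hand. This is a minimal sketch, assuming keys and values never contain commas or colons. parse_message is a hypothetical helper written for this example, not a pandas function, and the sample frame is rebuilt from the question.

```python
import pandas as pd

def parse_message(text):
    """Turn '[k: v, k2: v2]' into a dict; empty values become None."""
    pairs = {}
    for item in text.strip().strip("[]").split(","):
        key, _, value = item.partition(":")
        key, value = key.strip(), value.strip()
        if key:
            pairs[key] = value or None
    return pairs

df = pd.DataFrame({
    "name": ["matt", "james", "adam"],
    "status": ["active", "active", "inactive"],
    "number": [12345, 23456, 34567],
    "message": [
        "[job:  , money: none, wife: none]",
        "[group: band, wife: yes, money: 10000]",
        "[job: none, money: none, wife:  , kids: one, group: jail]",
    ],
})

# One dict per row, then one column per key.
expanded = pd.DataFrame(df["message"].map(parse_message).tolist(), index=df.index)
result = df.drop(columns="message").join(expanded)

# Missing keys and empty values can be shown as "none" if that is wanted.
print(result.fillna("none"))
```

Values stay as strings here (yes is not turned into True), which may or may not be what you want.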
