Complex JSON file to CSV in Python

2024/9/21 8:13:01

I need to convert a complex JSON file to CSV using Python. I tried a lot of code without success, so I came here for help. I have updated the question: the JSON file holds about a million records, and I need to convert them to CSV format.

csv file

{"_id": {"$oid": "2e3230"},"add": {"address1": {"address": "kvartira 14","zipcode": "10005",},"name": "Evgiya Kovava","address2": {"country": "US","country_name": "NY",}}
}
{"_id": {"$oid": "2d118c8bo"},"add": {"address1": {"address": "kvartira 14","zipcode": "52805",},"name": "Eiya tceva","address2": {"country": "US","country_name": "TX",}}
}
Answer
import pandas as pd

# null is not a Python name, so define a stand-in before pasting the data
null = 'null'
data = {"_id": {"$oid": "2e3230s314i5dc07e118c8bo"},"add": {"address": {"address_type": "Door","address": "kvartira 14","city": "new york","region": null,"zipcode": "10005",},"name": "Evgeniya Kovantceva","type": "Private person","code": null,"additional_phone_nums": null,"email": null,"notifications": [],"address": {"address": "kvartira 14","city": "new york","region": null,"zipcode": "10005","country": "US","country_name": "NY",}}
}
df = pd.json_normalize(data)
df.to_csv('yourpath.csv')

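Beware of the null values: null is not a defined name in Python, so I assigned it the string 'null' above. Also, the nested "address" dictionary appears twice inside "add", almost identically. Is that intended? In a Python dict the second one silently replaces the first.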
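One way to sidestep the null problem, assuming the record is available as JSON text rather than pasted in as a Python literal, is to let the standard json module parse it, since json.loads maps JSON null to Python None. A minimal sketch with a shortened version of the record (the variable names are only illustrative):

```python
import json

# A shortened record kept as JSON *text*, so `null` never has to be a Python name.
raw = '{"_id": {"$oid": "2e3230"}, "add": {"name": "Evgeniya Kovantceva", "code": null, "notifications": []}}'

record = json.loads(raw)

# JSON null becomes Python None; an empty JSON array becomes an empty list.
print(record["add"]["code"])           # None
print(record["add"]["notifications"])  # []
```

The resulting dict can be passed to pd.json_normalize in the same way as the hand-written one.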

EDIT

OK, after your update it looks like json.JSONDecoder() is what you need.

Originally posted by @pschill on this link: how to analyze json objects that are NOT separated by comma (preferably in Python)

I tried his code on your data:

import json
import pandas as pd

data = """{"_id": {"$oid": "2e3230"},"add": {"address1": {"address": "kvartira 14","zipcode": "10005"},"name": "Evgiya Kovava","address2": {"country": "US","country_name": "NY"}}
}
{"_id": {"$oid": "2d118c8bo"},"add": {"address1": {"address": "kvartira 14","zipcode": "52805"},"name": "Eiya tceva","address2": {"country": "US","country_name": "TX"}}
}"""

Keep in mind that your data also has trailing commas (the last comma right before each closing brace), which make it invalid JSON that the parser cannot read.

You have to remove them with a regex or some other approach I am not familiar with. For the purpose of this answer I removed them manually.
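For reference, a regex-based cleanup could look roughly like the sketch below. It is a naive assumption-laden version, not something from the original answer: the helper name strip_trailing_commas is made up, and the pattern would also alter a string value that happens to contain a comma followed by a closing brace or bracket, so it is only safe when the values are known not to contain such sequences.

```python
import json
import re

# Matches a comma followed only by whitespace and then a closing } or ].
TRAILING_COMMA = re.compile(r",\s*([}\]])")

def strip_trailing_commas(text):
    """Drop commas that sit directly before a closing brace or bracket."""
    return TRAILING_COMMA.sub(r"\1", text)

sample = '{"address1": {"address": "kvartira 14","zipcode": "10005",},"name": "Evgiya Kovava"}'
cleaned = strip_trailing_commas(sample)

# The cleaned text now parses as regular JSON.
parsed = json.loads(cleaned)
print(parsed["address1"]["zipcode"])
```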

So after that I tried this:

content = data
parsed_values = []
decoder = json.JSONDecoder()
while content:
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    # print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)

This gave me an error, but the list still seems to get populated with all the values:

parsed_values
[{'_id': {'$oid': '2e3230'},'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},'name': 'Evgiya Kovava','address2': {'country': 'US', 'country_name': 'NY'}}},{'_id': {'$oid': '2d118c8bo'},'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},'name': 'Eiya tceva','address2': {'country': 'US', 'country_name': 'TX'}}}]
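With around a million objects, re-slicing and stripping the remaining string on every iteration copies a lot of text. A possible variation, offered as a sketch rather than as the approach from the linked answer, passes a start index to raw_decode and skips whitespace by hand. The helper name iter_json_objects is hypothetical, and the sample data is shortened:

```python
import json

def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it manually.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

data = '{"_id": {"$oid": "2e3230"}}\n{"_id": {"$oid": "2d118c8bo"}}\n'
parsed_values = list(iter_json_objects(data))
print(len(parsed_values))
```

Because it is a generator, the values can also be consumed one at a time instead of being collected into a list.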

Next I did:

df = pd.json_normalize(parsed_values)

That worked fine. You can always save the result to a CSV with:

df.to_csv('yourpath.csv')
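If pandas is not available, roughly the same flattening can be sketched with the standard library alone. This is an alternative to pd.json_normalize, not what the answer above ran: the flatten helper is hypothetical, it joins nested keys with a dot (the same separator json_normalize uses by default), and it leaves lists untouched. An in-memory buffer stands in for the output file.

```python
import csv
import io

def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with '.'."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

records = [
    {"_id": {"$oid": "2e3230"}, "add": {"address1": {"zipcode": "10005"}, "name": "Evgiya Kovava"}},
    {"_id": {"$oid": "2d118c8bo"}, "add": {"address1": {"zipcode": "52805"}, "name": "Eiya tceva"}},
]
rows = [flatten(r) for r in records]

# Collect column names in first-seen order so every row fits the header.
fieldnames = list(dict.fromkeys(k for row in rows for k in row))

buffer = io.StringIO()  # stands in for open('yourpath.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```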

Tell me if that helped.

Your JSON is quite problematic after all: duplicate keys (a problem), null values (unreadable as Python), trailing commas (unreadable), and dicts that are not comma-separated... It didn't catch my eye at first :P
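To see whether duplicate keys actually occur in the file, one option is the object_pairs_hook parameter of json.loads, which receives every object's key/value pairs before they are collapsed into a dict. The sketch below is only an illustration: the hook name reject_duplicates and the shortened sample are assumptions.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

raw = '{"add": {"address": {"zipcode": "10005"}, "address": {"country": "US"}}}'

# Default behaviour: the later "address" silently replaces the earlier one.
default_result = json.loads(raw)
print(default_result)

try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    message = str(exc)
    print(message)
```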

https://en.xdnf.cn/q/119230.html
