Question

I need to convert a complex JSON file to CSV using Python. I have tried a lot of code without success, so I came here for help. I have updated the question: the JSON file contains about a million objects, and I need to convert them to CSV format.

Sample of the JSON file:

{"_id": {"$oid": "2e3230"},"add": {"address1": {"address": "kvartira 14","zipcode": "10005",},"name": "Evgiya Kovava","address2": {"country": "US","country_name": "NY",}}
}
{"_id": {"$oid": "2d118c8bo"},"add": {"address1": {"address": "kvartira 14","zipcode": "52805",},"name": "Eiya tceva","address2": {"country": "US","country_name": "TX",}}
}

Answer

import pandas as pd

# null is not a Python name; define it so the dict literal below evaluates.
null = 'null'
data = {"_id": {"$oid": "2e3230s314i5dc07e118c8bo"},
        "add": {"address": {"address_type": "Door","address": "kvartira 14","city": "new york","region": null,"zipcode": "10005",},
                "name": "Evgeniya Kovantceva","type": "Private person","code": null,"additional_phone_nums": null,"email": null,"notifications": [],
                # "address" appears a second time; a Python dict keeps only this last value.
                "address": {"address": "kvartira 14","city": "new york","region": null,"zipcode": "10005","country": "US","country_name": "NY",}}
}
# Flatten the nested dicts into dotted column names, then write the CSV.
df = pd.json_normalize(data)
df.to_csv('yourpath.csv')

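Beware of the null values: null is not a Python name, which is why it is defined above. Also, the "address" nested dictionary appears twice inside "add", almost identically. Is that intended? In a Python dict only the second one survives.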
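If you read the data from the file with the json module instead of pasting it as a Python literal, null is converted to None automatically. Duplicate keys are still collapsed silently (the last one wins). A minimal sketch, assuming you want to keep both copies: keep_duplicates is a hypothetical helper passed as object_pairs_hook (a real json.loads parameter) that renames repeated keys with a numeric suffix.

```python
import json


def keep_duplicates(pairs):
    """Hypothetical object_pairs_hook: keep repeated keys by suffixing them.

    The first "address" stays "address", the second becomes "address_2", etc.
    Assumes no real key in the data already uses such a suffix.
    """
    result = {}
    counts = {}
    for key, value in pairs:
        if key in result:
            counts[key] = counts.get(key, 1) + 1
            key = f"{key}_{counts[key]}"
        result[key] = value
    return result


raw = '{"add": {"address": {"city": "new york", "region": null}, "address": {"country": "US"}}}'
record = json.loads(raw, object_pairs_hook=keep_duplicates)
print(record)
# {'add': {'address': {'city': 'new york', 'region': None}, 'address_2': {'country': 'US'}}}
```

The hook is called for every JSON object, including nested ones, so duplicates are handled at any depth.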

EDIT

OK, after your update it looks like json.JSONDecoder() is what you need, because the file contains JSON objects that are not separated by commas.

Originally posted by @pschill in the question: how to analyze json objects that are NOT separated by comma (preferably in Python)

I tried his code on your data:

import json
import pandas as pd

data = """{"_id": {"$oid": "2e3230"},"add": {"address1": {"address": "kvartira 14","zipcode": "10005"},"name": "Evgiya Kovava","address2": {"country": "US","country_name": "NY"}}
}
{"_id": {"$oid": "2d118c8bo"},"add": {"address1": {"address": "kvartira 14","zipcode": "52805"},"name": "Eiya tceva","address2": {"country": "US","country_name": "TX"}}
}"""

Keep in mind that your data also has trailing commas (the commas right before the closing braces), which make it invalid JSON.

You have to remove them with a regex or some other approach. For the purpose of this answer I removed them manually.
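A minimal regex sketch for removing them, assuming no string value in the data contains text such as ",}" or ", ]" (the regex does not know where JSON strings begin and end, so it would also edit those). strip_trailing_commas is a hypothetical helper name.

```python
import json
import re

# A comma followed only by whitespace and then a closing } or ].
# Assumption: no JSON string value contains such a sequence.
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def strip_trailing_commas(text):
    """Hypothetical helper: drop trailing commas so json can parse the text."""
    return TRAILING_COMMA.sub(r"\1", text)


sample = '{"add": {"address1": {"address": "kvartira 14","zipcode": "10005",},"name": "Evgiya Kovava"}}'
cleaned = strip_trailing_commas(sample)
print(json.loads(cleaned))
```

For a file this size you would read the text once, clean it, and then parse it.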

So after that I tried this:

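content = data.strip()  # raw_decode() fails on leading whitespace, so strip it first
parsed_values = []
decoder = json.JSONDecoder()
while content:
    # raw_decode() parses one JSON value from the start of the string and
    # returns it together with the index where that value ended.
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    # print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)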

This gave me an error, but the list still seems to get populated with all the values:

parsed_values
[{'_id': {'$oid': '2e3230'},
  'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},
          'name': 'Evgiya Kovava',
          'address2': {'country': 'US', 'country_name': 'NY'}}},
 {'_id': {'$oid': '2d118c8bo'},
  'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},
          'name': 'Eiya tceva',
          'address2': {'country': 'US', 'country_name': 'TX'}}}]
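One caveat for a file with about a million objects: content[new_start:].strip() copies the rest of the string on every iteration, so the loop gets roughly quadratically slower as the file grows. A sketch that advances an index instead, assuming CPython's JSONDecoder.raw_decode, whose implementation accepts an optional start index (raw_decode(s, idx)) even though the documentation only shows raw_decode(s). iter_json_objects is a hypothetical helper name.

```python
import json


def iter_json_objects(text):
    """Hypothetical helper: yield every top-level JSON value in text, where
    the values are concatenated with no commas (whitespace in between is ok).

    Relies on raw_decode(s, idx) returning (value, end) with end as an
    absolute index into s, so the remaining text is never copied.
    """
    decoder = json.JSONDecoder()
    idx = 0
    length = len(text)
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while idx < length and text[idx].isspace():
            idx += 1
        if idx == length:
            return
        value, idx = decoder.raw_decode(text, idx)
        yield value


sample = '{"a": 1}\n{"b": {"c": 2}\n}\n  {"d": null}  '
print(list(iter_json_objects(sample)))  # [{'a': 1}, {'b': {'c': 2}}, {'d': None}]
```

With your data this would be parsed_values = list(iter_json_objects(data)), or the same call on the text read from the file.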

Next I did:

df = pd.json_normalize(parsed_values)

which worked fine. You can then save it to a CSV file with:

df.to_csv('yourpath.csv')
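For around a million records it may help to normalize and write in chunks instead of building one huge DataFrame. A minimal sketch, assuming the records come from any iterable of dicts (for example parsed_values) and that every record has the same nested keys; records_to_csv and the chunk size of 100,000 are hypothetical choices, while pd.json_normalize (with sep) and DataFrame.to_csv (with mode, header, index) are standard pandas.

```python
import itertools

import pandas as pd


def records_to_csv(records, path, chunk_size=100_000):
    """Hypothetical helper: flatten nested dicts with pd.json_normalize and
    write them to one CSV file, chunk_size records at a time.

    Assumes every record has the same nested keys: the first chunk fixes the
    column order; later chunks are reindexed to it, so missing keys become
    empty cells and keys that only appear later are dropped.
    """
    records = iter(records)
    columns = None
    while True:
        chunk = list(itertools.islice(records, chunk_size))
        if not chunk:
            break
        # sep="_" gives names like "add_address1_zipcode" instead of the
        # default dotted "add.address1.zipcode".
        df = pd.json_normalize(chunk, sep="_")
        first = columns is None
        if first:
            columns = list(df.columns)
        else:
            df = df.reindex(columns=columns)
        # Overwrite (with header) on the first chunk, append afterwards.
        df.to_csv(path, mode="w" if first else "a", header=first, index=False)
```

For example, records_to_csv(parsed_values, 'yourpath.csv'). index=False leaves out the pandas row index column that df.to_csv writes by default.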

Tell me if that helped.

Your JSON is quite problematic after all: duplicate keys (a parser silently keeps only the last one), null values (valid JSON, but not valid in a Python literal), trailing commas (invalid JSON), and objects that are not separated by commas... It didn't catch my eye at first :P
