I need to convert a complex JSON file to CSV using Python. I tried a lot of code without success, so I came here for help. I updated the question: the JSON file has about a million records, and I need to convert them to CSV format.
csv file
{"_id": {"$oid": "2e3230"},"add": {"address1": {"address": "kvartira 14","zipcode": "10005",},"name": "Evgiya Kovava","address2": {"country": "US","country_name": "NY",}}
}
{"_id": {"$oid": "2d118c8bo"},"add": {"address1": {"address": "kvartira 14","zipcode": "52805",},"name": "Eiya tceva","address2": {"country": "US","country_name": "TX",}}
}
import pandas as pd

null = 'null'
data = {"_id": {"$oid": "2e3230s314i5dc07e118c8bo"},"add": {"address": {"address_type": "Door","address": "kvartira 14","city": "new york","region": null,"zipcode": "10005",},"name": "Evgeniya Kovantceva","type": "Private person","code": null,"additional_phone_nums": null,"email": null,"notifications": [],"address": {"address": "kvartira 14","city": "new york","region": null,"zipcode": "10005","country": "US","country_name": "NY",}}
}
df = pd.json_normalize(data)
df.to_csv('yourpath.csv')
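Beware the null value: null is not a name Python knows, so it has to be defined first (here as the string 'null'). Also, the "address" nested dictionary appears twice inside "add" with almost identical contents. Is that intended? In a Python dict the second one overwrites the first.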
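If the record is parsed from text instead of being pasted as a Python literal, the null workaround should not be needed, because the json module maps JSON null to None. A minimal sketch, assuming a made-up one-record string (the variable names raw and record are only for illustration):

```python
import json

# Hypothetical one-record sample; the field names come from the question.
raw = '{"name": "Evgeniya Kovantceva", "region": null, "notifications": []}'

# json.loads turns JSON null into Python None, so no placeholder is required.
record = json.loads(raw)
print(record["region"])
```

pandas then treats None as a missing value, which ends up as an empty cell in the CSV by default.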
EDIT
OK, after your update it looks like json.JSONDecoder() is what you need.
Originally posted by @pschill on this link:
how to analyze json objects that are NOT separated by comma (preferably in Python)
I tried his code on your data:
import json
import pandas as pd

data = """{"_id": {"$oid": "2e3230"},"add": {"address1": {"address": "kvartira 14","zipcode": "10005"},"name": "Evgiya Kovava","address2": {"country": "US","country_name": "NY"}}
}
{"_id": {"$oid": "2d118c8bo"},"add": {"address1": {"address": "kvartira 14","zipcode": "52805"},"name": "Eiya tceva","address2": {"country": "US","country_name": "TX"}}
}"""
Keep in mind that your data also has trailing commas (the last comma right before every closing brace), which make it invalid JSON.
You have to remove them with a regex or some other approach I am not familiar with. For the purpose of this answer I removed them manually.
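One possible regex for the trailing commas is sketched below. It is an assumption on my part, not something tested on the full file: the helper name strip_trailing_commas is made up, and the pattern would also alter a string value that happens to contain a comma directly followed by } or ], so check your data for that first.

```python
import re

# A comma, optional whitespace, then a closing brace or bracket.
TRAILING_COMMA = re.compile(r",\s*([}\]])")

def strip_trailing_commas(text):
    """Drop commas that sit directly before a closing } or ]."""
    return TRAILING_COMMA.sub(r"\1", text)

sample = '{"address": "kvartira 14","zipcode": "10005",}'
cleaned = strip_trailing_commas(sample)
print(cleaned)
```

The cleaned text can then be handed to the decoder in place of the manually edited data.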
So after that I tried this:
content = data
parsed_values = []
decoder = json.JSONDecoder()
while content:
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    # print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)
This gave me an error, but the list still seems to get populated with all the values:
parsed_values
[{'_id': {'$oid': '2e3230'},'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},'name': 'Evgiya Kovava','address2': {'country': 'US', 'country_name': 'NY'}}},{'_id': {'$oid': '2d118c8bo'},'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},'name': 'Eiya tceva','address2': {'country': 'US', 'country_name': 'TX'}}}]
Next I did:
df = pd.json_normalize(parsed_values)
which worked fine.
You can always save that to a csv with:
df.to_csv('yourpath.csv')
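With about a million objects, slicing the string on every iteration copies the remaining text each time, which can get slow. A variant that only moves an index forward might look like the sketch below. The generator name iter_json_objects and the two-record sample are made up for illustration, and it assumes the trailing commas are already gone.

```python
import json

def iter_json_objects(text):
    """Yield each value from a string of JSON documents that are not comma separated."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not accept leading whitespace, so skip it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        yield value

sample = '{"_id": {"$oid": "2e3230"}}\n{"_id": {"$oid": "2d118c8bo"}}\n'
records = list(iter_json_objects(sample))
print(len(records))
```

The resulting list can be passed to pd.json_normalize and saved with to_csv exactly as above.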
Tell me if that helped.
Your JSON is quite problematic after all: duplicate keys (the last one silently wins), null values (not readable as a Python literal), trailing commas (invalid JSON), and dicts that are not comma separated... It didn't catch the eye at first :P