Issue parsing multiline JSON file using Python

2024/10/6 12:25:10

I am trying to parse a multiline JSON file using the json library in Python 2.7. A simplified sample file is given below:

{
"observations": {"notice": [{"copyright": "Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml","copyright_url": "http://www.bom.gov.au/other/copyright.shtml","disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml","feedback_url": "http://www.bom.gov.au/other/feedback"}]
}
}

My code is as follows:

import json

with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        weatherData = json.loads(jf)
        print weatherData

Nevertheless, I get an error as shown below:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    weatherData = json.loads(jf)
  File "/home/usr/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 1 (char 0)

As a test, I modified the code so that, after removing newlines and stripping leading and trailing whitespace, it writes the contents to another file (with the json extension). Surprisingly, when I read that file back, parsing succeeds without any error. The modified code is as follows:

import json

filewrite = open('out.json', 'w+')
with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        filewrite.write(jf)
filewrite.close()

with open('out.json', 'r') as newJsonFile:
    for line in newJsonFile:
        weatherData = json.loads(line)
        print weatherData

The output is as follows:

{u'observations': {u'notice': [{u'copyright_url': u'http://www.bom.gov.au/other/copyright.shtml', u'disclaimer_url': u'http://www.bom.gov.au/other/disclaimer.shtml', u'copyright': u'Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml', u'feedback_url': u'http://www.bom.gov.au/other/feedback'}]}}

Any idea what is going on when newlines and whitespace are stripped before calling the json library?

Answer

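You will go crazy if you try to parse a JSON file line by line. Iterating over a file object yields one line at a time, so the first call to json.loads receives only the opening brace, which is not a complete JSON document; that is where the ValueError comes from. Your second script works because every stripped line is written without a newline, so out.json holds the entire document on a single line and the loop body runs exactly once. The json module can read the whole document for you through its load and loads functions: load takes a file object (as shown below) for a file that contains JSON data, while loads takes a string that contains JSON data.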
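A minimal sketch of the difference, assuming a shortened in-memory copy of the sample file (the helper name parse_per_line is hypothetical, and the code is written to run on both Python 2.7 and Python 3). It feeds each line to json.loads separately, then parses the whole text in one call:

```python
import json

# Shortened stand-in for test.json, keeping the same four-line layout.
text = (
    '{\n'
    '"observations": {"notice": [{"feedback_url": "http://www.bom.gov.au/other/feedback"}]\n'
    '}\n'
    '}\n'
)


def parse_per_line(document):
    """Call json.loads on each line and return the lines that fail to parse."""
    failed = []
    for line in document.splitlines():
        stripped = line.strip()
        try:
            json.loads(stripped)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass on Python 3
            failed.append(stripped)
    return failed


# No single line is a complete JSON document, so every line is rejected.
failed_lines = parse_per_line(text)

# The same text parsed in one call succeeds; newlines between tokens are allowed.
whole = json.loads(text)

print(failed_lines)
print(whole["observations"]["notice"][0]["feedback_url"])
```

Note that the newlines themselves are harmless: JSON permits whitespace between tokens, so there is no need to strip anything before parsing.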

Option 1: - Preferred

import json
with open('test.json', 'r') as jf:
    weatherData = json.load(jf)
    print weatherData

Option 2:

import json
with open('test.json', 'r') as jf:
    weatherData = json.loads(jf.read())
    print weatherData

If you are looking for higher-performance JSON parsing, check out ujson.
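One way to try it, sketched under the assumption that ujson has been installed separately (it is a third-party package, not part of the standard library). The alias json_impl is only an illustrative name; the fallback keeps the script working when ujson is absent:

```python
try:
    import ujson as json_impl  # third-party; assumed to be installed separately
except ImportError:
    import json as json_impl   # standard-library fallback

# Both modules expose loads() for parsing a JSON string.
data = json_impl.loads('{"observations": {"notice": []}}')
print(data["observations"]["notice"])
```

Whether the speed-up matters depends on the size of the files involved; for a small file like the sample, the standard json module is likely sufficient.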
