Issue parsing multiline JSON file using Python

2024/10/6 12:25:10

I am trying to parse a multiline JSON file using the json library in Python 2.7. A simplified sample file is given below:

    {
    "observations": {
        "notice": [
            {
                "copyright": "Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see:",
                "copyright_url": "",
                "disclaimer_url": "",
                "feedback_url": ""
            }
        ]
    }
    }

My code is as follows:

    import json
    with open('test.json', 'r') as jsonFile:
        for jf in jsonFile:
            jf = jf.replace('\n', '')
            jf = jf.strip()
            weatherData = json.loads(jf)
            print weatherData

Nevertheless, I get an error as shown below:

Traceback (most recent call last):
File "", line 8, in <module>
weatherData = json.loads(jf)
File "/home/usr/anaconda2/lib/python2.7/json/", line 339, in loads
return _default_decoder.decode(s)
File "/home/usr/anaconda2/lib/python2.7/json/", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/usr/anaconda2/lib/python2.7/json/", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 1 (char 0)

As a test, I modified the code so that, after removing newlines and stripping the leading and trailing whitespace, it writes the contents to another file (with the json extension). Surprisingly, when I read the new file back, I do not get any error and the parsing succeeds. The modified code is as follows:

    import json
    filewrite = open('out.json', 'w+')
    with open('test.json', 'r') as jsonFile:
        for jf in jsonFile:
            jf = jf.replace('\n', '')
            jf = jf.strip()
            filewrite.write(jf)
    filewrite.close()
    with open('out.json', 'r') as newJsonFile:
        for line in newJsonFile:
            weatherData = json.loads(line)
            print weatherData

The output is as follows:

{u'observations': {u'notice': [{u'copyright_url': u'', u'disclaimer_url': u'', u'copyright': u'Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see:', u'feedback_url': u''}]}}

Any idea what might be going on when newlines and whitespace are stripped before using the json library?


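You will go crazy if you try to parse a JSON file line by line. No single line of your file is a complete JSON document, so json.loads fails on the very first one. Your second script works only because it writes everything to out.json without newlines, so that file holds a single line containing the whole document. The json module has helper functions that read file objects or strings directly: load and loads. load takes a file object (as shown below) for a file that contains JSON data, while loads takes a string that contains JSON data.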
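To make the difference concrete, here is a minimal sketch that uses an in-memory string instead of a file. The `text` value and the `parse_line_by_line` helper are hypothetical stand-ins for test.json and the loop in the question, and the code avoids print statements so it should also run on Python 3.

```python
import json

# Hypothetical stand-in for the contents of test.json (trimmed to two keys).
text = """{
"observations": {
    "notice": [
        {"copyright_url": "", "feedback_url": ""}
    ]
}
}"""

def parse_line_by_line(raw):
    """Mimic the question's loop: strip each line and parse it on its own."""
    results = []
    for line in raw.splitlines():
        line = line.strip()
        try:
            results.append(json.loads(line))
        except ValueError as exc:  # on Python 3, JSONDecodeError subclasses ValueError
            results.append(exc)
    return results

per_line = parse_line_by_line(text)
# The first line is just "{", which is not a complete JSON document.
first_line_failed = isinstance(per_line[0], ValueError)

# Joining the stripped lines gives the parser one complete document,
# which is what writing everything to out.json did by accident.
joined = "".join(line.strip() for line in text.splitlines())
whole = json.loads(joined)

# No stripping is needed at all: JSON allows newlines between tokens.
direct = json.loads(text)
```

Both `whole` and `direct` hold the same dictionary, so the newline stripping in the question was never the real fix; handing the parser the whole document was.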

Option 1 (preferred):

    import json
    with open('test.json', 'r') as jf:
        weatherData = json.load(jf)
        print weatherData

Option 2:

    import json
    with open('test.json', 'r') as jf:
        # read the whole file into one string, then parse that string
        weatherData = json.loads(jf.read())
        print weatherData
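If you want to check that the two options give the same result, the following sketch may help. It writes a throwaway multiline JSON file (a hypothetical stand-in for test.json, created with tempfile) and parses it both ways; the `sample` data is made up for illustration.

```python
import json
import os
import tempfile

# Illustrative data only; not the real weather file.
sample = {"observations": {"notice": [{"copyright_url": "", "feedback_url": ""}]}}

# Write a temporary multiline JSON file (indent=4 spreads it over several lines).
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as out:
    json.dump(sample, out, indent=4)

try:
    with open(path, "r") as jf:
        from_load = json.load(jf)           # Option 1: parse from the file object
    with open(path, "r") as jf:
        from_loads = json.loads(jf.read())  # Option 2: parse from the full text
finally:
    os.remove(path)  # clean up the temporary file
```

Both calls return the same dictionary; Option 1 simply saves you the explicit read.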

If you are looking for higher-performance JSON parsing, check out ujson, a third-party package.
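A minimal sketch of how ujson could be swapped in, assuming it has been installed separately. The alias `fast_json` is a made-up name, and the fallback to the standard library is an addition for illustration so the snippet still runs when ujson is missing. No speed-up has been measured here.

```python
import json

try:
    import ujson as fast_json  # third-party; assumed to be installed separately
except ImportError:
    fast_json = json  # fall back to the standard library parser

# Illustrative input only; in practice this would be the file's contents.
text = '{"observations": {"notice": [{"copyright_url": ""}]}}'
weather_data = fast_json.loads(text)
```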

