How to handle a huge stream of JSON dictionaries?

2024/9/25 6:28:15

I have a file that contains a stream of JSON dictionaries like this:

{"menu": "a"}{"c": []}{"d": [3, 2]}{"e": "}"}

It also includes nested dictionaries and it looks like I cannot rely on a newline being a separator. I need a parser that could be used like this:

for d in getobjects(f):
    handle_dict(d)

The point is that it would be perfect if the iteration only happened at the root level. Is there a Python parser that would handle all JSON's quirks? I am interested in a solution that would work on files that wouldn't fit into RAM.


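I think json.JSONDecoder.raw_decode may be what you're looking for. It parses one JSON value from the start of a string and returns the value together with the index where parsing stopped, so you can call it repeatedly to walk through concatenated objects. It does not skip leading whitespace, so if newlines or spaces separate the objects you will have to strip them yourself. See this example.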

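import json

jstring = '{"menu": "a"}{"c": []}{"d": [3, 2]}{"e": "}"}'
substr = jstring
decoder = json.JSONDecoder()
while len(substr) > 0:
    # raw_decode returns the first JSON value and the index where it ended
    data, index = decoder.raw_decode(substr)
    print(data)
    # drop the parsed part, plus any whitespace before the next object
    substr = substr[index:].lstrip()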

Gives output (under Python 2; Python 3 prints the same dictionaries without the u prefixes):

{u'menu': u'a'}
{u'c': []}
{u'd': [3, 2]}
{u'e': u'}'}
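
The example above works on a string that is already in memory, while the question asks about files that do not fit into RAM. One way to bridge that gap is to read the file in chunks and keep only the unparsed tail in a buffer. The following is a minimal sketch of that idea, not a tested library: getobjects is the name from the question, chunk_size is a hypothetical parameter, and the code assumes Python 3, a file opened in text mode, and top-level values that are dictionaries or lists (a bare number split across two chunks could be misread).

```python
import io
import json


def getobjects(f, chunk_size=65536):
    """Yield each top-level JSON value from the text-mode file object f."""
    decoder = json.JSONDecoder()
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        buf += chunk
        pos = 0
        while True:
            # raw_decode does not skip whitespace, so do it by hand
            while pos < len(buf) and buf[pos] in " \t\r\n":
                pos += 1
            if pos == len(buf):
                break
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except ValueError:
                # most likely an object cut off at the chunk boundary;
                # keep it in the buffer and wait for more data
                break
            yield obj
        # keep only the part that has not been parsed yet
        buf = buf[pos:]
        if not chunk:  # end of file
            if buf:
                raise ValueError("unparseable data at end of stream")
            return


if __name__ == "__main__":
    # io.StringIO stands in for a real file; the tiny chunk size forces
    # objects to be split across reads
    stream = io.StringIO('{"menu": "a"}{"c": []}\n{"d": [3, 2]} {"e": "}"}')
    for d in getobjects(stream, chunk_size=5):
        print(d)
```

Memory use is bounded by the chunk size plus the largest single object, since parsed text is discarded after each read. The trade-off is that an object much larger than chunk_size gets re-parsed from its start on every read until it is complete, and malformed input is only reported once the end of the file is reached.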
