Question 1

Essentially, I have a 6.4 GB XML file that I'd like to convert to JSON and then save to disk. I'm running OS X 10.8.4 with an i7 2700K and 16 GB of RAM, and 64-bit Python (double-checked). I'm getting an error saying there isn't enough memory to allocate. How do I go about fixing this?

import json
import xmltodict

print('Opening')
f = open('large.xml', 'r')
data = f.read()  # reads the whole file into memory at once
f.close()
print('Converting')
newJSON = xmltodict.parse(data)
print('Json Dumping')
newJSON = json.dumps(newJSON)
print('Saving')
f = open('newjson.json', 'w')
f.write(newJSON)
f.close()

The Error:

Python(2461) malloc: *** mmap(size=140402048315392) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "/Users/user/Git/Resources/largexml2json.py", line 10, in <module>
    data = f.read()
MemoryError

Answer 1

Many Python XML libraries support parsing XML sub-elements incrementally, e.g. xml.etree.ElementTree.iterparse and xml.sax.parse in the standard library. Parsers that work this way are usually called streaming XML parsers: they hand you one element at a time instead of building the whole document in memory.
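As a rough illustration of the incremental approach, here is a minimal sketch using xml.etree.ElementTree.iterparse on Python 3. It assumes the file is a flat list of repeated elements directly under the root (the tag name "record" and the helper names are made up for the example, since the real file's layout isn't shown), and it writes one JSON object per line rather than one giant JSON document.

```python
import io
import json
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield one dict per <tag> element without loading the whole document."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # Assumption: each record only has simple child elements with text.
            yield {child.tag: child.text for child in elem}
            root.clear()  # drop already-processed children to keep memory flat


def xml_to_json_lines(source, out, tag):
    """Write each record as one JSON object per line; return the record count."""
    count = 0
    for record in stream_records(source, tag):
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


# Tiny in-memory stand-in for the real file (hypothetical structure).
sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><name>a</name></record>"
    b"<record><id>2</id><name>b</name></record>"
    b"</records>"
)
out = io.StringIO()
n = xml_to_json_lines(sample, out, "record")
```

For the real file you would pass the path (or a file opened in binary mode) as source and an open output file as out. Writing line-delimited JSON avoids ever holding the full result in memory, which json.dumps on the whole document would require.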

The xmltodict library you used also has a streaming mode, which I think may solve your problem:

https://github.com/martinblech/xmltodict#streaming-mode
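A minimal sketch of what that could look like, assuming the items you care about sit two levels deep (item_depth=2) and that one JSON object per line is an acceptable output format. The convert and handle_item names and the sample data are hypothetical; item_depth and item_callback are the parameters described in the linked README.

```python
import io
import json

import xmltodict  # third-party: pip install xmltodict


def convert(xml_file, out, depth=2):
    """Stream items at the given depth and write one JSON object per line."""
    count = 0

    def handle_item(path, item):
        # Called once for every element found at item_depth.
        nonlocal count
        out.write(json.dumps(item) + "\n")
        count += 1
        return True  # a falsy return value aborts the parse

    # Passing a file-like object lets the parser read it in chunks.
    xmltodict.parse(xml_file, item_depth=depth, item_callback=handle_item)
    return count


# Tiny in-memory stand-in for the real file (hypothetical structure).
sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><name>a</name></record>"
    b"<record><id>2</id><name>b</name></record>"
    b"</records>"
)
out = io.StringIO()
n = convert(sample, out)
```

With the real data you would open large.xml in binary mode ('rb') and pass the file object itself instead of calling f.read(), so the 6.4 GB never has to fit in memory at once.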

Python - Convert Very Large (6.4GB) XML files to JSON
