by d0mine 1132 days ago
To parse XML in Python without loading the whole document into memory, handling one interesting element at a time:

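  import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

  def getelements(filename_or_file, tag):
      # Ask for 'start' events too, so the very first event hands us the root element.
      context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
      _, root = next(context)  # get root element
      for event, elem in context:
          if event == 'end' and elem.tag == tag:
              yield elem
              root.clear()  # drop already-processed elements to preserve memory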
https://stackoverflow.com/questions/7697710/python-running-o...

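Usage is simple: getelements() yields the matching elements one by one, and each element should be processed before the next one is requested, because the tree is cleared as soon as the generator resumes. Found via a Google search for "xml memory iterparse".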
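
For what it's worth, here is a minimal usage sketch. The sample document, the tag names, and the book_titles() helper are all made up for illustration. getelements() is repeated (using the non-deprecated xml.etree.ElementTree module) so the snippet runs on its own:

```python
import io
import xml.etree.ElementTree as etree


def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear()  # runs when the consumer asks for the next element


# Hypothetical sample document; a real one would be a large file on disk.
SAMPLE = (
    b"<catalog>"
    b"<book id='1'><title>A</title></book>"
    b"<book id='2'><title>B</title></book>"
    b"</catalog>"
)


def book_titles(source):
    # Pull out plain strings right away: the element is emptied once the
    # generator resumes, so don't keep references to yielded elements.
    return [(book.get('id'), book.findtext('title'))
            for book in getelements(source, 'book')]


titles = book_titles(io.BytesIO(SAMPLE))
print(titles)  # [('1', 'A'), ('2', 'B')]
```

iterparse() accepts either a filename or a file object, so passing a path to a big file instead of the BytesIO should work the same way.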

1 comment

Yeah, that's an option I'm aware of, but it gets quite ugly once there are more than a few rules.