It also has the 'objectify' API, where you can access XML nodes via regular object access (i.e. `access.nodes[0].like.this`).
http://lxml.de/objectify.html
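For anyone who hasn't seen objectify, here is a minimal sketch of that attribute-style access. The sample document and its tag names are invented to mirror the `access.nodes[0].like.this` expression above; they don't come from any real feed.

```python
from lxml import objectify

# Hypothetical sample document, invented for illustration only.
XML = (
    b"<access>"
    b"<nodes><like><this>hello</this></like></nodes>"
    b"<nodes><like><this>world</this></like></nodes>"
    b"</access>"
)

# objectify.fromstring returns the root element (<access>).
access = objectify.fromstring(XML)

# Child elements are reachable as attributes; repeated siblings
# with the same tag can be indexed like a list.
first = access.nodes[0].like.this
second = access.nodes[1].like.this

print(first.text)   # text content of the first <this>
print(second.text)  # text content of the second <this>
```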
I have two major gripes with lxml that this library solves, but I agree that for serious projects, lxml is the correct choice.
1) you have to build libxml to use lxml
2) lxml has a large, powerful, complicated API
For 1), a friend of mine had to do an annoying workaround to use lxml on his box, because its limited memory prevented him from building libxml. Because xmltodict is Expat[1]-based, you don't have to build libxml in your environment to use it.
For 2), when I went to write a simple rss-reader project this past weekend, I dreaded going back to lxml. I knew that I'd have to go peruse its huge documentation to answer questions about whether to use etree.XML or etree.fromstring, whether methods I wanted were on Elements or ElementTrees, xpath syntax, custom parsers, etc. If I'd ever seen the objectify API I'd forgotten about it, because there's just so much _other_ stuff in lxml.
I happened to find xmltodict in a brief search for lxml alternatives. It's on PyPI, so pip grabbed it with no complaints. It installed without building anything. And in less than a minute of glancing at the README, I grokked the API as "pydict = xmltodict.parse(xml_string)". I don't know if there are other things. I never had to find out.
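To make that one-liner concrete, here is a small sketch of what the resulting dict looks like for an RSS-shaped document. The feed below is made up for the example; only `xmltodict.parse` comes from the library.

```python
import xmltodict

# Hypothetical RSS-like document, invented for illustration.
XML = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

doc = xmltodict.parse(XML)
channel = doc["rss"]["channel"]

# XML attributes show up as keys prefixed with '@'.
version = doc["rss"]["@version"]

# Repeated sibling elements are collected into a list.
titles = [item["title"] for item in channel["item"]]

print(version)
print(titles)
```

One thing to watch for: a feed with only a single `<item>` gives you a dict there rather than a list, so real code would need to handle both shapes.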
Less than 10 minutes from finding it to forgetting I was reading XML as a source -- really a wonderful project. But if I were doing something 'serious', I'd absolutely use lxml. That large API and those byzantine docs exist for a good reason: they're dealing with XML properly. But sometimes coders just wanna have fun, or build a quick prototype or hack.
Agree that lxml is certainly quite large and batteries-included, but I've always found I can do pretty much everything I want to just using a few select methods, namely etree and xpath.
For RSS reading, here's a straightforward example. Obviously you can factor out the index access on each item, and calling `/text()` too.
    from urllib.request import urlopen
    from lxml import etree

    # Fetch the feed and parse it into an element tree
    data = urlopen('https://news.ycombinator.com/rss').read()
    root = etree.XML(data)

    # Print the title, description and link of every <item>
    for i, item in enumerate(root.xpath('.//item')):
        print(i, item.xpath('title/text()')[0])
        print(item.xpath('description/text()')[0])
        print(item.xpath('link/text()')[0])
        print()
All pretty simple, though xpath selectors can get a bit gnarly at times. The tradeoff is that you can be very expressive with them.
Good, I had never seen this transformation tip in LXML
But usually, yes, LXML is "good", meaning it's the least bad way of dealing with XML.
Also, it has some idiosyncrasies, like insisting on adding the namespace to tag names, so you end up with something like {http://example.com/your.xlsd}.index (I don't remember it exactly and I don't have an example here with me).
Correctly handling namespaced QNames is a requirement, not a bug. It makes things awkward at times, but that's a job for lib writers to provide decent interfaces.
I haven't used LXML in a while, but ElementTree, for example, forces you to use the QName in XPath expressions, which is technically correct but a huge pain; it would be nice if there was a ScrewNamespace option that would allow "simple" searches, although this might blow up in your face one day (when two namespaces define the same element name, and your xpath search brings up elements you didn't really want).
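As a rough illustration of both the pain and the hazard, here is a sketch using the standard library's ElementTree. The document and namespace URIs are made up. The `{*}` wildcard is the closest thing I know of to a "ScrewNamespace" option, and as far as I'm aware it needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two namespaces that both define <index>.
XML = """<root xmlns="http://example.com/a" xmlns:b="http://example.com/b">
  <index>from a</index>
  <b:index>from b</b:index>
</root>"""

root = ET.fromstring(XML)

# 1) Fully qualified name in Clark notation: correct but verbose.
qualified = [e.text for e in root.findall("{http://example.com/a}index")]

# 2) A prefix map keeps the path readable; the prefix "a" is our own choice.
ns = {"a": "http://example.com/a"}
mapped = [e.text for e in root.findall("a:index", ns)]

# 3) Namespace wildcard: the "simple" search, which matches both namespaces.
any_ns = [e.text for e in root.findall("{*}index")]

print(qualified)
print(mapped)
print(any_ns)
```

The third query is exactly the blow-up-in-your-face case: it returns the `index` element from both namespaces.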
I also found the namespacing to be a bit weird, and it took quite a while to grok the documentation. In case anyone wants a working example, I implemented a wrapper to drop the namespacing (resulting in simple objectify attribute access) for one particular XML schema here:
https://github.com/timstaley/voevent-parse/blob/master/voepa...
That's the "node-name", i.e. the fully qualified name of the tag in its namespace. You probably wanted to ask for the local-name, which in your example would just be "index". I'm not sure how to do that with LXML, but it's a common mistake people make when dealing with XML.
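For what it's worth, here is a sketch of how the local name can be obtained in lxml, assuming a made-up document and namespace URI. It uses `etree.QName` on an element and the XPath `local-name()` function.

```python
from lxml import etree

# Hypothetical document; the namespace URI is invented for the example.
XML = b'<root xmlns="http://example.com/your.xsd"><index>42</index></root>'

root = etree.fromstring(XML)
child = root[0]

# .tag holds the fully qualified name in Clark notation: {namespace}local
full_name = child.tag

# QName splits an element's tag into namespace and local name.
local_name = etree.QName(child).localname

# XPath can also match on the local name only, ignoring the namespace.
matches = root.xpath("//*[local-name() = 'index']")

print(full_name)
print(local_name)
print(matches[0].text)
```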
There is, however, a small feature in xmltodict that most people overlook: the streaming mode. I actually wrote xmltodict the day I tried to parse a Wikipedia dump. I just couldn't keep it all in memory, but needed something more high-level than SAX.
xmltodict is in no way trying to compete with LXML feature-wise (no support for namespaces yet, just to name one thing). It's just a lightweight approach to roundtripping between XML and JSON documents that worked for my use case, so I decided to share it.
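A minimal sketch of the streaming mode, using a tiny made-up stand-in for a dump file. The `handle_page` name and the sample data are hypothetical; `item_depth` and `item_callback` are the parameters of `xmltodict.parse` that switch streaming on.

```python
import xmltodict

# Hypothetical stand-in for a large dump; a real one would be a file object.
XML = """<mediawiki>
  <page><title>Page one</title><id>1</id></page>
  <page><title>Page two</title><id>2</id></page>
</mediawiki>"""

titles = []

def handle_page(path, item):
    # path: list of (tag, attributes) pairs from the root to this element.
    # item: the already-parsed subtree for one element at the chosen depth.
    titles.append(item["title"])
    # Return True to keep parsing; a falsy value stops the parse.
    return True

# item_depth=2 means the callback fires for each child of the root element,
# so only one <page> subtree needs to be held in memory at a time.
xmltodict.parse(XML, item_depth=2, item_callback=handle_page)

print(titles)
```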
Yes, my first thought when reading the headline was that it would be about LXML.
However, while LXML can do this, and makes it easy, the documentation does not stress this way of using LXML. I like this project's emphasis on simplicity and doing one thing. It's the difference between "You can use LXML to get a dict" vs "Here is how to use xmltodict to get a dict". And it's right there in the name. Emphasis and naming are important when getting started.
LXML is a little unapproachable when you first use it but it's all of the other great things you mentioned too. Now that I've used it a lot I would never consider trading it for something simpler. It can handle any situation you're going to run into. I'd suggest that if people were looking to do anything more than a quick hack they invest an afternoon in learning the LXML API.
[1] http://docs.python.org/2/library/pyexpat.html