I have two major gripes with lxml that this library solves, but I agree that for serious projects, lxml is the correct choice.

1) You have to build libxml to use lxml.
2) lxml has a large, powerful, complicated API.

For 1), a friend of mine had to do an annoying workaround to use lxml on his box, because its limited memory kept him from building libxml. Because xmltodict is Expat[1]-based, you don't have to build libxml in your environment to use it.

For 2), when I went to write a simple rss-reader project this past weekend, I dreaded going back to lxml. I knew that I'd have to go peruse its huge documentation to answer questions about whether to use lxml.etree.XML or lxml.etree.fromstring, whether the methods I wanted were on Elements or ElementTrees, xpath syntax, custom parsers, etc. If I'd ever seen the objectify API I'd forgotten about it, because there's just so much _other_ stuff in lxml.

I happened to find xmltodict in a brief search for lxml alternatives. It's in PyPI, so pip grabbed it with no complaints. It installed without building anything. And in less than a minute of glancing at the README, I grokked the API as "pydict = xmltodict.parse(xml_string)". I don't know if there are other things. I never had to find out. It took less than 10 minutes to go from finding it to forgetting I was reading XML as a source. Really a wonderful project.

But if I were doing something 'serious', I'd absolutely use lxml. That large API and those byzantine docs exist for a good reason: they're dealing with XML properly. But sometimes coders just wanna have fun, or build a quick prototype or hack.

[1] http://docs.python.org/2/library/pyexpat.html
E.g. for extracting URLs from a sitemap:
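Something along these lines, as a rough sketch. It uses the standard library's `xml.etree.ElementTree` rather than lxml so it runs without building anything (as far as I know, `lxml.etree` accepts the same `fromstring` and `findall` calls). The sample sitemap and the `sitemap_urls` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Default namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample document, for illustration only.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # The namespace prefix is required: <loc> lives in the sitemap namespace.
    locs = root.findall(".//sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

The namespace mapping is the part that usually trips people up: without it, a plain `.//loc` query matches nothing.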
For RSS reading, here's a straightforward example. Obviously you can factor out the index access on each item, and the call to `/text()` too. All pretty simple, though xpath selectors can get a bit gnarly at times. The tradeoff is that you can be very expressive with them.
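A rough sketch of the RSS case. With lxml the per-field pattern is `item.xpath('title/text()')[0]`; the version below swaps in the standard library's `xml.etree.ElementTree`, where `findtext` does the job of the factored-out "take the first match and get its text" helper. The sample feed and the `read_items` name are assumptions for illustration, not anyone's real feed or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            # findtext returns the first matching child's text,
            # or the default when the child is missing.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

ElementTree only supports a small subset of XPath, so this is where lxml's full `xpath()` earns its keep once the selectors get more involved.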