by llimllib 6059 days ago
I almost always write my own stream parser with regular expressions to deal with large XML files (especially very regular ones), though it should be noted that there are stream XML parsers.
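A rough sketch of what such a regex-based stream parser might look like in Python, for illustration only. The `<record>` element, its flat child fields, and the `iter_records` helper are all hypothetical, not taken from the comment above. It assumes records are never nested and carry no attributes, CDATA, or entities, which is exactly the "very regular" case where this shortcut is tolerable.

```python
import io
import re

# Hypothetical record shape: flat <record>...</record> elements with simple child fields.
RECORD_RE = re.compile(r"<record>(.*?)</record>", re.DOTALL)
FIELD_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def iter_records(stream, chunk_size=64 * 1024):
    """Yield one dict per <record>, reading the text stream in fixed-size chunks."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        last_end = 0
        for match in RECORD_RE.finditer(buffer):
            # Each field becomes a key/value pair: <id>1</id> -> {"id": "1"}
            yield dict(FIELD_RE.findall(match.group(1)))
            last_end = match.end()
        # Keep only the unparsed tail, which may hold a partial record.
        buffer = buffer[last_end:]


if __name__ == "__main__":
    sample = "<records><record><id>1</id><name>a</name></record></records>"
    for record in iter_records(io.StringIO(sample)):
        print(record)
```

Only the current chunk plus any incomplete trailing record is held in memory, so file size does not matter much as long as individual records stay small.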

Knowing regular expressions is an all-around good idea when doing data processing. The learning curve is steep, but it pays for itself in increased productivity.

What stream XML parsers do you use? I just get my data ready for Hadoop and let it go.

To be honest, I only kind of think I know that there are stream XML parsers. I've used cElementTree for small XML documents and written my own regexes for larger ones. (cElementTree is definitely not a stream parser.)
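For what it's worth, Python's standard library does include an event-driven (SAX) parser in `xml.sax`, which processes a document as a stream of callbacks instead of building a tree. A minimal sketch follows; the `RecordCounter` class, the `count_elements` helper, and the `record` tag name are made up for illustration.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts start tags with a given name as the parser streams past them."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is kept in memory.
        if name == self.tag:
            self.count += 1


def count_elements(source, tag):
    """Parse a filename or binary file object and count <tag> elements."""
    handler = RecordCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


if __name__ == "__main__":
    doc = io.BytesIO(b"<records><record/><record/></records>")
    print(count_elements(doc, "record"))
```

Unlike the regex approach, this handles attributes, entities, and nesting correctly, at the cost of tracking state by hand across callbacks.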