Hacker News
by dangoldin 6645 days ago
I come from a Perl background, so I've been using HTML::TreeBuilder and XML::TreeBuilder to do my parsing. Each one loads an HTML or XML file into its own tree structure and gives you an easy way to walk through it. Once I know how a site names its divs and classes, I can scrape it.
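
For anyone who wants to see the "find the divs by class name" idea in code, here is a rough sketch. It is Python using only the standard-library html.parser, not the Perl modules above, and it streams through the tags rather than building a full tree. The class names ("story", "nav"), the sample page, and the helper names ClassTextScraper and scrape are all made up for illustration.

```python
from html.parser import HTMLParser


class ClassTextScraper(HTMLParser):
    """Collect the text inside every <div> that carries a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # > 0 while we are inside a matching div
        self.results = []   # one string per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # Nested div inside a match: track it so we know when the match ends.
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.depth = 1
                self.results.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def scrape(html, wanted_class):
    """Return the stripped text of each div whose class list contains wanted_class."""
    parser = ClassTextScraper(wanted_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Hypothetical page; a real site would have its own div/class names.
page = (
    '<div class="nav">Home</div>'
    '<div class="story title">First <b>post</b></div>'
    '<div class="story">Second</div>'
)
print(scrape(page, "story"))
```

The scraper only works as long as the site keeps the same class names, which is the usual weak point of this approach.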

I took a quick look at Beautiful Soup and it seems to do something similar. Someone let me know if this is correct.

1 comment

Yes. You can even regex search through the tree. Weeeeee!

BeautifulSoup is nothing unique, but it can handle malformed data, which saves you a ton of hassle.
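
To make the regex and malformed-markup points concrete, here is a small sketch. It deliberately uses Python's standard-library html.parser and re instead of BeautifulSoup itself, so it runs with no extra installs; BeautifulSoup's own API differs. The broken snippet, the URL pattern, and the names LinkFinder and find_links are assumptions for the example.

```python
import re
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect href values of <a> tags that match a regular expression."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and self.pattern.search(href):
                self.links.append(href)


def find_links(html, pattern):
    """Return every link in html whose href matches pattern."""
    finder = LinkFinder(pattern)
    finder.feed(html)
    finder.close()
    return finder.links


# Deliberately sloppy markup: unclosed <li> and <a> tags, an unquoted attribute.
broken = (
    '<ul><li><a href="/item?id=1">one'
    '<li><a href=/user?id=pg>pg</a>'
    '<li><a href="/item?id=2">two'
)
print(find_links(broken, r"^/item\?id=\d+$"))
```

The parser never complains about the missing closing tags; it just reports the start tags it sees, which is the kind of tolerance that makes scraping real pages bearable.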