| HN Mirror



	by dangoldin 6645 days ago
	I come from a Perl background, so I've been using HTML::TreeBuilder and XML::TreeBuilder to do my parsing. Each one basically loads an HTML/XML file into its own tree structure and gives you an easy way to walk through it. By knowing how each site names its divs/classes, I am able to scrape. I took a quick look at Beautiful Soup and it seems to do something similar. Someone let me know if this is correct.

1 comment

Yes. You can even regex search through the tree. Weeeeee!

BeautifulSoup is nothing unique, but it can handle malformed markup, which saves you a ton of hassle.
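
For anyone who wants to see the idea from this thread in code (pick elements by how the site names its classes, match those names with a regex, and keep going when the markup is broken), here is a rough sketch. It uses only Python's standard-library html.parser as a stand-in, so it is not Beautiful Soup's or HTML::TreeBuilder's API. The ClassTextScraper class, the scrape helper, and the story-title / story-meta class names are all made up for illustration.

```python
import re
from html.parser import HTMLParser


class ClassTextScraper(HTMLParser):
    """Collect the text of elements whose class matches a regex (hypothetical helper)."""

    def __init__(self, class_pattern):
        super().__init__()
        self.class_pattern = re.compile(class_pattern)
        self.capture_tag = None  # tag name of the element being captured, if any
        self.depth = 0           # nesting depth of same-named tags inside it
        self.buffer = []         # text pieces seen inside the current element
        self.results = []        # one cleaned-up string per matching element

    def handle_starttag(self, tag, attrs):
        if self.capture_tag is not None:
            # Already inside a match: only track nesting of the same tag name.
            if tag == self.capture_tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(self.class_pattern.search(name) for name in classes):
            self.capture_tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == self.capture_tag:
            self.depth -= 1
            if self.depth == 0:
                self.flush()

    def handle_data(self, data):
        if self.capture_tag is not None:
            self.buffer.append(data)

    def flush(self):
        """Store whatever was captured; also covers elements that were never closed."""
        if self.capture_tag is not None:
            self.results.append(" ".join("".join(self.buffer).split()))
        self.capture_tag = None
        self.depth = 0
        self.buffer = []


def scrape(html, class_pattern):
    """Return the text of every element whose class matches class_pattern."""
    parser = ClassTextScraper(class_pattern)
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing element with no closing tag
    return parser.results


# Deliberately sloppy markup: an unclosed <b> and an unclosed <p>.
SAMPLE = (
    '<div class="story-title">First <b>post</div>'
    '<div class="comment">ignored</div>'
    '<p class="story-meta">by someone'
)

if __name__ == "__main__":
    for text in scrape(SAMPLE, r"^story-"):
        print(text)
```

The sketch never builds a full tree; it only tracks enough nesting to know when a matched element ends, which is why the unclosed tags in SAMPLE don't trip it up. A real tree builder gives you much more (parents, siblings, attribute lookups), so treat this as an illustration of the approach rather than a replacement.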