by dangoldin
6645 days ago
I come from a Perl background, so I've been using HTML::TreeBuilder and XML::TreeBuilder to do my parsing. They load an HTML/XML file into their own tree structure and give you an easy way to walk through it. Because I know how each site names its divs/classes, I can pick out the elements I want to scrape. I took a quick look at Beautiful Soup and it seems to do something similar. Can someone confirm whether that's correct?
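For comparison, here is a minimal sketch of the "load into a tree, then look elements up by tag and class" approach. It uses only Python's standard-library `html.parser`. This is not Beautiful Soup's API or HTML::TreeBuilder's. `SimpleTreeBuilder`, `find_by_class`, and `text_of` are hypothetical helpers written for illustration, and the sample HTML is made up.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the open-element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []  # child Nodes and text strings, in document order

class SimpleTreeBuilder(HTMLParser):
    """Hypothetical helper: builds a tiny element tree from parser events."""
    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)

def find_by_class(node, tag, cls):
    """Depth-first search for elements with the given tag name and CSS class."""
    for child in node.children:
        if isinstance(child, str):
            continue
        classes = (child.attrs.get("class") or "").split()
        if child.tag == tag and cls in classes:
            yield child
        yield from find_by_class(child, tag, cls)

def text_of(node):
    """Concatenate all text below a node, in document order."""
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)

html = """
<div class="post"><span class="title">First</span></div>
<div class="post featured"><span class="title">Second</span></div>
<div class="ad">Buy now</div>
"""
builder = SimpleTreeBuilder()
builder.feed(html)
builder.close()
titles = [text_of(d).strip() for d in find_by_class(builder.root, "div", "post")]
print(titles)  # ['First', 'Second']
```

Matching on the split `class` attribute means a div with `class="post featured"` still counts as a `post`, which is usually what you want when scraping by class name.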
BeautifulSoup is nothing unique, but its ability to handle malformed markup saves you a ton of hassle.
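As a rough illustration of what "tolerating malformed markup" means at the lowest level, Python's standard-library `html.parser` accepts tag soup without raising. It only reports tokens and does not build or repair a tree itself, so this is a swapped-in stdlib example and not a demonstration of BeautifulSoup. The `TagLogger` class and the sample input are hypothetical.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Hypothetical helper: records the token events the parser emits."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

# Unclosed <p> and <b>, an unquoted attribute, an uppercase tag, and a stray </div>.
soup = "<P class=intro>Hello <b>world<p>Second</div>"
logger = TagLogger()
logger.feed(soup)
logger.close()
# Tag names come back lowercased, and the stray </div> is reported rather than rejected.
print(logger.events)
```

Deciding how those unclosed and stray tags should nest is the part a library like BeautifulSoup handles on top of a tokenizer like this one.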