[1] https://github.com/altilunium/wistalk (Scrapes Wikipedia to analyze a user's activity)
[2] https://github.com/altilunium/psedex (Scrapes a government website to get a list of all registered online services in Indonesia)
[3] https://github.com/altilunium/makalahIF (Scrapes a university lecturer's web page to get a list of papers)
[4] https://github.com/altilunium/wi-page (Scrapes Wikipedia to find the most active contributors to a given article)
[5] https://github.com/altilunium/arachnid (Web scraper optimized for WordPress and Blogger)
In other words, consider lxml as well.
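Here is a minimal sketch of using lxml for scraping. It assumes the third-party `lxml` package is installed (`pip install lxml`). The snippet parses an HTML string and selects links with an XPath expression. The page fragment, the `papers` list id, and the `extract_links` helper are hypothetical and only illustrate the idea. In a real scraper the HTML would come from an HTTP response.

```python
# Minimal sketch: parse HTML with lxml and extract data via XPath.
# Assumes the third-party lxml package is installed (pip install lxml).
from lxml import html

# Hypothetical page fragment standing in for a downloaded page.
PAGE = """
<html><body>
  <ul id="papers">
    <li><a href="/p/1.pdf">Paper One</a></li>
    <li><a href="/p/2.pdf">Paper Two</a></li>
  </ul>
</body></html>
"""


def extract_links(page_source):
    """Return (link text, href) pairs for every link inside the #papers list."""
    tree = html.fromstring(page_source)
    # One XPath expression targets nested elements by attribute,
    # so there is no need to walk the tree by hand.
    anchors = tree.xpath('//ul[@id="papers"]//a')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


if __name__ == "__main__":
    for title, href in extract_links(PAGE):
        print(title, href)
```

Because lxml wraps the C libraries libxml2 and libxslt, parsing is typically fast. XPath also keeps the selection logic to a single line.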