by altilunium 1413 days ago
Beautiful Soup gets the job done. I've made several apps with it.

[1] https://github.com/altilunium/wistalk (Scrapes Wikipedia to analyze a user's activity)

[2] https://github.com/altilunium/psedex (Scrapes a government website to get the list of all registered online services in Indonesia)

[3] https://github.com/altilunium/makalahIF (Scrapes a university lecturer's web page to get a list of papers)

[4] https://github.com/altilunium/wi-page (Scrapes Wikipedia to get the most active contributors to a given article)

[5] https://github.com/altilunium/arachnid (Web scraper, optimized for WordPress and Blogger)

1 comments

I've found lxml to be more powerful. It supports XPath, which I don't believe Beautiful Soup does.

In other words, consider lxml as well.
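
For anyone who hasn't used XPath with lxml, here is a minimal sketch of what it looks like. The sample markup, the `papers` id, and the helper names `paper_links` and `paper_titles` are made up for illustration and are not taken from any of the linked repos.

```python
from lxml import html

# Hypothetical markup, just to have something to query.
SAMPLE = """
<html><body>
  <ul id="papers">
    <li class="paper"><a href="/p/1">First paper</a></li>
    <li class="paper"><a href="/p/2">Second paper</a></li>
    <li class="note">Not a paper</li>
  </ul>
</body></html>
"""


def paper_links(markup):
    """Return the href of every link inside a li.paper under ul#papers."""
    tree = html.fromstring(markup)
    # XPath can select attribute values directly with @href.
    return tree.xpath('//ul[@id="papers"]/li[@class="paper"]/a/@href')


def paper_titles(markup):
    """Return the link text of every li.paper entry."""
    tree = html.fromstring(markup)
    return [t.strip() for t in tree.xpath('//li[@class="paper"]/a/text()')]


if __name__ == "__main__":
    print(paper_links(SAMPLE))
    print(paper_titles(SAMPLE))
```

Being able to pull attributes and text nodes straight out of the query is the part I'd miss most without XPath.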

lxml is supported (mostly) out of the box by Beautiful Soup, so you can use it as a parser behind BS4's nicer interface, which I believe the OP does in the linked codebases.
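
A rough sketch of what that looks like, assuming `bs4` is installed and lxml may or may not be. The helpers `make_soup` and `link_targets` are names I made up for this example. I haven't checked how the OP's repos actually set up their parser.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml if it is installed, else fall back to the stdlib parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised by bs4 when the requested parser is not available.
        return BeautifulSoup(markup, "html.parser")


def link_targets(markup):
    """Return the href of every <a> tag that has one."""
    soup = make_soup(markup)
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    print(link_targets('<p><a href="/a">A</a> <a>no href</a> <a href="/b">B</a></p>'))
```

The calling code stays the same whichever parser ends up underneath, which is most of the appeal.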
I reach for selectolax first if I'm doing relatively tame stuff. Also, CSS selectors are nice.