| HN Mirror

by altilunium 1413 days ago

Beautiful Soup gets the job done. I've built several apps with it.

[1] https://github.com/altilunium/wistalk (Scrapes Wikipedia to analyze a user's activity)

[2] https://github.com/altilunium/psedex (Scrapes a government website to get the list of all registered online services in Indonesia)

[3] https://github.com/altilunium/makalahIF (Scrapes a university lecturer's web page to get a list of papers)

[4] https://github.com/altilunium/wi-page (Scrapes Wikipedia to find the most active contributors to a given article)

[5] https://github.com/altilunium/arachnid (Web scraper, optimized for WordPress and Blogger)

1 comments

Buttons840 1413 days ago

I've found lxml to be more powerful. It supports XPath, which I don't believe Beautiful Soup does.

In other words, consider lxml as well.
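
For illustration, here is a minimal sketch of the kind of XPath query lxml allows. The sample markup and the variable names are made up for this example and are not taken from any of the linked repos.

```python
from lxml import html

# Hypothetical sample page, invented for this sketch.
PAGE = """
<html><body>
  <ul>
    <li class="paper"><a href="/a.pdf">First paper</a></li>
    <li class="talk"><a href="/b.html">Second talk</a></li>
  </ul>
</body></html>
"""

tree = html.fromstring(PAGE)

# Text of links inside list items with class "paper".
paper_titles = tree.xpath("//li[@class='paper']/a/text()")

# Every href attribute in the document.
hrefs = tree.xpath("//a/@href")

# Walk upward from a link to its parent <li>, which is the kind of
# query XPath can express with its axes.
parent_classes = tree.xpath("//a[contains(text(), 'talk')]/parent::li/@class")

print(paper_titles, hrefs, parent_classes)
```

Each `xpath()` call returns a plain list, so the results can be used directly without further wrapping.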


yifanl 1413 days ago

Beautiful Soup supports lxml (mostly) out of the box, so you can use it as a parser behind BS4's nicer interface, which I believe the OP does in the linked codebases.
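
A rough sketch of what that looks like, assuming both `beautifulsoup4` and `lxml` are installed. The markup and the `extract_links` helper are hypothetical, written only for this example:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this sketch.
PAGE = """
<html><body>
  <h1>Papers</h1>
  <ul>
    <li><a href="/a.pdf">First paper</a></li>
    <li><a href="/b.pdf">Second paper</a></li>
  </ul>
</body></html>
"""

def extract_links(markup, parser="lxml"):
    """Return (text, href) pairs for links that sit directly inside list items."""
    # The second argument picks the underlying parser; "lxml" hands the
    # parsing to lxml while keeping Beautiful Soup's API on top.
    soup = BeautifulSoup(markup, parser)
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("li > a[href]")]

links = extract_links(PAGE)
print(links)
```

Passing `"html.parser"` instead of `"lxml"` falls back to the standard-library parser if lxml is not available.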


traverseda 1413 days ago

I reach for selectolax first if I'm doing relatively tame stuff. Also, CSS selectors are nice.
