The last time this question was asked on HN was in 2017 (https://news.ycombinator.com/item?id=15694118). A lot has changed in the world of web scraping over the last 5 years (legal landscape, antibot unblockers, data-type-specific APIs, etc.), so I thought it might be a good idea to refresh the question and see which tools are most popular in the HN community these days.
I'm really impressed by Playwright. It feels like it has learned all of the lessons from systems like Selenium that came before it - it's very well designed and easy to apply to problems.
I wrote my own CLI scraping tool on top of Playwright a few months ago, which has been a fun way to explore Playwright's capabilities: https://simonwillison.net/2022/Mar/14/scraping-web-pages-sho...