by Karrot_Kream 868 days ago
Have you had any trouble with BeautifulSoup? I've thought about doing something similar but wanted something more robust for scraping. I've considered using Puppeteer.

It works pretty well. This morning I ran it on the top 20 articles on Hacker News, and it failed to extract data for only one article, and that was only because the link was a PDF. On an average day I get one or two failures per 20 articles, for various reasons (click-throughs, information in pictures, PDF links), and I haven't tried very hard to improve it.
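
For anyone curious what handling the PDF case could look like, here is a minimal sketch, not the code I actually run. It assumes you already have each link's URL and (optionally) the Content-Type header from the response; `skip_reason` and `tally_skips` are hypothetical helper names made up for illustration. The idea is to flag links that an HTML parser like BeautifulSoup can't do anything useful with before trying to parse them, and to count the reasons.

```python
from collections import Counter
from urllib.parse import urlparse


def skip_reason(url, content_type=""):
    """Return a reason to skip HTML parsing, or None if the link looks parseable.

    Hypothetical helper: the checks are a guess at what catches PDF links,
    not an exhaustive list of failure modes.
    """
    path = urlparse(url).path.lower()
    if path.endswith(".pdf") or "application/pdf" in content_type.lower():
        return "pdf"
    # If a Content-Type is known and it is not HTML, an HTML parser won't help.
    if content_type and "html" not in content_type.lower():
        return "non-html"
    return None


def tally_skips(links):
    """Count skip reasons over an iterable of (url, content_type) pairs."""
    reasons = Counter()
    for url, content_type in links:
        reason = skip_reason(url, content_type)
        if reason is not None:
            reasons[reason] += 1
    return reasons
```

This only covers the PDF and non-HTML cases. Click-throughs and information embedded in pictures would need something else, so treat it as a starting point.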