Hacker News
by mishu2 683 days ago
Playwright is basically necessary for scraping nowadays, as the browser needs to do a lot of work before the web page becomes useful/readable. I remember scraping with HTTrack back in high school and most of the sites kept working...
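A minimal sketch of what "the browser needs to do a lot of work" means in practice. It assumes Playwright's Python sync API (`pip install playwright`, then `playwright install chromium`), and the helper names are hypothetical. The page is loaded in a real browser, and the scraper waits until network activity settles before reading any text, so content inserted by JavaScript is there when it is read.

```python
from typing import Protocol


class PageLike(Protocol):
    """The subset of Playwright's sync `Page` API used below (assumption: only these two calls)."""

    def goto(self, url: str, *, wait_until: str) -> object: ...
    def inner_text(self, selector: str) -> str: ...


def read_rendered_text(page: PageLike, url: str, selector: str = "body") -> str:
    # "networkidle" waits until there have been no network connections for ~500 ms,
    # giving client-side scripts time to populate the DOM.
    page.goto(url, wait_until="networkidle")
    # inner_text returns the element's innerText from the live DOM after scripts ran,
    # rather than the raw HTML the server originally sent.
    return page.inner_text(selector)


def scrape(url: str, selector: str = "body") -> str:
    # Imported lazily so read_rendered_text can be used (and tested) without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            return read_rendered_text(page, url, selector)
        finally:
            browser.close()
```

A plain HTTP fetch of a client-rendered page would often return little more than a script tag; going through the browser is what makes the text readable.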

For my project (https://frankendash.com/), I also ran into issues with dynamically generated class names that change on every site update, so in the end I just saved a cropped area of the website as an image and displayed that.
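The crop-and-display approach can be sketched with Playwright's `page.screenshot`, which accepts a `clip` rectangle. This is a hypothetical sketch, not frankendash's actual code: the helper names, the 1280x800 viewport, and the default output path are all assumptions. Addressing the region by coordinates instead of CSS selectors sidesteps the generated class names, at the cost of breaking when the page layout itself moves.

```python
from typing import Dict


def clamp_clip(x: float, y: float, width: float, height: float,
               page_width: float, page_height: float) -> Dict[str, float]:
    """Clamp a crop rectangle so it stays inside the visible page area."""
    x0 = min(max(x, 0.0), page_width)
    y0 = min(max(y, 0.0), page_height)
    x1 = min(max(x + width, 0.0), page_width)
    y1 = min(max(y + height, 0.0), page_height)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("crop area lies outside the page")
    return {"x": x0, "y": y0, "width": x1 - x0, "height": y1 - y0}


def capture_region(page, url: str, clip: Dict[str, float], path: str) -> bytes:
    # Load the page and let scripts settle before taking the picture.
    page.goto(url, wait_until="networkidle")
    # Screenshot only the rectangle (CSS pixels from the top-left corner);
    # Playwright writes the PNG to `path` and also returns its bytes.
    return page.screenshot(path=path, clip=clip)


def save_crop(url: str, x: float, y: float, width: float, height: float,
              path: str = "crop.png") -> bytes:
    # Imported lazily so the helpers above work without a browser installed.
    from playwright.sync_api import sync_playwright

    viewport = {"width": 1280, "height": 800}  # assumed fixed size
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            # A fixed viewport helps keep the layout, and so the crop coordinates, stable between runs.
            page = browser.new_page(viewport=viewport)
            clip = clamp_clip(x, y, width, height, viewport["width"], viewport["height"])
            return capture_region(page, url, clip, path)
        finally:
            browser.close()
```

For example, `save_crop("https://example.com/", 100, 50, 600, 300)` would store a 600x300 PNG of that region, ready to show on a dashboard.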


HTTrack was fantastic, and it still was a couple of years ago when I used it for a small project too.