Hacker News new | ask | show | jobs
by mkl 815 days ago
Can shot-scraper load a bunch of content on an "infinite scroll" page before saving? I'm guessing Monolith can't as it has no JS. The most effective way I've found to work through the history of a big YouTube channel is to hold page-down for a while then save to a static "Web Page, Complete" HTML file, but it's a bit clunky.
2 comments

shot-scraper has a feature that's meant to help with this: you can inject additional al JavaScript into the page that can then act before the screenshot is taken.

I use that for things like accepting cookie banners, but using it to scroll down to trigger additional loading should work too.

There's also a --wait-for option which takes a JavaScript expression and polls until it's true before taking the shot - useful for if there's custom loading behavior you need to wait for.

Documentation here: https://shot-scraper.datasette.io/en/stable/screenshots.html

I found other problems in the area when trying to do this e.g. a lot of landing pages have hidden content that only animates in when you scroll down, subscribe/cookie overlays/modals covering content, hero headers that takes the height of the viewport ("height: 100vh") so if you make the page height large for taking a screenshot the header will cover all of it, and also sticky headers get in the way if you want to try scrolling while take multi-screenshots that are combined at the end.

You can come up with workarounds for each, but it's still hacky and there's always going to be other pages that need special treatment.