by criddell 622 days ago
Is there a friendly way to do this? I'd feel bad burning through hundreds of gigabytes of bandwidth for a non-corporate site. Would a database snapshot be as useful?
4 comments

MyBB PHP forums have a web interface through which one can download the database as a single .sql file. It will most likely be a mess, depending on the addons that were installed on the forum.
Downloading a DB dump and crawling locally is possible, but it had two gnarly show stoppers for me using wget. First, the forum's posts often link to other posts, and those links are absolute. Getting wget to crawl those links through localhost is hardly easy (local reverse proxy with content rewriting?). Second, the forum and its server were really unmaintained. I didn't want to spend a lot of time replicating it locally; I just wanted to archive it as-is while it was still barely running.
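For what it's worth, the content-rewriting half of that idea can be sketched in a few lines of Python. This is only a rough illustration, not what was actually used: `forum.example.com` and `http://localhost:8080` are made-up placeholders for the real forum host and the local mirror, and a proxy would still have to apply the function to each response body.

```python
import re

# Hypothetical placeholders: the forum's public host and the local mirror's origin.
FORUM_HOST = "forum.example.com"
LOCAL_ORIGIN = "http://localhost:8080"

# Match the forum origin over http or https, with or without "www.".
# The negative lookahead stops the match from firing on longer host names
# such as "forum.example.com.evil.org".
_ORIGIN_RE = re.compile(
    r"https?://(?:www\.)?" + re.escape(FORUM_HOST) + r"(?![\w.-])"
)

def rewrite_links(html: str) -> str:
    """Point absolute links to the forum at the local mirror instead."""
    return _ORIGIN_RE.sub(LOCAL_ORIGIN, html)

page = '<a href="https://forum.example.com/showthread.php?tid=42">post</a>'
print(rewrite_links(page))
```

Paths and query strings are left untouched, so a link to a specific post still resolves to the same post on the local copy.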
If you want to customize the scraping, there's the Scrapy Python framework. You would still need to download all the HTML, though.
Isn't bandwidth mostly dirt cheap/free these days?
It's inexpensive, but sometimes not free. For example, Google Cloud Hosting is $0.14 / GB so 260 GB would be around $36.
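The arithmetic behind that estimate, as a quick sketch. The $0.14 per GB rate and the 260 GB total are the figures quoted above, not current pricing, and `transfer_cost` is just an illustrative helper name.

```python
def transfer_cost(gigabytes: float, usd_per_gb: float) -> float:
    """Flat-rate egress cost: data transferred times price per gigabyte."""
    return gigabytes * usd_per_gb

# Figures taken from the comment above (assumed, not verified against a price list).
cost = transfer_cost(260, 0.14)
print(f"${cost:.2f}")  # prints $36.40
```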
It's essentially free on non-extortionate hosts. Use Hetzner + Cloudflare and you'll essentially never pay for bandwidth.