Hacker News
by myself248 1576 days ago
Yes, yes, fine, and then I get throttled to 2 bytes/sec by the server. So I did some user-agent hijinks and set my delay to something like 5000 ms, and that helped for a while, but then my machine crashed, and when I went to resume the task I was throttled again.
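For anyone trying the same thing, here is a rough sketch of what a slow crawl with a custom user agent might look like when driven from Python. The URL, the user-agent string, and the helper name `build_wget_command` are made up for illustration. The flags themselves (`--wait`, `--random-wait`, `--continue`, `--user-agent`) are standard wget options, but whether they avoid any particular server's throttling is not something this sketch can promise.

```python
import shlex


def build_wget_command(url, wait_seconds=5, user_agent="example-crawler/0.1"):
    """Return an argv list for a slow recursive wget crawl.

    The default user agent and the function name are hypothetical.
    """
    return [
        "wget",
        "--recursive",                 # follow links below the start URL
        "--no-parent",                 # never ascend above the start directory
        f"--wait={wait_seconds}",      # wget's --wait is in seconds (5 s = 5000 ms)
        "--random-wait",               # vary the delay between requests
        "--continue",                  # pick up partially downloaded files
        f"--user-agent={user_agent}",  # send a custom User-Agent header
        url,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; the URL is a placeholder.
    print(shlex.join(build_wget_command("https://example.com/files/")))
```

Building the command as a list keeps the wait time and user agent easy to change between attempts, and the same list can be passed to `subprocess.run` once it looks right.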

>but my machine crashed

Maybe it's not the servers who throttle you then ;)

Wget will exhaust all available RAM on a long enough crawl.
No, I crawled many multi-gigabyte sites with my Raspberry Pi 2 for days.
I've had memory exhaustion (on a 4 GB system) after, I think, about 600 GB in a single crawl. Splitting it into multiple crawls is of course better.

That was a site specifically set up to deal with large collections of files though.
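One way to split a big crawl, sketched under some assumptions: run one wget process per subtree, one after another, so that whatever memory a process has accumulated is released when it exits. The start URLs and the helper names `build_crawl_commands` and `run_crawls` are hypothetical; `--no-clobber` is a real wget flag that skips files already on disk, which may help if a run has to be restarted.

```python
import subprocess


def build_crawl_commands(start_urls, wait_seconds=5):
    """Build one wget argv list per start URL (helper name is hypothetical)."""
    return [
        [
            "wget",
            "--recursive",             # crawl below this start URL
            "--no-parent",             # stay inside this subtree
            "--no-clobber",            # skip files that already exist locally
            f"--wait={wait_seconds}",  # delay between requests, in seconds
            url,
        ]
        for url in start_urls
    ]


def run_crawls(commands, runner=subprocess.run):
    """Run each crawl in its own process, one at a time.

    Returns the commands whose process exited non-zero, so they can be retried.
    """
    failed = []
    for cmd in commands:
        result = runner(cmd, check=False)
        if result.returncode != 0:
            failed.append(cmd)
    return failed


if __name__ == "__main__":
    # Placeholder subtrees; substitute the real sections of the site.
    urls = ["https://example.com/a/", "https://example.com/b/"]
    for cmd in build_crawl_commands(urls):
        print(" ".join(cmd))
```

How to pick the subtrees depends on the site; splitting by top-level directory is only one guess at a reasonable boundary.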