by choppaface 689 days ago
If 90% of the content doesn't get daily attention, then the spider can blow out the caches and cause inordinate costs.
2 comments

Yeah, this happened to us, and you can see the cache profile of a crawler here:

https://fosstodon.org/@readthedocs/112877477202118215

A bunch of the traffic hit the origin.

Not trying to 'splain at you, but if you can turn off caching for one-hit wonders, everything will be greatly improved.
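For anyone wondering what "turn off caching for one-hit wonders" could look like, here is a minimal sketch of one common approach: only admit an object to the cache on its second request. This is an illustration, not what any particular CDN or site does. The class name `SecondHitCache`, its parameters, and the `fetch` callback are all hypothetical.

```python
from collections import OrderedDict


class SecondHitCache:
    """LRU cache that admits a key only after it has been requested twice.

    Hypothetical sketch: a crawler touching each URL once never displaces
    the objects that real users keep coming back to.
    """

    def __init__(self, capacity, seen_capacity):
        self.capacity = capacity            # max cached objects
        self.seen_capacity = seen_capacity  # max remembered first-time keys
        self.cache = OrderedDict()          # key -> value, in LRU order
        self.seen = OrderedDict()           # keys requested exactly once so far

    def get(self, key, fetch):
        """Return (value, was_cache_hit). `fetch(key)` loads from the origin."""
        if key in self.cache:
            self.cache.move_to_end(key)     # refresh LRU position
            return self.cache[key], True

        value = fetch(key)                  # miss: go to the origin
        if key in self.seen:
            # Second request: this is not a one-hit wonder, so admit it.
            del self.seen[key]
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        else:
            # First request: remember the key only, do not store the body.
            self.seen[key] = None
            if len(self.seen) > self.seen_capacity:
                self.seen.popitem(last=False)
        return value, False
```

The trade-off is one extra origin hit for every object that does turn out to be popular, in exchange for the crawler's long tail never entering the cache at all.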
Caches? You could cache nothing at all and 11 TPS would still run just fine on a Game Boy.
Not if that content is multiple gigabytes of data per file.
Which isn't the case for the vast majority of websites? We're talking about ifixit.com here, where a random guide page (so quite heavy on pictures) is only 6.8 MB.

I think they'll survive this massive DDoS of 76 MB/s.

Same goes for ReadTheDocs, which was linked above: 10 TB of traffic over 6 days, an incredible 19.29 MB/s sustained. Has humanity even built technology powerful enough to handle this massive assault?!
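The arithmetic behind those figures is easy to check. A quick sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and reading the quoted rates as megabytes per second; the function and variable names are just for illustration.

```python
def sustained_mb_per_s(total_bytes, seconds):
    """Average transfer rate in decimal megabytes per second."""
    return total_bytes / seconds / 1e6


TB = 1e12          # decimal terabyte, in bytes
DAY = 24 * 60 * 60  # seconds per day

# ReadTheDocs figure from the thread: 10 TB over 6 days.
rtd_rate = sustained_mb_per_s(10 * TB, 6 * DAY)
print(f"ReadTheDocs: {rtd_rate:.2f} MB/s")

# ifixit figure from the thread: 11 requests/s at 6.8 MB per page,
# assuming every request pulls the full page uncached.
ifixit_rate = 11 * 6.8
print(f"ifixit: {ifixit_rate:.1f} MB/s")
```

The first works out to 19.29 MB/s, matching the number above. The second gives about 74.8 MB/s, in the same ballpark as the 76 quoted, so the request rate was presumably a bit over 11 per second.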