If you listen to the people complaining about bots at the moment, some bots are scraping the same pages over and over to the tune of terabytes per day because the bot operators have unlimited money and their targets don't.
I rather think the cause is that inbound bandwidth is usually free, so they need maybe 1/100th of the money because requests are smaller than responses (plus discounts they get for being big customers)
> I rather think the cause is that inbound bandwidth is usually free, so they need maybe 1/100th of the money because requests are smaller than responses (plus discounts they get for being big customers)
Seems like there's the potential to take advantage of this with a semi-custom protocol, if there's a desire to balance the cost of serving data while still making things available to end users. The server would answer the initial request with a new kind of HTTP response instructing the client to re-request with a POST containing an N-byte one-time pad (N = size of the actual response). The client generates the pad (random data or all zeros, up to the client) and POSTs it; the server then sends the actual response XOR'd with the pad, and the client XORs it again to recover the data.
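To make the exchange concrete, here's a rough in-memory sketch in Python. There's no real HTTP in it, and the function names (`make_pad`, `xor_bytes`, `server_challenge`, `server_respond`, `client_fetch`) are made up for illustration, not part of any existing protocol or library. The only point is that the client has to upload as many bytes as it wants to download.

```python
import os


def make_pad(n: int, random_pad: bool = True) -> bytes:
    """Client side: build an n-byte pad (random bytes, or all zeros)."""
    return os.urandom(n) if random_pad else bytes(n)


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(data) != len(pad):
        raise ValueError("pad length must equal data length")
    return bytes(a ^ b for a, b in zip(data, pad))


def server_challenge(body: bytes) -> int:
    """Hypothetical first response: tell the client how big a pad to upload."""
    return len(body)


def server_respond(body: bytes, pad: bytes) -> bytes:
    """Hypothetical second response: the body XOR'd with the uploaded pad."""
    return xor_bytes(body, pad)


def client_fetch(body_on_server: bytes) -> bytes:
    """Simulate both round trips and recover the original body."""
    n = server_challenge(body_on_server)         # round trip 1: learn N
    pad = make_pad(n)                            # client uploads N bytes
    masked = server_respond(body_on_server, pad) # round trip 2: masked body
    return xor_bytes(masked, pad)                # XOR again to unmask


if __name__ == "__main__":
    page = b"<html>hello</html>"
    print(client_fetch(page) == page)
```

With an all-zero pad the "masked" response is just the plain body, so the XOR adds nothing cryptographically; the deterrent is purely the N bytes of upload.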
Upside: Most end users don't pay for upload; if bot operators do, this incurs a dollar cost only to them. Downside: Increased download cost for the web site operator (but we've postulated that this is small compared to upload cost), extra round trip, extra time for each request (especially for end users with asymmetric bandwidth).
That may work for small pages, like most of my webpages apart from some downloadable files, but megabytes of JavaScript on an average (mobile?) connection are going to take significantly longer to load, cost more battery, and use twice as much of your data bundle.
Perhaps it's effective as a bot deterrent if it only kicks in when someone incurs, say, ten times the median load (as measured in something like CPU time per hour or bandwidth per week or so). It will not prevent anyone from seeing your pages, so information is still free, but it levels the playing field -- at least for those with free inbound bandwidth dealing with bots that pay for outgoing bandwidth.
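For what it's worth, the "ten times the median" check could be as small as the sketch below. The per-client load numbers, the `clients_to_challenge` name and the default factor of 10 are all assumptions for illustration; how you actually measure load (CPU time, bytes served, and over what window) is left open.

```python
from statistics import median


def clients_to_challenge(load_by_client: dict[str, float],
                         factor: float = 10.0) -> set[str]:
    """Return the clients whose load exceeds `factor` times the median load."""
    if not load_by_client:
        return set()
    typical = median(load_by_client.values())
    # Note: if the median is 0, any client with nonzero load is flagged.
    return {client for client, load in load_by_client.items()
            if load > factor * typical}


if __name__ == "__main__":
    # Made-up example: bytes served per week, keyed by client address.
    loads = {"198.51.100.7": 1.0, "198.51.100.8": 2.0,
             "198.51.100.9": 3.0, "203.0.113.5": 100.0}
    print(clients_to_challenge(loads))
```

Everyone below the threshold gets the normal response, and only the flagged clients would be asked to upload a pad first.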