by chadhutchins10 760 days ago
We've done it. Now let's re-engineer our apps to use error codes for 200 responses and get free S3 usage.
3 comments

I worked on a team with similar cost optimisation gurus... They abused HTTP code conventions and somehow managed to wedge two REST frameworks into the Django app that at one point had 1m+ users...
Another fan of parasitic computing - https://en.wikipedia.org/wiki/Parasitic_computing
I don't know if this counts, but I had something that I would call parasitic happen to me once.

I administered a vBulletin forum, and naturally, we installed all sorts of gewgaws onto it, including an "arcade" where people could play games, share high scores, etc.

This arcade, somehow, came with its own built-in comment system, one where users could register without creating a proper vBulletin user account on our instance, and thus without admins being notified.

One day, we discovered this whole underbelly community that had apparently been thriving under our metaphorical floorboards, and promptly evicted them. In hindsight, I probably should have found some way to let them stick around, but several things had recently happened that hardened our stance against any sort of unwanted users.

Relevant xkcd: https://xkcd.com/1305/
If I understand TFA, you'd need to find a way to get S3 (which offers no server-side script execution, only basic file delivery) to emit an error code (403 specifically) alongside a response of useful data. Good luck...
Simple. Just encode all of your app's data and logic as a massive lookup table, each bit of which is represented by an object that either doesn't exist (a zero) or is unauthorized to access (a one).

When you read a sequential series of keys (404 403 403 404 = 0110) it will either tell you the data you were looking for or the next key name to begin reading from.

You can also perfectly parallelize those requests, making the operation highly efficient!
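A rough sketch of that read loop, for the curious. Nothing here talks to S3: `FORBIDDEN_KEYS`, `fetch_status`, and `read_bits` are made-up names, and the "bucket" is just an in-memory set standing in for objects that would answer 403.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the bucket: keys listed here would answer 403
# (exists, not authorized); every other key would answer 404 (does not exist).
FORBIDDEN_KEYS = {"table/1", "table/2"}

def fetch_status(key: str) -> int:
    """Simulate the status code a GET on `key` would return."""
    return 403 if key in FORBIDDEN_KEYS else 404

def read_bits(prefix: str, count: int) -> str:
    """Request `count` sequential keys in parallel; map 403 -> '1', 404 -> '0'."""
    keys = [f"{prefix}/{i}" for i in range(count)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map returns results in the same order as `keys`.
        statuses = list(pool.map(fetch_status, keys))
    return "".join("1" if status == 403 else "0" for status in statuses)

bits = read_bits("table", 4)   # 404 403 403 404
value = int(bits, 2)           # interpret the bit string as an integer
```

Whether the decoded value is data or the next key prefix to read from would be up to whatever convention the table uses.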
It said "never incur request or bandwidth charges". I assume this means you don't pay to compute the response or for the bandwidth to deliver it.

Seems you could compute the response, store it somewhere (memcached or something), and then return an error. Then have the caller make another call to retrieve the response. (To associate the two requests, have the caller generate a UUID and pass it on both calls.)

That doesn't make it entirely free, but it reduces the compute cost of every request to just reading from a cache.

(This does sound like a good way to get banned or something else nasty, so it's definitely not a recommendation.)
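Roughly what I have in mind, as a toy sketch only. The dict stands in for memcached, the uppercase call stands in for the real computation, and `handle_request` / `fetch_result` are hypothetical names, not any real API.

```python
import uuid

CACHE = {}  # stand-in for memcached or similar

def handle_request(request_id: str, payload: str) -> int:
    """First call: compute the response, stash it under the caller's ID,
    and answer with an error status instead of the data."""
    CACHE[request_id] = payload.upper()  # placeholder for the real work
    return 403

def fetch_result(request_id: str):
    """Second call: return the cached response once, then forget it."""
    return CACHE.pop(request_id, None)

# The caller generates the UUID and passes it on both calls.
request_id = str(uuid.uuid4())
status = handle_request(request_id, "hello")
result = fetch_result(request_id)
```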

Well, you can probably send out one bit at a time by updating your ACLs on a clock (with which your clients are also roughly synchronized) and distinguishing between 403 and 404.

It would take an awful lot of time to get that data out, though.

It seems to me you could just use static ACLs and create (or not) object names to cause this 403 vs 404 distinction? The drawback is that you'll be paying for the minimum retention of minimum-sized objects, not to mention all the other bucket management traffic you are using.

So you're going to have a lot of consumers of the same bit stream before you've somehow made the covert, "free" egress a net positive value versus a regular object. I imagine AWS can trivially put in place some throttling of error responses to make this impractical.

Ignoring these economic issues, imagine a content-addressing scheme like /stream-identifier/bitnumber which you can then poll to fetch one bit per request. Populate an object (which will return 403) for 1 bits and omit an object (which will return 404) for 0 bits.

You also need to know some stream length or "end of stream" limit. Otherwise you can't tell if you've read past the end or are really fetching 0 bits of a longer stream.
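Here is a minimal sketch of the /stream-identifier/bitnumber idea with the length problem handled by a fixed-width header. The 16-bit header, the function names, and the in-memory set standing in for the bucket are all my own assumptions for illustration; no real S3 request is made.

```python
LENGTH_BITS = 16  # assumed fixed-width header: payload length in bits

def publish(stream_id: str, data: bytes) -> set:
    """Return the object keys to create: one per 1 bit of header + payload.
    Keys for 0 bits are simply never created."""
    payload = "".join(f"{byte:08b}" for byte in data)
    bits = f"{len(payload):0{LENGTH_BITS}b}" + payload
    return {f"/{stream_id}/{i}" for i, bit in enumerate(bits) if bit == "1"}

def status(bucket: set, key: str) -> int:
    """Simulated response: 403 if the object exists, 404 if it does not."""
    return 403 if key in bucket else 404

def read(bucket: set, stream_id: str) -> bytes:
    """Poll one key per bit: read the header first, then exactly that many bits."""
    def bit(i: int) -> str:
        return "1" if status(bucket, f"/{stream_id}/{i}") == 403 else "0"

    length = int("".join(bit(i) for i in range(LENGTH_BITS)), 2)
    payload = "".join(bit(LENGTH_BITS + i) for i in range(length))
    return bytes(int(payload[i:i + 8], 2) for i in range(0, length, 8))

bucket = publish("demo", b"hi\x00")
message = read(bucket, "demo")
```

The header is what lets a trailing zero byte survive: without it, the reader could not tell those eight 404s apart from reading past the end.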

One strategy might be to use an 8b/10b encoding so you can detect when you're not getting a valid symbol anymore. You could treat that as end of stream if it is supposed to be static, or go into some polling mode to wait for more symbols to be posted.

Hybrid strategies might use regular objects or recursive use of these streams to publish metadata streams that tell you about the available stream names, lengths, and encoding schemes.

> take an awful lot of time to get that data out, though.

That’s what Glacier is for!