| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jerf 792 days ago

Not always. There are HTTP servers where you are making an HTTP request for an in-memory value where the work is less than the parsing cost for an HTTP request. There are many HTTP services where the time to fulfill the request is much longer than the parse cost of the request, but that time is not 100% CPU of either the server or any given service, because there's a lot of back and forth delays and latency. There are many HTTP services where they are 100% CPU and orders of magnitude greater than the cost of a HTTP request parse, but are still on the order of <1ms and if such a service was actually a message queue you might still be able to clog a message queue at least somewhat.

This is a very pessimal case, though. You make a tiny HTTP request which is parsed in microseconds at the most, and it invokes somewhere between one to five million microseconds of 100% utilization of an expensive resource. A thousand queuings per second would be fairly easy for a RPi, it could handle that no problem (at least assuming you use a decent language to manage it; a super fancy Python framework that also does a lot of metaprogramming might choke under the load, sure, though some half-carefully written Python still ought to be able to handle this), while those 1000 requests/second would require around 2000 GPUs to dispatch them in real time so we can maintain that 1000 rps. I'm pretty sure you can add an order of magnitude before I'd really start worrying about the RPi as a queuing mechanism, and you're getting to the RPi being able to queue for ~100,000 GPUs without too much strain. I don't know how many GPUs OpenAI has but that's got to be getting pretty close to their order of magnitude. They may have a million, I doubt ten million.

(Of course, I wouldn't actually do this on an RPi; I'm just using that as a performance benchmark.)