by tsimionescu
793 days ago
Isn't the same true then of an HTTP server? Handling the polling requests is a minute amount of work compared to running the real jobs. And you've addressed the scalability problem, but not the connectivity issues that generally plague long-lived connections on the Internet at large.
This is a very pessimal case, though. You make a tiny HTTP request, which is parsed in microseconds at most, and it invokes somewhere between one and five million microseconds (one to five seconds) of 100% utilization of an expensive resource. A thousand queuings per second would be fairly easy for an RPi. It could handle that no problem, at least assuming you use a decent language to manage it; a super fancy Python framework that also does a lot of metaprogramming might choke under the load, sure, though some half-carefully written Python still ought to be able to handle this. Meanwhile, those 1000 requests/second would require around 2000 GPUs to dispatch them in real time and maintain that 1000 rps. I'm pretty sure you can add an order of magnitude before I'd really start worrying about the RPi as a queuing mechanism, and you're getting to the RPi being able to queue for ~100,000 GPUs without too much strain. I don't know how many GPUs OpenAI has, but that's got to be getting pretty close to their order of magnitude. They may have a million; I doubt ten million.
(Of course, I wouldn't actually do this on an RPi; I'm just using that as a performance benchmark.)
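For anyone who wants to play with the numbers, here is a rough back-of-the-envelope sketch of the arithmetic above. It uses Little's law (average busy workers = arrival rate × service time). The function names are made up for illustration, and the 2-second average is only an assumption inferred from the "around 2000 GPUs at 1000 rps" figure; the 1 and 5 second values are the bounds mentioned above.

```python
def gpus_needed(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: average number of busy GPUs = arrival rate * service time."""
    return requests_per_second * seconds_per_request


def sustainable_rps(gpu_count: float, seconds_per_request: float) -> float:
    """Inverse view: request rate a GPU fleet can keep up with in real time."""
    return gpu_count / seconds_per_request


if __name__ == "__main__":
    # 1 s and 5 s are the bounds from the comment; 2 s is an assumed average.
    for service_time in (1.0, 2.0, 5.0):
        gpus = gpus_needed(1000, service_time)
        print(f"1000 rps at {service_time:.0f} s/request -> {gpus:.0f} GPUs")

    # How many queuings per second would 100,000 GPUs imply at the assumed 2 s?
    rps = sustainable_rps(100_000, 2.0)
    print(f"100,000 GPUs at 2 s/request -> {rps:.0f} rps")
```

Under the assumed 2 seconds per request, a 100,000 GPU fleet works out to about 50,000 queuings per second, so that is the load the queuing box would actually have to absorb at that scale.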