There are two other solutions that spring to mind, which might require quite a bit less code:

1) Take the original code and do the upload in place, in the original request (not even spawning a goroutine). However, protect the upload with a semaphore that only allows N uploads in flight. My reasoning is that if the system has low latency when operating nominally, blocking the incoming request isn't too painful. The problem arose in the first place because there were too many requests in flight, and the system hit a metastable state where no request could complete efficiently.

2) Or, instead of (1): if you're going to have a worker pool, why have that complicated chan-chan-Job business? `func StartProcessor` seems close to being a viable solution already. All you need is to start a few of those in parallel, each reading from the same `Queue`. Was there a reason to introduce `WorkerPool chan chan Job`? It looks quite a bit more complicated than it needs to be. The queues don't need to be separate per worker unless there is some other substantial reason.

--

The next thing to take care of is making sure the whole system doesn't stall because of a broken or laggy network. For example, put timeouts on the S3 uploads so the system can return to a stable state on its own once the thundering herd has passed.
Re: Queue: Wouldn't the queue involve locking lest two workers end up trying to work on the same request? To be completely concurrent, I guess one could use a lock-free data structure instead (or implement one on top of something like RocksDB)?