| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by peterwaller 3997 days ago

There are two other solutions that spring to mind, which might require quite a bit less code:

1) Take the original code, do the upload exactly in place in the original request (not even spawning a goroutine). However: protect the upload with a semaphore which only allows N-in-flight.

My reasoning is, well, if the system operates with low latency when operating nominally, blocking the incoming request isn't too painful. The reason there was a problem in the first place that there were too many requests in flight and the system hit a meta-stable state where no requests could complete efficiently.

2) (or instead of (1)): If you're going to have a worker pool, why have that complicated chan-chan-Job business? It seems that `func StartProcessor` was close to being a viable solution. All you need is to start a few of those in parallel, each reading from the same `Queue`. Was there a reason to introduce the `WorkerPool chan chan Job`? That looks quite a bit more complicated than it needs to be. The queues don't need to be separate per worker unless there is some other substantial reason.

The next thing one would need to take care of is to ensure that the whole system doesn't stall due to a broken/laggy network, so, to put some timeouts on the S3 uploads, for example, to ensure the system can return to a stable state on its own when the thundering herd has passed.

3 comments

ignoramous 3997 days ago

Re: Semaphore: Not dropping the connection might mean we might starve others incoming msgs of resources (which might be a good thing) [0]. Also, releasing the semaphore back into the pool in case of failures takes on a happy complication of having to deal with errors beyond one's control.

Re: Queue: Wouldn't the queue involve locking lest two workers end up trying to work on the same request? To be completely concurrent, I guess one could use a lock-free data structure instead (or implement one on top of something like RocksDB)?

[0] http://ferd.ca/queues-don-t-fix-overload.html

[0] http://engineering.voxer.com/2013/09/16/backpressure-in-node...

link

peterwaller 3995 days ago

Semaphore, I'm suggesting:

  Block on N-Semaphore, with timeout
  Do timeout upload
  Replace N-Semaphore

If you don't always replace the semaphore, that's a bug.

Queue: I'm just comparing to what the article does. It already has contention on a queue (the chan), it's just the chan-chan-Worker rather than chan-Job. In practice, go channels happily handle millions of messages contending to multiple workers just fine. Consider this test example where you aren't even actually burning any CPU to perform the work:

    package main
    
    func main() {
        q := make(chan int)    
        for i := 0; i < 10; i++ {    
                go func() {    
                        for x := range q {    
                                x = x * 10    
                        }    
                }()    
        }    
    
        for i := 0; i < 1000000; i++ {    
                q <- i    
        }    
    }

On my laptop, it runs in 0.333s single core, and it's slightly slower when you set GOMAXPROCS > 1. But not much slower, the total runtime goes to 0.4-0.5s or so. (Measured with go 1.4). As soon as you do any actual work with the messages you are passing around, the overhead of locking will be lost in the noise.

link

placeybordeaux 3997 days ago

Yeah I thought the double chan was a bit strange. It's like mutliple lines at checkout instead of one big line, it just increases the variance.

link

eva1984 3997 days ago

Timeout/Retry/ExponentialBackOff

link