by coolj 4706 days ago
Interestingly, the go client and the scala client perform at the same speed when talking to the scala server (~3.3s total), but against the go server the scala client is much faster (~1.9s total), whereas the go client is much slower (~23s total, ~15s with GC disabled).

I thought the difference might partly be in socket buffering on the client, so I printed the size of the send and receive buffers on the socket in the scala client, and set them the same on the socket in the go client. This didn't actually bring the time down. Huh.

My next thought was that scala is somehow being more parallel when it evaluates the futures in Await.result. Running `tcpdump -i lo tcp port 1201` seems to confirm this. The scala client has a lot more parallelism (judging by packet sequence ids). Is that really because go's internal scheduling of goroutines is causing lock contention or lots of context switching?

And...googling a bit, it looks like that is the case: https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sL...

> Current goroutine scheduler limits scalability of concurrent programs written in Go, in particular, high-throughput servers and parallel computational programs. Vtocc server maxes out at 70% CPU on 8-core box, while profile shows 14% is spent in runtime.futex(). In general, the scheduler may inhibit users from using idiomatic fine-grained concurrency where performance is critical.

3 comments

Bear in mind that was written before Go 1.1. Additionally, Dmitry has taken steps to address CPU underutilization and has been working with the rest of the Go team on preemption. I think these improvements will make it into Go 1.2, fingers crossed.

Interesting, but now I'm even more confused. How can we possibly explain that a (go client -> go server) (which are in separate go processes) performs far worse than (go -> scala server), given that the go server seems to be better when using the scala client?

The comments on the article page have a different report which doesn't suffer from this implausibility:

go server + go client 22.02125152

scala server + scala client 3.469

go server + scala client 3.562

scala server + go client 4.766823392

> Interesting, but now I'm even more confused. How can we possibly explain that a (go client -> go server) (which are in separate go processes) performs far worse than (go -> scala server), given that the go server seems to be better when using the scala client?

I've been curious about that as well. The major slowdown seems to be related to a specific combination of go server and client. I don't have a good explanation. I'd love to hear from someone familiar with go internals.

> go server + go client 22.02125152

> ...

> scala server + go client 4.766823392

That's roughly equivalent to my numbers.

Best response here. I spent weeks trying to get a go OpenFlow controller on par with Floodlight (java). I finally gave up on tuning TCP performance and moved on when I realized scheduling was the problem.