by thrownaway2424 4473 days ago
That post doesn't mention anything about tail latency, while the GCE thing does point to P95 latency < 100ms consistently, which is nice.

I wrote the test. Yep, tail latency is one of the key things here. And I took all of the samples, as opposed to the middle 80% the tool usually reports.
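To show why keeping every sample matters for tail latency, here is a minimal Python sketch on made-up numbers. It assumes that "middle 80%" means dropping the lowest and highest 10% of samples, and it uses the nearest-rank percentile definition. `percentile` and `middle_fraction` are hypothetical helpers written for this illustration. They are not the benchmarking tool's API.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def middle_fraction(samples, keep_pct=80):
    """Keep only the middle keep_pct% of samples.

    Assumption: this mimics a tool that trims (100 - keep_pct) / 2 percent
    from each end, i.e. the lowest and highest 10% for keep_pct=80.
    """
    ordered = sorted(samples)
    cut = len(ordered) * (100 - keep_pct) // 200  # integer math avoids float rounding
    return ordered[cut:len(ordered) - cut]


# Hypothetical latencies in ms: mostly fast requests plus a slow tail
latencies = [10] * 90 + [40, 45, 50, 60, 80, 120, 150, 200, 250, 300]

print(percentile(latencies, 95))                   # 80: P95 over all samples
print(percentile(middle_fraction(latencies), 95))  # 10: the trimmed view hides the tail
```

Trimming the top 10% discards every sample above roughly P90, so any percentile computed afterwards can no longer see the tail that P95 is meant to capture.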
What was the network utilization during the test? If these machines were lightly loaded (< 30% utilized) then the tail latency isn't surprising. :)
Average network utilization was low by design. Keeping it steady was more important than keeping it low, though, and harder too.

Latency spikes come from Cassandra flushing data to disk (large sequential IO), Java garbage collection and heap resize, and page faults during compactions (random reads).

To even out traffic, I enabled trickle_fsync and sized the flushes, set Java's max and min heap sizes, and tuned the Java heap ergonomics. I treated random reads as a fact of life and did nothing to tune them.

Doesn't GCE run on the same (physical, not logical) network as the rest of Google's production systems? If so, which I believe is the case, how can you control for network utilization?