by fastest963 2698 days ago
We use Bigtable at 30k writes/sec and 300k reads/sec. Compared to the other managed services (Pub/Sub, Memorystore, etc.), it has been the most stable by far. Still, we sometimes have to scale up our node count when, based on the perf described in the docs, we don't think we should have to, and we see the latency/errors described in the article. Last year they also added storage caps based on node count, which increased our costs dramatically.

The Key Visualizer has been a huge help, but there still aren't enough metrics and tooling to understand what is happening behind the scenes when things do go wrong. Cost has prevented us from doing any sort of replication, but luckily we have a cache sitting in front of Bigtable for reads, which lets us absorb most of the intermittent issues described.
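
For anyone wondering what a cache like that can look like, here is a minimal sketch of a read-through cache that falls back to a stale entry when the backend errors out. It is not our actual setup: `ReadThroughCache`, `fetch`, `ttl_seconds`, and `clock` are hypothetical names, and `fetch` stands in for whatever function performs the real Bigtable read.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Serve reads from memory; fall back to stale entries if the backend fails."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # hypothetical backend read, e.g. a row lookup
        self._ttl = ttl_seconds  # how long an entry counts as fresh
        self._clock = clock      # injectable so the TTL can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh hit: the backend is not touched
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]  # backend error: serve the stale value
            raise                # nothing cached, so the error surfaces
        self._entries[key] = (now, value)
        return value
```

The sketch never evicts anything, so a real version would need a size bound, and it only helps for keys that have been read before.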

1 comment

Interestingly, for us scaling up doesn't really solve the short-term unavailability. It seems to be only somewhat related to load: it hits more often at high-traffic times, but we have also seen it at low-traffic times.

Putting in that cache is a great move. Caching is challenging for us because we get hits over a very wide range of keys.