by sean0-
1320 days ago
At this point, especially with the upcoming 22.2 release, nearly all failure modes are handled on a machine timescale. There used to be some unbounded failure scenarios where a human would have to kick a node, but to the best of my knowledge they have all been resolved, and unplanned failures are now bounded to about 10s. One of the bigger burdens with large fleets like this is fleet management and maintenance, specifically managing upgrades. If you have a good story for that, you're in good shape. Most of the excitement comes from moving workloads onto crdb from Postgres, where it's not uncommon to have workloads with right-leaning indexes (e.g. indexes on monotonically increasing keys). This class of problem is solvable, but it will catch some people by surprise. At the end of the day, having a database that can absorb anything from a few dozen QPS up to millions of QPS is a big reliability win, especially with the self-healing characteristics of the database technology.
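
A rough illustration of why right-leaning indexes bite on a range-partitioned store like crdb, assuming a hypothetical keyspace pre-split into four ranges at fixed points: new sequential IDs all land in the last range, while prefixing each key with a hash bucket (the idea behind CockroachDB's hash-sharded indexes) spreads the inserts across ranges. The split points, bucket count, and helper names below are made up for the sketch, not how crdb actually chooses splits.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical split points dividing an integer keyspace into 4 ranges.
SPLITS = [250_000, 500_000, 750_000]
BUCKETS = 4  # hypothetical number of hash buckets for the sharded variant


def range_for(key):
    """Index of the range owning `key` when ranges are split on the raw key."""
    return bisect.bisect_right(SPLITS, key)


def shard_bucket(key, buckets=BUCKETS):
    """Hash bucket prefixed onto `key`; assume ranges split on bucket boundaries."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


# Existing rows already occupy ids 0..999_999; new inserts continue the sequence.
new_keys = range(1_000_000, 1_010_000)

# Count how many new inserts each range receives under both layouts.
seq = Counter(range_for(k) for k in new_keys)
sharded = Counter(shard_bucket(k) for k in new_keys)

print("sequential:", dict(seq))
print("hash-sharded:", dict(sharded))
```

With raw sequential keys every new insert hits the same tail range (a write hotspot); with the bucket prefix each range takes roughly a quarter of the inserts, at the cost of range scans on the original key now having to fan out across buckets.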
How does it compare to the overhead and experience of DIY postgres?