I've just released a small product using CockroachDB, and in retrospect it was probably my favourite technical decision. Previously I'd used it as a toy and tested deployment strategies but was skeptical (new tech and all that). Now that it's ticking along in the wild, I'm very impressed across the board.
1) How many servers are needed to achieve this throughput?
2) >4 terabytes of unreplicated, frequently accessed data
4TB unreplicated data? Does that mean if a single node goes down you'll lose data (EDIT: I meant losing availability, not data)? That kinda ruins the whole point of having a distributed database.
3) If I'm reading the KV benchmarks correctly, it takes 5 nodes to achieve 100k tpm. That's 20k tpm per node, or 333 tps per node. This is a 95% point-read benchmark. Why is the throughput (333 tps) so low? Is that normal?
4) How does CockroachDB compare to other distributed databases such as TiDB, FoundationDB, ScyllaDB?
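The per-node arithmetic in question 3 can be checked with a quick sketch. The inputs (100k tpm, 5 nodes) are the question's own reading of the graph, and the function name is purely illustrative.

```python
def per_node_tps(total_per_minute: float, nodes: int) -> float:
    """Convert a cluster-wide per-minute rate into a per-node per-second rate."""
    per_node_per_minute = total_per_minute / nodes  # 100k tpm / 5 nodes = 20k tpm
    return per_node_per_minute / 60                 # minutes -> seconds


print(round(per_node_tps(100_000, 5)))  # prints 333
```

If the reported figure were already per second rather than per minute, the division by 60 would drop out and the same input would give 20,000 per node.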
ScyllaDB is still the fastest at a key/value workload with per-query consistency settings and quorum reads/writes across multiple regions. If you need high performance and low latency, ScyllaDB wins. They are close to v3.0, which will have global secondary indexes and materialized views to improve data model flexibility. FoundationDB is also key/value but much lower-level and well proven for reliability. I don't have much experience with it, and the latest release just introduced multi-regional capabilities, but the general tooling and documentation are still rough and it would take more effort to build a higher-level querying layer or client library.
TiDB is interesting, but it's missing more features from MySQL than CRDB is missing from PostgreSQL, so it's effective if you want sharding on MySQL but will need a few more releases before it gets polished. Vitess and Citus are good options if you just want sharding on top of existing MySQL or Postgres with full query support within a shard. There's also Yugabyte, which is a multi-model Redis/Cassandra/SQL offering with multi-regional capabilities.
CRDB is a great product with some of the easiest operations (although key management is a nightmare that they do not have a good plan for). It's fast enough for point lookups and makes it easy to distribute and replicate your data across zones and regions. All nodes are part of a single cluster, so read and write latencies will be high for global deployments, with the enterprise version having a workaround for local regional reads using pinned covering indexes. That works, but further lowers write performance.
It also has trouble with large transactions and the middle ground between OLTP and OLAP with heavy joins. Good choice if you need easy scalability and SQL interface over performance and complex queries.
I would rate compatibility as actually pretty good: all but one SQL mode is supported (which is a feat in itself), and most of the SQL functions are supported.
There are some exceptions though, some of which are addressable (missing functions) and some of which are not (often a property of being an optimistic system).
We try to be as transparent as possible on this, which might be part of the reason why you feel there is a lot missing?
If you have specific examples, I would be happy to clarify. We also have a course for MySQL DBAs, designed to make adoption easier: https://www.pingcap.com/tidb-academy
The technical design for views was recently completed, and I expect to see them added soon :-)
Window functions & CTEs are only very recent features in MySQL 8.0 (TiDB is 5.7 compatible). Nonetheless, they are important for HTAP workloads, and I'm looking forward to seeing them too.
1. Between 90 and 135 16 vCPU nodes depending on cloud hardware
2. The cluster replicates this data three ways across all three nodes (so the cluster actually contains 12+ TB of data), ensuring high availability. We intentionally reported the unreplicated number for clarity and for comparison to the TPC-C spec.
3. Our graph is mislabeled. It should read transactions per second `tps`. Nice catch!
4. We can't comment on other databases' performance as they haven't released any TPC-C numbers.
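The replication arithmetic in answer 2 can be sketched as follows. This assumes exactly 4 TB of unreplicated data and a replication factor of 3; the thread only gives ">4" and "12+", so treat the result as a lower bound. The function name is illustrative.

```python
def replicated_size_tb(unreplicated_tb: float, replication_factor: int) -> float:
    """Total on-disk data when every range is stored replication_factor times."""
    return unreplicated_tb * replication_factor


# Assumed inputs: 4 TB of logical data, each piece stored three times.
print(replicated_size_tb(4, 3))
```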
“Between 90 and 135 16 vCPU nodes depending on cloud hardware”
How many nodes did you use in the CRDB 2.0 TPC-C 10k benchmark? Could I say that the "5x increment" was measured on the same hardware? Thanks!
I'm the author. We've introduced transactional write pipelining (covered in a forthcoming blog post) and load-aware rebalancing, and completed general performance tuning, all of which contribute to our improved performance numbers.
I was wondering, quite unrelated to the article, if anyone knows if CockroachDB would be suited for small databases (and comparably modest computing/memory resources). I very much like its distributed properties, but only have a simple table of usernames and corresponding cryptographic material. Is CRDB easy to run and manage?
We have clients using CRDB in pretty constrained environments, and they use it primarily because of the easy administration. I think you'll find it easier to use than MySQL or Postgres, for example.
I would expect your use case would be better served by just using Postgres. However, if you do need to scale to the point where you'd need to distribute your database and take advantage of CRDB's capabilities, it uses the Postgres protocol, so you most likely can just migrate your data and use the same code.
Because I believe the overhead is not worth it. I’d rather a database be cloud native and have things like sharding and scaling, and the coordination of them, built in. I grok the CockroachDB way of doing things much more than the Vitess way. And, shocking fact, most companies aren’t Google in size and don’t really need a Vitess or CockroachDB but get swept up in the Cloud, Cloud, Cloud! craze. Having this built in, versus the seemingly brittle nature of vtablets, vtgates, etc., is valuable to me.
Our managed service is currently provisioned by cores. We automatically add nodes to your cluster based on your usage. You can also request to add more nodes if you anticipate spikes.
Someone should create a fork with a work-safe name. CockroachDB brings connotations of cockroaches, which are known for eating almost everything and living almost everywhere.
you know what's ironic? that in every thread there's one of you people complaining about the name - you're all just like cockroaches! no matter how successful cockroachdb becomes, no matter how technically impressive the product becomes, the naysayers never die.
can you imagine 20 years ago someone complaining that google wasn't safe for work because it had a silly name?
newsflash dummy: stop saying/thinking/repeating stupid things like this and it'll stop being the case that everyone is so conservative that silly names are inadmissible.
do you not understand perpetuation? "doesn't fly with my boss" ---> "won't fly when I'm boss". also how about having a conversation on the merits? does that fly with your boss? does with mine.