Progress in performance and scalability with CockroachDB | HN Mirror


	Progress in performance and scalability with CockroachDB (cockroachlabs.com)
	105 points by awoods187 2766 days ago

9 comments

brod 2766 days ago

I've just released a small product using CockroachDB; in retrospect it was probably my favourite technical decision. Previously I'd used it as a toy and tested deployment strategies but was skeptical (new tech and all that). Now that it's ticking along in the wild, I'm very impressed across the board.

zzzcpan 2766 days ago

What does your infrastructure look like? Do you deploy it in a single datacenter or even in the same rack on a couple of servers?

qaq 2766 days ago

Are you using Enterprise? If not, how are you handling backups?

continuations 2766 days ago

A few questions:

1) >631851 tpmC

How many servers are needed to achieve this throughput?

2) >4 terabytes of unreplicated, frequently accessed data

4TB unreplicated data? Does that mean if a single node goes down you'll lose data (EDIT: I meant losing availability, not data)? That kinda ruins the whole point of having a distributed database.

3) If I'm reading the KV benchmarks correctly, it takes 5 nodes to achieve 100k tpm. That's 20k tpm per node, or 333 tps per node. This is a 95% point read benchmark. Why is the tps (333 tps) so low? Is that normal?

4) How does CockroachDB compare to other distributed databases such as TiDB, FoundationDB, ScyllaDB?
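
The per-node arithmetic in question 3 can be reproduced with a short sketch. The 100k total and the 5-node cluster size are the commenter's reading of the KV benchmark graph, and the function names are made up for illustration; an even split across nodes is assumed.

```python
# Reproduces the per-node arithmetic from question 3.
# Assumption: the cluster-wide rate divides evenly across nodes.

def per_node_rate(total_rate: float, nodes: int) -> float:
    """Split a cluster-wide rate evenly across the given number of nodes."""
    return total_rate / nodes


def per_minute_to_per_second(rate_per_minute: float) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return rate_per_minute / 60


tpm_per_node = per_node_rate(100_000, 5)
tps_per_node = per_minute_to_per_second(tpm_per_node)

print(f"{tpm_per_node:.0f} tpm per node")  # 20000 tpm per node
print(f"{tps_per_node:.0f} tps per node")  # 333 tps per node
```

If the graph's unit were transactions per second rather than per minute, the first function alone would give the per-node figure (20,000 tps) and no conversion would be needed.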

manigandham 2766 days ago

ScyllaDB is still the fastest at a key/value workload with per-query consistency settings and quorum reads/writes across multiple regions. If you need high performance and low latency, ScyllaDB wins. They are close to v3.0, which will have global secondary indexes and materialized views to improve data model flexibility.

FoundationDB is also key/value but much lower-level and well proven for reliability. I don't have much experience with it. The latest release just introduced multi-regional capabilities, but the general tooling and documentation are still rough, and it would take more effort to build a higher-level querying layer or client library.

TiDB is interesting, but it is missing more features from MySQL than CRDB is missing from PostgreSQL, so it's effective if you want sharding on MySQL but will need a few more releases before it gets polished. Vitess and Citus are good options if you just want sharding on top of existing MySQL or Postgres with full query support within a shard. There's also Yugabyte, which is a multi-modal Redis/Cassandra/SQL offering with multi-regional capabilities.

CRDB is a great product with some of the easiest operations (although key management is a nightmare that they do not have a good plan for). It's fast enough for point-lookups and makes it easy to distribute and replicate your data across zones and regions. All nodes are part of a single cluster so read and write latencies will be high for global deployment, with the enterprise version having a workaround for local regional reads using pinned covering indexes. That works, but further lowers write performance.

It also has trouble with large transactions and with the middle ground between OLTP and OLAP that involves heavy joins. It's a good choice if you need easy scalability and a SQL interface more than performance and complex queries.

morgo 2766 days ago

Hi! I work for PingCAP, the company behind TiDB, and I previously worked on MySQL.

The gap of features missing is documented here: https://www.pingcap.com/docs/sql/mysql-compatibility/

I would rate compatibility as actually pretty good: all but one SQL mode is supported (which is a feat in itself), and most of the SQL functions are supported.

There are some exceptions though, some of which are addressable (missing functions) and some of which are not (often a property of being an optimistic system).

We try to be as transparent as possible on this, which might be part of the reason why you feel there is a lot missing?

If you have specific examples, I would be happy to clarify. We also have a course for MySQL DBAs, designed to make adoption easier: https://www.pingcap.com/tidb-academy

manigandham 2766 days ago

Good to see the progress. I was looking at the roadmap page: https://github.com/pingcap/docs/blob/master/ROADMAP.md

Views and CTEs are probably the biggest missing pieces now.

morgo 2766 days ago

The technical design for views was recently completed, and I expect to see them added soon :-)

Window functions & CTEs are only very recent features in MySQL 8.0 (TiDB is 5.7 compatible). Nonetheless, they are important for HTAP workloads, and I'm looking forward to seeing them too.

awoods187 2765 days ago

I'm the author of the post.

1. Between 90 and 135 16-vCPU nodes, depending on cloud hardware.

2. The cluster replicates this data three ways across all three nodes (so the cluster actually contains 12+ TB of data), ensuring high availability. We intentionally reported the unreplicated number for clarity and for comparison with the TPC-C spec.

3. Our graph is mislabeled. It should read transactions per second (`tps`). Nice catch!

4. We can't comment on other databases' performance as they haven't released any TPC-C numbers.
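
As a rough sanity check on the figures in this reply, the sketch below works out the replicated data size and an average tpmC per node. The inputs are the numbers quoted in the thread (4 TB unreplicated, three-way replication, 631851 tpmC, 90 to 135 nodes); the helper names are hypothetical, and the even per-node split is an assumption rather than a measured result.

```python
# Back-of-the-envelope figures from the numbers quoted in the thread.
UNREPLICATED_TB = 4         # "4 terabytes of unreplicated ... data"
REPLICATION_FACTOR = 3      # "replicates this data three ways"
TPMC = 631_851              # throughput figure quoted in question 1
NODE_COUNTS = (90, 135)     # "Between 90 and 135 16 vCPU nodes"


def replicated_size_tb(unreplicated_tb: float, replication_factor: int) -> float:
    """Total stored data once every byte is kept replication_factor times."""
    return unreplicated_tb * replication_factor


def tpmc_per_node(tpmc: float, nodes: int) -> float:
    """Average tpmC per node, assuming load is spread evenly (an assumption)."""
    return tpmc / nodes


print(f"{replicated_size_tb(UNREPLICATED_TB, REPLICATION_FACTOR)} TB replicated")
for nodes in NODE_COUNTS:
    print(f"{nodes} nodes: about {tpmc_per_node(TPMC, nodes):.0f} tpmC per node")
```

The 12 TB result matches the "12+" figure in the reply; the per-node averages only illustrate the scale implied by the node range and say nothing about how load was actually distributed.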

shenli3514 2764 days ago

“Between 90 and 135 16 vCPU nodes depending on cloud hardware” How many nodes did you use in the CRDB 2.0 TPC-C 10k benchmark? Could I say that the "5x increment" was measured on the same hardware? Thanks!

evrydayhustling 2766 days ago

This is a crazy multiple! Anyone from the Cockroach team up for sharing what the key innovations were that are driving the improved performance?

awoods187 2766 days ago

I'm the author. We've introduced transactional write pipelining (covered in a forthcoming blog post), load-aware rebalancing, and completed general performance tuning which all contribute to our improved performance numbers.

Confiks 2766 days ago

I was wondering, quite unrelated to the article, if anyone knows if CockroachDB would be suited for small databases (and comparably modest computing/memory resources). I very much like its distributed properties, but only have a simple table of usernames and corresponding cryptographic material. Is CRDB easy to run and manage?

andreimatei1 2766 days ago

We have clients using CRDB in pretty constrained environments, and they use it primarily because of the easy administration. I think you'll find it easier to use than MySQL or Postgres, for example.

tshannon 2766 days ago

I would expect your use case would be better served by just using Postgres. However, if you do need to scale to the point where you'd need to distribute your database and take advantage of CRDB's capabilities, it uses the Postgres protocol, so you can most likely just migrate your data and use the same code.

manigandham 2766 days ago

SQLite would be my first recommendation, unless you need client/server access.

yellowapple 2766 days ago

I think the GP's stated need for replication would preclude SQLite unless one's willing to write one's own replication system.

manigandham 2766 days ago

Where's the stated need for replication?

yellowapple 2766 days ago

"I very much like its distributed properties"

gigatexal 2766 days ago

I’m currently evaluating this as an alternative to Vitess + Percona MySQL. But strict serializability has its limitations.

eloff 2766 days ago

What do you mean by "strict serializability has its limitations"? Do you need something stricter, or do you need something weaker for some reason?

gigatexal 2765 days ago

In the same way single threaded has limitations.

segmondy 2766 days ago

Why as an alternative to Vitess? Are you hitting limits with Vitess?

gigatexal 2765 days ago

Because I believe the overhead isn't worth it. I’d rather a database be cloud native, with sharding, scaling, and the coordination of both built in. I grok the CockroachDB way of doing things much more than the Vitess way. And, shocking fact: most companies aren’t Google in size and don’t really need a Vitess or CockroachDB, but get swept up in the Cloud, Cloud, Cloud! craze. Having something built in, versus the seemingly brittle nature of vtablets, vtgates, etc., is valuable to me.

qaq 2766 days ago

Now we just need some benchmarks on a reasonably sized dataset, like 100 TB and up.

nawfalhasan 2765 days ago

Not an easy benchmark..

qaq 2765 days ago

One would imagine they are testing with much larger datasets internally.

nawfalhasan 2766 days ago

I'm totally new to CockroachDB, so I have 2 questions:

1. Is there a managed service of this DB where it auto-scales, does geo-replication, etc. all by itself?

2. Is there any really good book on CockroachDB?

orangechairs 2766 days ago

We released a managed version at the end of October, with auto-scaling, geo-replication, etc. --> https://www.cockroachlabs.com/product/managed/

Not sure that there are any books on it yet.

manigandham 2766 days ago

The managed service doesn't autoscale, it's provisioned capacity by cores. We just did a call about it.

orangechairs 2766 days ago

Our managed service is currently provisioned by cores. We automatically add nodes to your cluster based on your usage. You can also request to add more nodes if you anticipate spikes.

nawfalhasan 2765 days ago

Thank you..

qaq 2766 days ago

At what point is it cost effective to run CRDB vs PostgreSQL?

anticensor 2766 days ago

Someone should create a fork with a work-safe name. CockroachDB carries the connotation of cockroaches, which are known for eating almost everything and living almost everywhere.

mlevental 2766 days ago

you know what's ironic? that in every thread there's one of you people complaining about the name - you're all just like cockroaches! no matter how successful cockroachdb becomes, no matter how technically impressive the product becomes, the naysayers never die.

can you imagine 20 years ago someone complaining that google wasn't safe for work because it had a silly name?

newsflash dummy: stop saying/thinking/repeating stupid things like this and it'll stop being the case that everyone is so conservative that silly names are inadmissible.

h1d 2766 days ago

That attitude does not fly with your boss.

Why limit the adoption for no good reason by choosing a weird name intentionally?

mlevental 2766 days ago

do you not understand perpetuation? "doesn't fly with my boss" ---> "won't fly when I'm boss". also how about having a conversation on the merits? does that fly with your boss? does with mine.

accnumnplus1 2766 days ago

There's also the irony in their name - anticensor.

andreimatei1 2766 days ago

https://github.com/tbg/bikesheddb

drodgers 2766 days ago

That's the idea though — your data will be hard to kill.

ofrzeta 2766 days ago

In their presentation they have it abbreviated as CRDB, so that might be a viable direction without pain.

geoka9 2766 days ago

> known for eating almost everything and living almost everywhere

Good attributes for a DB though?

danellis 2766 days ago

I have to admit, the name does set me on edge slightly. shudder