“Follow-The-Workload” Beats the Latency-Survivability Tradeoff in CockroachDB | HN Mirror


	“Follow-The-Workload” Beats the Latency-Survivability Tradeoff in CockroachDB (cockroachlabs.com)
	78 points by awoods187 3115 days ago

7 comments

muxator 3115 days ago

I always longed for the existence of such an Ops-friendly DB, which - building on solid distributed systems concepts - could make easy and correct the hard things (consistency, geographical replication, effective possibility of zero downtime).

Now that I have a good candidate, I find myself wondering whether the performance would be good enough to ditch traditional DBs (I know: I should define "workload" before thinking about "performance"...)

I guess it is the curse of tradeoffs: nothing in life is free.

jinqueeny 3115 days ago

Another good candidate is TiDB (https://github.com/pingcap/tidb). It has elastic scalability, ACID compliance, high availability, etc.

At least, TiDB, CRDB, RethinkDB are open source :)

zokier 3114 days ago

> At least, TiDB, CRDB, RethinkDB are open source :)

So is CockroachDB too?

a-robinson 3114 days ago

Yes: https://github.com/cockroachdb/cockroach

The parent poster was referring to CockroachDB as CRDB.

jinqueeny 3114 days ago

Yep, thanks for your clarification!

arrty88 3115 days ago

We had it with RethinkDB already!

manigandham 3115 days ago

Except its performance is lacking, and it is now impractical without a backing support company.

a012 3115 days ago

Does rethinkdb have ACID? No?

elvinyung 3115 days ago

Whether that actually matters is arguable: No one has performant distributed (i.e. cross-partition) ACID. (Except maybe Google.)

For the thing that does matter, which is ACID within the same partition/entity group, I would argue that RethinkDB already does have that.

Mike Stonebraker says that there are no fast distributed transactions [1]. I'm inclined to believe him, and that's not just an appeal to authority.

[1] https://www.youtube.com/watch?v=KRcecxdGxvQ&t=54m22s

ericand 3115 days ago

For reference, this is part 2 of a two part series. Here is part 1: https://www.cockroachlabs.com/blog/tunable-controls-for-late...

ahahuaya 3115 days ago

CockroachDB looks very exciting. I’m waiting for its Postgres compatibility to be far enough along that it works with Elixir/Phoenix. Is that likely to happen any time soon?

legojoey17 3115 days ago

If you are willing to give it a shot, there is currently a fork of postgrex from someone in the Elixir community! It handles some of the rough edges of the incompatibility. You can find it at https://hexdocs.pm/postgrex_cdb/readme.html and the source code at https://github.com/jumpn/postgrex.

For any other issues you run into when using it, you may want to see the discussion about postgrex compatibility at https://github.com/cockroachdb/cockroach/issues/5582. If you run into different problems please do file the issue :).

I'm also an Elixir junkie, so I would also love to see ORM compatibility here! I definitely want to dedicate time to it if I can. (Disclaimer: I'm currently interning at Cockroach Labs.)

cpursley 3115 days ago

I'd love to see a port of https://github.com/begriffs/postgrest to Elixir. Cockroachdb + Elixir-based Postgrest implementation would be the ultimate platform for realtime distributed web scale.

sempron64 3115 days ago

Can anyone elaborate as to how CockroachDB compares to Scylla? It seems to me like they're very similar on a surface level.

_uhtu 3115 days ago

Sure!

CockroachDB is a fully SQL-compatible, strongly consistent key-value store. This means you can get the scalability and performance of a key-value store alongside the comfortable SQL query language. You, and all the other developers who work with you who already know SQL, can do pretty much all the things you know and love from SQL, like joins, secondary indexes, etc., with full ACID compliance. This means that when you read data from the database you always know with 100% certainty you're going to get up-to-date values (no stale reads).

ScyllaDB is similar in that data is stored in tables with a defined schema, but it uses a different query language, CQL[1], which is often similar to SQL. You can't do joins, but you can have secondary indexes. You can store most of the same data types that you know and love from a standard SQL store. Interestingly enough, you get to CHOOSE the level of consistency you get, so you can make your ScyllaDB strongly consistent or choose from an array of eventually consistent options[2]. Most people, however, go with one of the eventually consistent options, which allow Scylla to be insanely performant and scalable. At the cost of strong consistency, you get extremely high performance at an almost infinite scale. CockroachDB, while performant and scalable, can't match it here. Scylla stands almost in a tier of its own in terms of scalability and performance.

So really, the choice is yours based on what you're looking for. I'd choose CockroachDB for my purposes since I'm not storing Apple levels of data and consistency is important to my work, but your specifications and needs may be different.

[1] http://docs.scylladb.com/getting-started/ddl/ [2] http://docs.scylladb.com/architecture/architecture-fault-tol...
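
To make the "choose your consistency" point concrete, here is a minimal sketch of the quorum-overlap rule that Dynamo-style stores rely on: with N replicas, a read that contacts R replicas is guaranteed to see the latest acknowledged write that reached W replicas only when R + W > N. ONE, QUORUM and ALL are CQL consistency level names; the helper functions are hypothetical names for illustration, and the sketch ignores real-world details such as read repair, hinted handoff and timestamp-based conflict resolution.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every set of r replicas shares at least one member
    with every set of w replicas, i.e. when r + w > n."""
    return r + w > n


N = 3  # replication factor (an assumption for this example)
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

for write_name, w in levels.items():
    for read_name, r in levels.items():
        kind = "overlapping" if quorums_overlap(N, r, w) else "eventual"
        print(f"write={write_name:6} read={read_name:6} -> {kind}")
```

Writing and reading at ONE is the cheap, eventually consistent combination; QUORUM for both is the usual way to get overlapping reads and writes at the price of waiting for a majority.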

manigandham 3115 days ago

Ultimately they share the same foundation of a globally distributed key/value store. Scylla is a Cassandra rewrite, and Cassandra is a mashup of BigTable and Dynamo: an advanced wide-column key/value design allowing for very high scalability and availability, but the data model and querying abilities are not as flexible. Scylla does have variable per-query consistency settings but is primarily eventually consistent. It supports BATCH statements, which are atomic updates, and now has counters, but it has no lightweight transactions yet for read-before-write. It's not quite at feature parity with Cassandra but is quickly getting there.

CockroachDB also uses a key/value store but puts a postgres-compatible SQL layer on top, derived from the Google Spanner approach, so you can get (almost) all the querying abilities and data modeling of a relational database. They're slated to have JSON datatypes soon that will make it very compelling as a general purpose, highly reliable datastore for all of your core data in multiple regions.

zackelan 3115 days ago

If you want to read about the theoretical underpinnings, Cassandra is derived from the original Dynamo paper[0] from Amazon, and Scylla is a drop-in replacement for Cassandra written in C++ instead of Java. Cockroach follows more closely the Google Spanner[1] approach.

For a more practical summary, compare the architecture overviews of Cassandra[2] and Cockroach[3].

0: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp...

1: https://static.googleusercontent.com/media/research.google.c...

2: https://docs.datastax.com/en/cassandra/3.0/cassandra/archite...

3: https://www.cockroachlabs.com/docs/stable/architecture/overv...

openasocket 3115 days ago

They're pretty different. Scylla is essentially a C++ rewrite of Cassandra, while CockroachDB aims to be a true distributed SQL database. Cassandra and Scylla do not support transactions (or atomicity or strong consistency) or joins.

adrianratnapala 3115 days ago

So is this for helping with read latency, but not write latency?

I know little about CockroachDB, but I vaguely remember that it has consistent replication. If so, then writes would require some round trips before consensus is reached, killing any latency benefits from having the master follow the load (unless you move all your replicas around!).

a-robinson 3115 days ago

It helps with both, but is more dramatic for reads.

Reads go from 1 RTT to 0 RTTs, while writes go from 2 RTTs to 1 RTT. For example, consider a cluster in which the nodes are 200ms RTT away from each other.

* A read will take 200ms if the leaseholder isn't in the local datacenter vs single-digit milliseconds if it is local.

* A write will take 400ms if the leaseholder isn't in the local datacenter vs 200 milliseconds if it is local.

The write takes 400ms because it takes 100ms to get from the local datacenter to the leaseholder, 200ms to reach consensus (the leaseholder needs to send a request to and get a response from one of the followers), and another 100ms for the response from the leaseholder to get back to the local datacenter where the request originated.
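
The arithmetic above can be written out as a small back-of-the-envelope model. This is only a sketch of the numbers in this comment, not of CockroachDB's actual code: the function names are hypothetical, every pair of datacenters is assumed to be the same RTT apart, and local processing time is treated as zero.

```python
def read_latency_ms(rtt_ms: float, leaseholder_local: bool) -> float:
    """A read is served by the leaseholder alone: 0 RTTs if it is local,
    1 RTT if the request has to travel to a remote leaseholder and back."""
    return 0.0 if leaseholder_local else rtt_ms


def write_latency_ms(rtt_ms: float, leaseholder_local: bool) -> float:
    """A write always pays 1 RTT for consensus (leaseholder to a follower
    and back), plus 1 more RTT if the leaseholder itself is remote."""
    forwarding = 0.0 if leaseholder_local else rtt_ms
    consensus = rtt_ms
    return forwarding + consensus


RTT = 200.0  # ms between datacenters, as in the example above
for local in (False, True):
    where = "local" if local else "remote"
    print(f"leaseholder {where}: "
          f"read {read_latency_ms(RTT, local):.0f} ms, "
          f"write {write_latency_ms(RTT, local):.0f} ms")
```

In practice the local read is not literally 0 ms; it is the single-digit milliseconds of local processing mentioned above.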

humanfromearth 3115 days ago

Really looking forward to seeing this in action. Is follow-the-workload already present in 1.1.3?

Is there a more detailed technical doc on how this is actually achieved, ideally with some examples? When is it decided that the leaseholder moves from one node to another? Is it based on some sort of stats?

Edit: found the follow-the-workload doc here: https://www.cockroachlabs.com/docs/stable/demo-follow-the-wo...

a-robinson 3115 days ago

It's actually even in 1.0.0! We just haven't written about it previously.

The doc you found is the best resource. If you want even more detail, you can check out the original RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RF...

jinqueeny 3115 days ago

Great post! Do you have a benchmark showing how CRDB compares with TiDB in terms of performance and latency? And how does CRDB compare with Amazon Aurora with PostgreSQL compatibility?

toolslive 3114 days ago

Isn't "Follow-the-workload" something that is naturally achieved when Mencius is used instead of Raft?

a-robinson 3114 days ago

WAN-optimized consensus protocols like Mencius and Egalitarian Paxos indeed are better for writes because any node can commit a write in one round trip, but they don't allow for reads with zero round trips. Every read must go through the consensus machinery in order to ensure consistency, whereas in CockroachDB the leaseholder can serve reads without needing to consult other nodes.

toolslive 3113 days ago

Why do reads have to go through the consensus machinery? If they don't go through the machinery, read results will still be consistent; they just might not include the very last updates, but you cannot avoid that. Raft has the same problem (they all have it, and it's unavoidable): suppose you read from the Leader (Master, whatever you call it); the leader sends you a perfectly fresh, consistent value; however, your network takes time to deliver it to you, and meanwhile an update might have happened. It just shows that the key-value store should provide atomic multi-updates (or at least compare-and-swap).
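
For readers unfamiliar with the compare-and-swap idea mentioned above, here is a minimal single-process sketch. The `KVStore` class and its methods are hypothetical and in-memory only; they are not the API of any database discussed in this thread. The point is that a write based on a value that has since changed is rejected instead of silently overwriting the newer update.

```python
import threading
from typing import Dict, Optional


class KVStore:
    """Toy in-memory key-value store with compare-and-swap."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key: str, expected: Optional[str], new: str) -> bool:
        """Set key to new only if its current value equals expected.
        Returns True if the swap happened, False otherwise."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


store = KVStore()
store.compare_and_swap("balance", None, "100")       # initial write succeeds
seen = store.get("balance")                          # a client reads "100"
store.compare_and_swap("balance", "100", "90")       # another client updates first
accepted = store.compare_and_swap("balance", seen, "80")  # write based on the old read
print(accepted, store.get("balance"))
```

The last call is rejected because the value the client read is no longer current, so the client has to re-read and retry.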

cube2222 3112 days ago

Without the consensus read you can get inconsistent values.

Suppose you write value a to leader A and then value b to leader A. Node B got both updates, node C got only the a value because of network timeouts.

Now you do a read from B of both values, you get the current a and b. Then you do a read from C, you receive the current a and the old b. Inconsistent.

Basically a violation of serializability and linearizability across the cluster with respect to the client.

toolslive 3109 days ago

No. In your example, value 'a' will not have been applied to the store in C yet, as there is a hole in the log: a set of updates that contains update 'b' (and possibly others). So the view in A will be more up to date than the view in C, but both will be consistent.
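
The "hole in the log" argument can be sketched as follows. This is a toy model with hypothetical names, not Raft itself: in Raft, the AppendEntries consistency check rejects entries that do not follow the follower's existing log, whereas this sketch buffers them, which has the same effect on what gets applied. Either way, a follower applies entries strictly in index order, so its store always reflects a prefix of the leader's log: possibly behind, but never with a gap.

```python
from typing import Dict, Tuple


class Follower:
    """Toy replica that applies log entries only in contiguous index order."""

    def __init__(self) -> None:
        self.received: Dict[int, Tuple[str, str]] = {}  # index -> (key, value)
        self.applied_index = 0                          # last index applied to store
        self.store: Dict[str, str] = {}

    def receive(self, index: int, key: str, value: str) -> None:
        self.received[index] = (key, value)
        self._apply_contiguous()

    def _apply_contiguous(self) -> None:
        # Stop at the first missing index: entries after a hole stay unapplied.
        while self.applied_index + 1 in self.received:
            self.applied_index += 1
            key, value = self.received[self.applied_index]
            self.store[key] = value


# Leader log (an assumption for the example): 1: x=a1, 2: y=b1, 3: x=a2
c = Follower()
c.receive(1, "x", "a1")
c.receive(3, "x", "a2")   # entry 2 was lost, so entry 3 must wait
print(c.applied_index, c.store)
c.receive(2, "y", "b1")   # the hole is filled; entries 2 and 3 are applied in order
print(c.applied_index, c.store)
```

A read from this follower while entry 2 is missing returns the state as of entry 1, which is stale but is a state the leader actually passed through. Whether stale-but-prefix reads are acceptable is exactly what the comments above disagree on.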