Hacker News
by vegabook 3422 days ago
Honestly what needs to happen next is a serious effort to explain why or when rethink is better than mongo, cassandra, arango, aerospike, memsql, mysql, riak, or postgres, ++, not to mention all the TSDBs. On the event pushes I am unconvinced that message queues/computation graphs aren't superior, and that's another crowded space. When I last looked at it the advantages struck me as mostly incremental on the query language and decremental on performance. There are many excellent competitors in this space, most of which are well funded, and moving targets. Rethink doesn't seem to have a USP, or none that has been effectively communicated at least, IMO.

TBH, given the option... RethinkDB is probably the best case for anything that needs distribution/HA and automatic failover. SQL is a decent option, but you either pay a lot for HA, or you need to have a lot of domain knowledge or hire dedicated DBA support. Not that RethinkDB doesn't need some knowledge, but its admin interface is great.

The replication model is similar to Cassandra (ring + redundancy), while the master/slave model and failover have had a lot of work to make them bulletproof.

It will scale well from 3-15 nodes; beyond that, growth drops off to less than linear. But if you need more than that, then you're in a whole other league.

If you want search only, go for ElasticSearch. If you need much greater linear growth at the cost of application complexity, Cassandra. If you need fast memory access, then go for Redis. If you don't need bullet-proof automagic failover, or are willing to pay through the nose for it, go for SQL. If you are okay with a single system, go SQL. Otherwise, RethinkDB should probably be the first choice.
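Read as a decision procedure, the rules of thumb above could be sketched roughly like this. The field names are hypothetical labels for the criteria, the first-match-wins ordering is an assumption (it just follows the order the options were listed in), and the whole thing is one person's heuristic rather than any kind of benchmark:

```python
from dataclasses import dataclass


@dataclass
class Needs:
    # Hypothetical flags; each one maps to a criterion from the paragraph above.
    search_only: bool = False
    beyond_cluster_scale: bool = False      # much greater linear growth, at the cost of app complexity
    fast_memory_access: bool = False
    needs_automatic_failover: bool = True   # "bullet-proof automagic failover"
    will_pay_for_sql_ha: bool = False       # willing to "pay through the nose" for HA
    single_system_ok: bool = False


def pick_store(n: Needs) -> str:
    """Return the suggested store; the first matching rule wins (assumed ordering)."""
    if n.search_only:
        return "ElasticSearch"
    if n.beyond_cluster_scale:
        return "Cassandra"
    if n.fast_memory_access:
        return "Redis"
    if not n.needs_automatic_failover or n.will_pay_for_sql_ha:
        return "SQL"
    if n.single_system_ok:
        return "SQL"
    return "RethinkDB"
```

For example, `pick_store(Needs())` (HA required, nothing else special) falls through to the last rule, while `pick_store(Needs(single_system_ok=True))` lands on SQL.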

Don't get me wrong, I'll reach for SQL first in many cases... but RethinkDB if I have a choice and HA is a requirement. I also happen to prefer a document-centric model/approach.

Aerospike touches most of your points at much higher speed and scale. It is next gen redis, basically, with disk, with auto-sharding scale, with cross node queries. Cassandra is not difficult once you wrap your mind around column storage, and if you need that, no other storage style will do.

The 15-node thing is also a major achilles heel. Who wants to commit to a stack that incurs massive technical debt in the event of massive success? Imagine reengineering your db and your event pushes, at scale...

Aerospike doesn't offer many consistency guarantees. If you run it in a cluster on the cloud you are more than likely to see silent data loss [1].

It's not a fair comparison; RethinkDB is much safer. I'm sure that if you turn down the defaults on both read and write operations in RethinkDB, you could scale it well past 15 nodes, with very high read and write throughput.

1: https://aphyr.com/posts/324-jepsen-aerospike

You can scale past 15 nodes... it's just you'll want to tweak things and/or you won't get linear growth as you add more nodes. That doesn't mean you can't. Also, if you need more than that, Cassandra and other options are there, and you'll likely have to feel that pain regardless.

There are other ways to separate your data depending on use cases. It's just a rough guideline... You'll see similar issues beyond 10-20 servers in a local cluster in many of the NoSQL options.

RethinkDB also has much better consistency guarantees than Aerospike, not to mention being FLOSS under a more permissive license.

RethinkDB pushes well past 15 nodes: teams have demonstrated north of 25-30 nodes with linear scale.
Thank you... iirc, the recommendation was 12-15 nodes at the top end. Though I haven't investigated deeply for a while now, as for the past 2 years I haven't had a choice in what I've been using.
Comes down to jenkins for me. RethinkDB aced the jenkins tests.

Mongo has failed every jenkins test it's been put through, dunno about the status now though. Last I checked Mongo's default durability level was "data loss on power outage". Aerospike failed jenkins too, and not on small edge cases like Mongo, but with major data loss.

Going by the problems GitLab has had recently with Postgres, I wouldn't use that for a distributed database. Likely true for MySQL too.

* Yep. I meant to write Jepsen :)

I think you mean Jepsen? It's a slight exaggeration to say that RethinkDB aced it -- but they did very well[1] and (more importantly to me, honestly) Jepsen was used to find a subtle and nasty issue that was subsequently fixed.[2]

[1] https://aphyr.com/posts/329-jepsen-rethinkdb-2-1-5

[2] https://aphyr.com/posts/330-jepsen-rethinkdb-2-2-3-reconfigu...

I've been having a lot of trouble with Jenkins at work today. So when I thought Jepsen, I wrote Jenkins.

Still feel they aced it. No one passes Jepsen on their first try. But RethinkDB is the first to immediately fix the issue.

I wouldn't go as far as to call it a nasty issue. It would only happen if you got node failures while reconfiguring your cluster. And reconfiguring the cluster must be initiated by the admin, and it's not something that happens very often.

Sorry, clarification: by "nasty" I just meant "subtle", not "debilitating" -- and agreed that they did really well on Jepsen.
Psst: find and replace Jenkins with Jepsen (call-me-maybe) ;)
Jepsen