by Vishnevskiy 3444 days ago
Let me take a stab at that.

Riak is not a good model since it's more of a blob store, and we wanted to simply range scan through messages rather than shard blobs (Cassandra is REALLY good at this).

HBase would have been fine for this model, but the open source version of HBase has much lower adoption than Cassandra, so that was a big factor. We also don't care about consistency, and HBase is a CP database; we prefer AP for this use case. As far as using GCP's BigTable (HBase compat), we made this decision before we moved to GCP, but we are also not fans of using platform lock-in. While BigTable has the same API as HBase, we would hate to fall back to a less widely adopted version, where we would have a hard time getting community support, if we decided to leave GCP.

Hope that helps.

4 comments

> As far as using GCP's BigTable (HBase compat), we made this decision before we moved to GCP, but we are also not fans of using platform lock-in.

Did you consider GCP Datastore as well?

It has strong consistency for a single "entity group", but eventual consistency for queries on multiple entity groups.

So by storing data only relevant to a single user in an entity group, you can have strongly consistent, atomic transactions on that group (albeit limited to 1 tx/s), and at the same time do global queries on all user data with eventual consistency.

The pricing model does not fit our needs, and that is even more locked in than the BigTable variant.
I'm happy to hear you dropped it for non-technical reasons. I'm asking because I've chosen Datastore for an app: I care less about vendor lock-in than ease of operation, and it fits my pricing model perfectly, because the app in question receives (Bitcoin) payments that are charged a fee on a per-request/payment basis.

Hint: if you have technical reasons for avoiding GCP Datastore, I'd be very interested in hearing about them.

Google Cloud is the least geo-distributed provider around, which is a major problem if your use case has requirements around (a) latency and (b) data locality due to legal requirements.

In 2017 they will finally have datacenters in Sydney, London, Singapore, Frankfurt etc.

This is one area where Azure is leading with both Azure SQL and DocumentDB supporting geo-replication.
Nope, since Azure is extremely expensive and you also need several accounts for different regions, i.e. you can't create servers in Germany without a whole new account / credits / support.
> Riak is not a good model since it's more of a blob store, and we wanted to simply range scan through messages rather than shard blobs (Cassandra is REALLY good at this).

Can you tell us a little more, please? In our system, range scans are done using secondary indexes (an index on timestamp). I'm not sure I understood the part about blobs, or what is specific to Cassandra here. A reply would be highly appreciated.

Cassandra uses consistent hashing. A segment of data that is addressed by a key is called a partition, and it is found by the partition key. A partition can contain just 1 "row" if you only use a single column as the key, or you can create a compound key with one part dedicated to finding the partition and the rest to finding several rows within that partition.

If you use a compound key (multiple rows), these rows are all stored in the same partition (which lives entirely on the single node that owns or replicates that partition in the consistent hash ring), so scanning those rows is very fast and efficient.
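
To make that concrete, here is a toy Python sketch of the idea. It is not Cassandra's actual implementation: the node names, the MD5-based token function, the single-token-per-node ring, and the channel/message naming are all assumptions made up for illustration, and replication is ignored. It only shows how one part of the key picks the node while the rest keeps rows sorted inside the partition, so a range scan touches one node and one sorted run.

```python
import bisect
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def token(value: str) -> int:
    """Map a string onto the ring (toy hash, not Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


# The ring: one token per node, sorted by token.
RING = sorted((token(name), name) for name in NODES)
RING_TOKENS = [t for t, _ in RING]


def owner(partition_key: str) -> str:
    """Return the first node clockwise from the key's token (wrapping around)."""
    i = bisect.bisect_right(RING_TOKENS, token(partition_key)) % len(RING)
    return RING[i][1]


class Node:
    def __init__(self):
        # partition key -> rows kept sorted by clustering key
        self.partitions = defaultdict(list)

    def insert(self, pk, ck, value):
        bisect.insort(self.partitions[pk], (ck, value))

    def range_scan(self, pk, start, end):
        """Rows with start <= clustering key < end, from one sorted list."""
        rows = self.partitions.get(pk, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


cluster = {name: Node() for name in NODES}


def insert_message(channel_id: str, message_id: int, text: str) -> None:
    # Only the partition key (channel_id) decides which node stores the row.
    cluster[owner(channel_id)].insert(channel_id, message_id, text)


def scan_messages(channel_id: str, start: int, end: int):
    # One node, one partition, one contiguous slice.
    return cluster[owner(channel_id)].range_scan(channel_id, start, end)


for i in range(1, 11):
    insert_message("channel-1", i, f"msg {i}")
    insert_message("channel-2", i, f"other {i}")

scan = scan_messages("channel-1", 3, 6)
```

Here `channel_id` plays the role of the partition key and `message_id` the rest of the compound key, so `scan_messages("channel-1", 3, 6)` never has to consult more than one node.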

Did you consider ScyllaDB (http://www.scylladb.com), a Cassandra-compatible DB written in C++ by the guys behind KVM?
> the open source version of HBase has much lower adoption than Cassandra so that was a big factor

Is this due to the availability of experienced developers or another factor?

PostgreSQL has a lower adoption rate than MySQL, but we chose it due to its suitability to the tasks at hand. As long as the adoption rate is not low enough to give concern about the longevity of a tool, I'm less concerned about it than other factors.

Well, relatively speaking Postgres might have lower adoption than MySQL, although I am not too sure about it. However, if you look at the absolute numbers, Postgres has huge adoption, even if it is smaller than MySQL's. So it doesn't really matter, as chances are you will be able to find an experienced developer. Can't say the same about HBase, etc., since there are significantly fewer projects requiring it compared to MySQL/Postgres.