by davismwfl 4595 days ago
TL;DR: Couchbase is cool but comes with some considerations, so evaluate it carefully; Cassandra is better spoken to by someone with experience there; and PostgreSQL is still great if you have a relational dataset. Start with what you know if it works and go from there.

There is a huge number of factors that go into it, but I'll give you some opinion. :)

If you know PostgreSQL and how to work with it today (and the others are new to you), stay with it for now until you know it isn't the right tool. Trying to learn a new methodology and tool while also starting a company and trying to gain traction isn't always the best way to go. Also, think about when/if you need to hire: how long will it take to find someone with experience in tool X, or to train someone on it?

As for Couchbase vs Cassandra vs PostgreSQL: all have their pros and cons, and it will boil down to your use cases, dataset and complete tech stack (e.g. some SDKs are less mature than others).

I have been a huge Couchbase fan and user for a few years now, going back to Membase. However, I'll be honest: while our current primary datastore is Couchbase, we are moving away from it because of the amount of time we spend solving issues that just shouldn't exist. To get this out of the way, I love CB's scale-out ability and performance. It is stupid simple overall and works very well -- Mongo could learn a few things from Couchbase about making the scale-out process easier (and I think they are). We also use Couchbase to Elasticsearch, and it works pretty damn well, but again it is still maturing. In our recent evaluations we found we can replace ES for 60-70% of the cases where we have to use it simply by moving off Couchbase. That means I can cut my ES resources down to the 30-40% of use cases where it is needed and save some cash, while still getting the same results and performance.

There are a number of things to consider when using CB as your datastore, and while we are moving away from it, I think it is worth a solid look. However, if you store a lot of small documents that you want keyed for near-instant access, Couchbase can require far more machine resources than it really should (i.e. it gets expensive fast). This is because every key plus its metadata (56 bytes for 2.2, I believe) must be stored in your bucket RAM, and once the keys and metadata exceed 50-60% of the available RAM, you're in trouble in a few ways. So if you define the bucket to be 2 GB, all keys plus metadata must fit within roughly 50% of that (1 GB). Of course, you can keep scaling up/out to increase that size, but like I said, costs start to become a factor here.

A fair rebuttal is to restructure the data into larger values and a smaller number of keys. However, now you run into a second issue: while views are awesome, we have seen they have quite a way to go before they are truly a final solution, and they have diminishing returns if you have too many of them. So the typical answer is that you start merging views, returning larger data sets, and doing more and more work on the Couchbase client side (API etc.) to filter results. Not saying that is always bad, just something to consider.

Couchbase also limits you to no more than 10 buckets per cluster (and in my experience, with more than 5 your CPU utilization goes up noticeably, so you generally need more CPU). That means if you need document segmentation that is more than just a "type" field on a document, this can quickly become an issue.

Lastly, all of our APIs are in node.js, and frankly CB's node library has a way to go before it is really ready for high transaction volumes. We have found that it leaks memory under sustained high transaction volumes (this is with node 0.10.22), so we have reverted to writing a lot of larger tasks directly in C to get around it; while I actually enjoy doing that, it is time-consuming and not an efficient use of our bootstrapped resources.

I read a lot of what the CB team is doing and I think they are working hard to fix almost every one of my points, so just weigh your entire stack first. And please don't consider this a bash against CB. It is anything but, as I think their technology is pretty damn cool; it just has to fit your use case properly, like any technology.
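
To put rough numbers on the key-plus-metadata point above, here is a back-of-the-envelope sketch in Python. The 56 bytes of metadata and the roughly 50% ceiling are the figures quoted in this comment, not official sizing guidance, and the 44-byte average key length is purely an assumption for illustration.

```python
import math

# Figures quoted in the comment above (not official sizing guidance).
METADATA_BYTES = 56       # per-key metadata cited for Couchbase 2.2 ("I believe")
RESIDENT_FRACTION = 0.5   # rough share of bucket RAM that keys + metadata may use


def max_keys(bucket_ram_bytes: int, key_bytes: int,
             metadata_bytes: int = METADATA_BYTES,
             fraction: float = RESIDENT_FRACTION) -> int:
    """Estimate how many keys fit before keys + metadata pass the RAM share."""
    budget = int(bucket_ram_bytes * fraction)
    return budget // (key_bytes + metadata_bytes)


def ram_needed(num_keys: int, key_bytes: int,
               metadata_bytes: int = METADATA_BYTES,
               fraction: float = RESIDENT_FRACTION) -> int:
    """Estimate the bucket RAM needed to keep keys + metadata within the share."""
    return math.ceil(num_keys * (key_bytes + metadata_bytes) / fraction)


if __name__ == "__main__":
    two_gb = 2 * 1024 ** 3
    # key_bytes=44 is an assumed average key length, chosen only for illustration.
    print(max_keys(two_gb, key_bytes=44))
    print(ram_needed(100_000_000, key_bytes=44))
```

Under those assumptions a 2 GB bucket tops out at about 10.7 million keys regardless of how small the documents are, which is why lots of tiny documents get expensive.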

As for Cassandra, I am nowhere near an expert or even a good novice here, so someone else can give you the good/bad there. I do know from reading that it has grown in favor quite a bit and that its redundancy and reliability are quite good. We recently evaluated it and felt it would be a good solution in general; however, we had a hard time fitting our use case into it. I fully admit that may be our own limitations more than Cassandra's.

PostgreSQL is great, especially if you need highly relational data. In general, I would still favor an RDBMS if your dataset is highly relational, so this depends more on what your data looks like and how it gets used. Performance is good when designed right, but it is hard to match the performance of Couchbase, although everything has a trade-off. If I needed the performance in places but my data was highly relational, I might look at using Couchbase in front of the RDBMS as a persistent cache; this makes recovery easier on the DB when there is a fault.
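
The "Couchbase in front of the RDBMS" idea is essentially a cache-aside layer. A minimal Python sketch of the pattern follows; it uses a plain dict as a stand-in for the Couchbase bucket and hypothetical `load`/`store` callables as stand-ins for the PostgreSQL queries, so none of the names here are real SDK calls.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]


class CacheAside:
    """Cache-aside wrapper: reads hit the cache first, writes go to the DB first.

    The dict stands in for a Couchbase bucket; `load` and `store` are
    hypothetical callables standing in for the relational database.
    """

    def __init__(self,
                 load: Callable[[str], Optional[Row]],
                 store: Callable[[str, Row], None]) -> None:
        self._cache: Dict[str, Row] = {}
        self._load = load
        self._store = store

    def get(self, key: str) -> Optional[Row]:
        # Serve from the cache when possible so the database sees fewer reads.
        if key in self._cache:
            return self._cache[key]
        row = self._load(key)
        if row is not None:
            self._cache[key] = row
        return row

    def put(self, key: str, row: Row) -> None:
        # The database stays the system of record; the cache is updated after.
        self._store(key, row)
        self._cache[key] = row


if __name__ == "__main__":
    db: Dict[str, Row] = {"user:1": {"name": "alice"}}
    users = CacheAside(load=db.get, store=db.__setitem__)
    print(users.get("user:1"))  # first read falls through to the "database"
    print(users.get("user:1"))  # second read is served from the cache
```

Because the cache keeps answering reads for keys it already holds, the database sees less load while it recovers from a fault; expiry and invalidation are left out of this sketch.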

In the end it's still all about your use case, dataset, tech stack and what you need it to do.


Thanks for the great explanation of Couchbase's disadvantages :). I have read a lot about Couchbase and how good it is, but real production use is never ideal, so thanks.
Anytime. Good luck with everything.