Hacker News
by laichzeit0 3426 days ago
Do you think MongoDB is a good choice (given how easy it is to use) when you only care that 99.999% of the data you insert ends up in the database? That's my use case: best-effort integrity. I mostly just want a DB that can insert and query documents fast, and I'm not really concerned if I lose a few documents here and there.

Why wouldn't you just use anything else that can manage to insert/read data without losing it?

I don't really understand the angle of "can I get away with it anyways, tho?"

Some of us are already using MongoDB and are not so keen on replacing it.
If you read back, the discussion was scoped to "new projects". As jospehg put it:

> I'd seriously question the judgement of any senior engineer who picks it for a new project over rethinkdb or Postgres.

It's about making tradeoffs. If MongoDB works for you (I actually enjoy using it tremendously), then I have to ask myself whether I'm OK with its imperfect integrity. For my use cases this isn't a problem. I'm not working with customer data or anything where losing a few records would make any difference at all.
In my experience, Mongo lets you check the end result and try inserting again.
How do you expect to check the end result? The article's Jepsen analysis shows that both the v0 and v1 replication protocols (except the very latest version of v1, which appears to be a response to this) can lose acknowledged writes. That is, for a write sent with majority write concern, the DB tells you the write was successful, acknowledged by a majority! Subsequently (and, if I understand the article, possibly not immediately), the write can be lost.
It depends.

Given a small cluster of reliable nodes on a reliable network, these errors will occur extremely rarely. So rarely, in fact, that they'll be written off as "user error" by support.

If you're a startup building a system that has to scale quickly and reliably from 3 to 3000 nodes in a year, then the whole thing is likely to explode in your face, Twitter-style.

Now, if MongoDB were so superior that it was truly the platform enabling that kind of scaling, the decision would be simple: just go for it.

The thing is, this isn't how the world works. When systems are built, very few people consider (or are capable of considering) how the system will grow. Frameworks and databases are, as a rule, chosen arbitrarily. When scaling happens, the question is more "how can we scale what we have while keeping everything more or less working" than "how do we design a system that works correctly at scale".

Mongo's whole strategy is built around this: make Mongo the default choice for the current generation of developers.

Fantastic market strategy.

Fantastic market strategy, but it's still snake oil they're selling.

When you talk about growing, the biggest value of open source has been that you can start with something free but shit and then, as you make money, spend it on customizing that open source software in a way that benefits you.

However, there exist commercial offerings that are (and were) faster and better at MongoDB's job than MongoDB was. KDB could've handled Twitter, and we never would've seen a fail whale. It's also a whole hell of a lot cheaper than the developers, the customizers, the headaches, and the fact that you're producing open source code that ultimately benefits your competition.

Another way to think about it is in terms of experts: if you've got a great startup idea, why would you make your odds 10% worse by using the cheapest hacky thing around, one with a 10% chance of losing your data? Ask experts with data, be honest about your budget, and you'll do a lot better.

I have some actual experience with KDB and MongoDB so I'm going to have to call bullshit.

How does KDB handle replication and failover? Or even high insert/update rates to datasets that exceed the size of memory? How do you shard KDB?

KDB doesn't support unicode text. Do you plan to only have English-speaking users?

Yes, KDB excels at its relatively well defined niche of transforming and aggregating "smallish" (say 10 TB or less) numerical time series data. It would be a horrible choice for the backing store of a high throughput CRUD application...

What is it with KDB zealots thinking that KDB is the best database for every task? I swear, KDB is the Scientology of databases.

> How does KDB handle replication and failover?

With -r and (in my case) SO_REUSEPORT. Most people use a dedicated gateway (I've seen custom Tomcat setups and HAProxy).
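A minimal sketch of the gateway idea, written in Python rather than q: the gateway tries the primary first and falls back to a replica when the connection fails. The `Gateway` class and the callable backends are hypothetical stand-ins for kdb+ IPC handles; this only illustrates the failover routing, not how `-r` replication itself works.

```python
from typing import Callable, List


class Gateway:
    """Hypothetical query gateway: try the primary first, fall back to replicas.

    Each backend is modelled as a plain callable that takes a query string; a real
    kdb+ gateway would hold IPC connection handles instead.
    """

    def __init__(self, backends: List[Callable[[str], object]]):
        # backends[0] is the primary; the rest are replicas (e.g. processes started with -r)
        self.backends = list(backends)

    def query(self, q: str) -> object:
        errors = []
        for backend in self.backends:
            try:
                return backend(q)
            except ConnectionError as exc:
                # This backend is unreachable: record the failure and try the next one
                errors.append(exc)
        raise ConnectionError(f"all {len(self.backends)} backends failed: {errors}")


# Usage: the primary is down, so the gateway answers from the replica
def primary(q: str) -> object:
    raise ConnectionError("primary down")


def replica(q: str) -> object:
    return f"replica answered: {q}"


gw = Gateway([primary, replica])
answer = gw.query("select from trade")
```

Keeping failover in a separate gateway means clients only ever know one address; the trade-off is that the gateway itself becomes something you must run redundantly.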

Meanwhile, MongoDB doesn't actually replicate reliably (it acks writes and then loses them anyway), and in the naïve configuration failover can cascade into crashes.

> Or even high insert/update rates to datasets that exceed the size of memory?

This is literally the KDB tickerplant model: have a real-time database (RDB) that regularly (daily) flushes out to a historical database (HDB).
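A toy Python sketch of the RDB to HDB pattern, assuming CSV files as a stand-in for kdb+'s on-disk tables: today's rows live in memory, and at end of day they are written to one directory per date and memory is cleared. The `RealtimeDB` class, the `trade.csv` name and the directory layout are hypothetical; real kdb+ HDBs use splayed, date-partitioned tables.

```python
import csv
import datetime as dt
import tempfile
from pathlib import Path


class RealtimeDB:
    """Toy RDB: holds today's rows in memory and flushes them to a date partition."""

    def __init__(self, hdb_root: Path):
        self.hdb_root = Path(hdb_root)
        self.rows = []  # today's data, kept in memory only

    def insert(self, row: dict) -> None:
        self.rows.append(row)

    def end_of_day(self, date: dt.date) -> Path:
        # One directory per date, e.g. <hdb_root>/2017-01-03/trade.csv (hypothetical layout)
        partition = self.hdb_root / date.isoformat()
        partition.mkdir(parents=True, exist_ok=True)
        path = partition / "trade.csv"
        fieldnames = sorted({key for row in self.rows for key in row})
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.rows)
        self.rows = []  # memory is freed; history now lives on disk
        return path


# Usage
with tempfile.TemporaryDirectory() as root:
    rdb = RealtimeDB(Path(root))
    rdb.insert({"sym": "AAPL", "price": 101.5})
    rdb.insert({"sym": "MSFT", "price": 62.1})
    flushed = rdb.end_of_day(dt.date(2017, 1, 3)).read_text().splitlines()
```

The point of the split is that memory only ever has to hold one day's inserts, so the total dataset can grow far past RAM while recent data stays fast to query.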

You can also just write to a log `:log upsert ...
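The log approach can be sketched in Python as a write-ahead log: each update is appended (and fsynced) before it counts as accepted, and state is rebuilt after a restart by replaying the file in order. JSON lines and the `id` key are assumptions for illustration; this is not the kdb+ log format.

```python
import json
import os
import tempfile
from pathlib import Path


def append(log_path: Path, record: dict) -> None:
    # Write the update to the log (and fsync) before treating it as accepted
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay(log_path: Path) -> dict:
    # Rebuild state by re-applying every logged update in order; records are keyed
    # by a hypothetical "id" field, so later entries overwrite earlier ones (upsert)
    table = {}
    if not Path(log_path).exists():
        return table
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            table[record["id"]] = record
    return table


# Usage: after a restart, replaying the log recovers the latest state
with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "updates.log"
    append(log, {"id": 1, "status": "new"})
    append(log, {"id": 2, "status": "new"})
    append(log, {"id": 1, "status": "called"})
    state = replay(log)
```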

> How do you shard KDB?

Same way you shard anything else? By picking a key and directing the query to the appropriate server. h[(first md5 k) mod count h] "query..."
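The routing expression above translates fairly directly to Python: take the first byte of the key's MD5 digest and use it modulo the number of handles to pick a server. The `pick_shard` name and the server addresses are hypothetical.

```python
import hashlib
from typing import Sequence, TypeVar

H = TypeVar("H")


def pick_shard(key: str, handles: Sequence[H]) -> H:
    """Route a key to one of `handles`, roughly mirroring h[(first md5 k) mod count h]."""
    first_byte = hashlib.md5(key.encode("utf-8")).digest()[0]  # an int in 0..255
    return handles[first_byte % len(handles)]


# Usage with hypothetical server addresses
handles = ["shard-0:5001", "shard-1:5001", "shard-2:5001"]
target = pick_shard("abc", handles)
```

Using only the first digest byte gives 256 buckets, so a shard count that doesn't divide 256 is slightly uneven, and changing the number of servers remaps most keys. Consistent hashing is the usual alternative if resharding matters.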

> KDB doesn't support unicode text.

UTF-8 is fine.

The number of times I've needed the first 5 code points (and not the first 5 bytes or the first 5 characters) in my life is zero. All that half-baked Unicode support in various languages and systems (like MongoDB) just makes people think they've solved a problem that they really haven't.
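To make the bytes / code points / characters distinction concrete, here is a small Python example (Python's `str` indexes code points). What users perceive as characters are grapheme clusters, which the standard library doesn't segment; NFC normalization is used here only to show one case where the counts change.

```python
import unicodedata

s = "e\u0301tude"  # "étude" spelled with a combining acute accent (U+0301)
utf8 = s.encode("utf-8")

n_bytes = len(utf8)  # 7: the combining accent takes 2 bytes in UTF-8
n_code_points = len(s)  # 6: Python str length counts code points
n_nfc = len(unicodedata.normalize("NFC", s))  # 5: NFC folds e + U+0301 into U+00E9

# Truncating by bytes can split a multi-byte sequence...
first_two_bytes = utf8[:2].decode("utf-8", errors="replace")  # "e\ufffd"
# ...while truncating by code points can strip the accent off a character
first_code_point = s[:1]  # "e"
```

Neither byte slicing nor code-point slicing gives you "the first character" here, which is the gap the comment is pointing at.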

> Yes, KDB excels at its relatively well defined niche of transforming and aggregating "smallish" (say 10 TB or less) numerical time series data. It would be a horrible choice for the backing store of a high throughput CRUD application...

I use it in one of those big CRUD databases (digital marketing and tele-lead tracking).

> What is it with KDB zealots thinking that KDB is the best database for every task? I swear, KDB is the Scientology of databases.

Because it solves problems they have.

Even when I don't use KDB I use a similar architecture, because it's the correct architecture; I've had these problems for a lot longer than I've had KDB.

If it doesn't solve every problem I have, that's because I have work to do, not because it isn't great at the problems it does solve. I don't shout at my hammer because it isn't a spoon.

However, MongoDB doesn't solve any problem I've ever had: I've never needed a bag of objects/filesystem that loses data, or a binary blob that I cannot query. It's so famously "web scalable" that it has made a joke of the very idea of being scalable.

>KDB doesn't support unicode text.

Unicode (from 2011):

http://code.kx.com/wiki/Cookbook/Unicode

Well, thanks for the question!

You check the result with getLastError which, as you described, can be used to ensure a majority agrees with the write. But you normally don't use getLastError that way, because a majority might not even be concerned with that particular write; they are, after all, shards. Instead you check whether the primary got the write. If the primary disconnects while you are checking, you catch the exception and keep checking until a new primary is elected. And if your check result is not OK, you try inserting again. That's as reliable as it gets when inserting into any database, including SQL databases that support transactions.
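A sketch of the insert, verify, retry loop just described, run against a toy in-memory stand-in rather than a real driver. `insert`, `find_on_primary`, `PrimaryUnavailable` and `LossyCluster` are hypothetical names, not the PyMongo API, and the insert is assumed to be idempotent by `_id` so re-inserting cannot duplicate a document.

```python
import time


class PrimaryUnavailable(Exception):
    """Hypothetical error raised while no primary is reachable (e.g. mid-election)."""


def check_on_primary(client, _id, attempts, backoff):
    # Keep checking until a (possibly new) primary answers, up to `attempts` times
    for _ in range(attempts):
        try:
            return client.find_on_primary(_id)
        except PrimaryUnavailable:
            time.sleep(backoff)
    return None


def insert_with_verification(client, doc, attempts=5, backoff=0.1):
    """Insert `doc`, confirm the primary has it, and re-insert if it does not."""
    for _ in range(attempts):
        try:
            client.insert(doc)
        except PrimaryUnavailable:
            time.sleep(backoff)  # no primary yet: wait and retry the insert
            continue
        if check_on_primary(client, doc["_id"], attempts, backoff) == doc:
            return True  # the current primary has the write
        # check failed: loop round and insert again
    return False


class LossyCluster:
    """Toy stand-in: acks the first insert but loses it, and drops the primary once."""

    def __init__(self):
        self.data = {}
        self.lose_next_insert = True
        self.fail_next_check = True

    def insert(self, doc):
        if self.lose_next_insert:
            self.lose_next_insert = False
            return  # acknowledged, never persisted
        self.data[doc["_id"]] = dict(doc)

    def find_on_primary(self, _id):
        if self.fail_next_check:
            self.fail_next_check = False
            raise PrimaryUnavailable("primary stepped down")
        return self.data.get(_id)


cluster = LossyCluster()
ok = insert_with_verification(cluster, {"_id": 42, "msg": "hi"}, backoff=0.0)
```

Note that the check only observes the primary at one point in time.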

You describe it like it is simple, but that is a ridiculous number of steps just to check that your data was actually written to the database.

>that's as reliable as it gets when inserting into any database including SQL

The difference being that in a SQL database you call commit and all of this happens for you automatically.
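For comparison, a minimal sketch of the SQL side using Python's built-in `sqlite3` module (chosen only because it needs no server; the `leads` table is hypothetical). Using the connection as a context manager commits on success and rolls back on an exception, so a batch of inserts lands entirely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The second insert violates the primary key, so the whole transaction rolls back
try:
    with conn:
        conn.execute("INSERT INTO leads (id, name) VALUES (?, ?)", (1, "first"))
        conn.execute("INSERT INTO leads (id, name) VALUES (?, ?)", (1, "duplicate"))
except sqlite3.IntegrityError:
    pass  # nothing from the failed batch was committed

# A clean transaction commits when the block exits
with conn:
    conn.execute("INSERT INTO leads (id, name) VALUES (?, ?)", (2, "second"))

rows = conn.execute("SELECT id, name FROM leads ORDER BY id").fetchall()
```

Once the commit returns, the caller doesn't need a separate read-back to learn whether the write landed.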

>>You describe it like it is simple

ah, no. I did not.