Hacker News new | ask | show | jobs
by nurettin 3426 days ago
In my experience, mongo lets you check the end result and try inserting again.
1 comments

How do you expect to check the end result? The article's Jepsen analysis shows that both the v0 and v1 replication protocols (excepting the very latest version of v1 that appears to be in response to this) can result in acknowledged writes being lost. I.e., the DB tells you, for a write sent with a majority, that the write was successful — to a majority! Subsequently (and, if I understand the article, possibly not immediately), the write can be lost.
It depends.

Given a small cluster of reliable nodes on a reliable network, these errors will occur extremely rarely. So rarely, in fact, that they'll be written off as "user error" by support.

If you're a startup building a system which has to quickly and reliably scale from 3 > 3000 nodes in a year then the whole thing is likely to explode in your face. Twitter style.

Now, if MongoDB was so superior that it was truly platform which would even enable that kind of scaling, then the decision is simple: just go for it.

The thing is, this isn't how the world works. When systems are built, very few people consider (or are capable of considering) the growth of the system. Frameworks and database are, by the rule, chosen arbitrarily. When scaling happens, the question is more "how can we scale what we have whilst having everything kind of work" than "how do we design a system which works correctly at scale".

Mongo's whole strategy is based around this. Make Mongo the default choice for the current generation of developers.

Fantastic market strategy.

Fantastic market strategy, but it's still snake oil they're selling.

When you talk about growing, the biggest value in Open Source has been that you can start with something free but shit, and then as you make money then you can spend it on customizing that Open Source in a way that benefits you.

However there exist commercial offerings that are (and were) faster and better at MongoDB than MongoDB was: KDB could've handled Twitter, we never would've seen a fail whale, and it is a whole hell of a lot cheaper than the developers and the customizers, and the headache, and the fact that you're making something open source which ultimately benefits your competition.

Another way to think about it is by thinking about experts: If you've got a great startup idea, why would you want to make your odds 10% worse by introducing the possibility it'll fail, by using the cheapest hacky hack thing that has 10% chance of losing your data? Ask experts with data, and be honest with your budget and you'll do a lot better.

I have some actual experience with KDB and MongoDB so I'm going to have to call bullshit.

How does KDB handle replication and failover? Or even high insert/update rates to datasets that exceed the size of memory? How do you shard KDB?

KDB doesn't support unicode text. Do you plan to only have English speaking users?

Yes, KDB excels at its relatively well defined niche of transforming and aggregating "smallish" (say 10 TB or less) numerical time series data. It would be a horrible choice for the backing store of a high throughput CRUD application...

What is it with KDB zealots thinking that KDB is the best database for every task? I swear, KDB is the Scientology of databases.

> How does KDB handle replication and failover?

With -r and (in my case) SO_REUSEPORT. Most people use a dedicated gateway (have seen custom tomcat stuff and haproxy).

Meanwhile, MongoDB doesn't actually replicate reliably (acking then losing anyway) and failover can crash cascade in the naïve configuration.

> Or even high insert/update rates to datasets that exceed the size of memory?

This is literally the KDB tickerplant model. Have an RDB that flushes out regularly (daily) to an HDB.

You can also just write to a log `:log upsert ...

> How do you shard KDB?

Same way you shard anything else? By picking a key and directing the query to the appropriate server. h[(first md5 k) mod count h] "query..."

> KDB doesn't support unicode text.

UTF8 is fine.

The number of times I've needed the first 5 code points (and not the first 5 bytes or the first 5 characters) in my life is zero. All that half-baked Unicode support in various languages (like MongoDB) just makes people think that they've solved a problem that they really haven't.

> Yes, KDB excels at its relatively well defined niche of transforming and aggregating "smallish" (say 10 TB or less) numerical time series data. It would be a horrible choice for the backing store of a high throughput CRUD application...

I use it in one of those big CRUD databases (digital marketing and tele-lead tracking).

> What is it with KDB zealots thinking that KDB is the best database for every task? I swear, KDB is the Scientology of databases.

Because it solves problems they have.

Even when I don't use KDB I use a similar architecture because it's the correct architecture, because I've had these problems for a lot longer than I've had KDB.

If it doesn't solve every problem I have, that's because I have work to do, not because it isn't great at the problems it does solve, and I don't shout at my hammer because it isn't a spoon.

However MongoDB doesn't solve any problem I've ever had: I've never needed a bag of objects/filesytem that loses data, or a binary blob that I cannot query. It's so famously "web scalable" it has made a joke of the very idea of being scalable.

> With -r and (in my case) SO_REUSEPORT. Most people use a dedicated gateway (have seen custom tomcat stuff and haproxy).

So as a KDB user you need to implement your own HA solution. That is strictly worse than MongoDB replication, even with its now-fixed bugs. Do you really think your homemade multi-master KDB system would pass Jepsen?

> This is literally the KDB tickerplant model. Have an RDB that flushes out regularly (daily) to an HDB.

Wat? That only works if data is immutable once written. Tweets are liked/deleted/etc. You could store an immutable log of user actions, but then you would have to reconstruct the current snapshot every time someone loads a timeline. It's entirely possible for someone to like/delete/RT an old tweet. Financial data is naturally partitioned because the order book clears at the end of every trading day - this doesn't apply to CRUD apps.

> UTF8 is fine

I think you misunderstand what I mean by unicode support. Does KDB support locale specific collations? Does it support normalization/canonicalization? Being able to index by code point is about 1% of the needed solution to build an i18n-proof product. Obviously that doesn't matter when you are dealing with normal KDB datasets like market data where e.g. asian names are represented with numbers.

> I use it in one of those big CRUD databases (digital marketing and tele-lead tracking).

Were you using it to store clickstream data? Or some other kind of immutable stream of events? That isn't really applicable to general CRUD applications.

Like I said - KDB is great for analyzing immutable streams of events. It's not a general purpose database for building CRUD applications. MongoDB tries to be a reasonable enough solution for many use cases, while KDB focuses on excelling at a small number. Both are valid approaches to building a database...

>KDB doesn't support unicode text.

Unicode (from 2011):

http://code.kx.com/wiki/Cookbook/Unicode

I replied to a sibling with more details. Indexing by code point is only the smallest (and easiest to solve) part of the problem of dealing with non ASCII text.
Well, thanks for the question!

You check the result with getLastError which, as you described, can be used to ensure a majority agrees with the write. But you normally don't use getLastError that way. Because a majority might not even be concerned with that particular write. They are, after all, shards. Instead you check if primary got the write. If primary disconnects while you are checking, you catch the exception and try checking until a new primary is decided. And if your check result is not ok, you try inserting again. That's as reliable as it gets when inserting to any database including SQL databases that support transactions.

You describe it like it is simple but that is ridiculous number of steps to simply check your data was actually written to the database.

>that's as reliable as it gets when inserting into any database including SQL

The difference being in a SQL database you call commit and all this happens for you automatically

>>You describe it like it is simple

ah, no. I did not.