Hacker News
by BillFinchDba 3426 days ago
I think if you look back objectively, there are very few database platforms that were absolutely "fit for public consumption" right out of the box. Look at all the SQL Server shops out there (mine included) that won't even roll out a new version of SQL Server until it hits SP1 at a minimum... For MongoDB, if you look forward based on what they are doing now, rather than at how early adopters may have had a sub-optimal experience way back when, you'll see a mature product that is consistently improving and is demonstrably reliable. Can you give an example of another option you are referring to?

Right out of the box? MongoDB has been trying to get it right for 10 years now. Kyle says the storage engine they've used for most of that lifetime is fundamentally flawed, and they've only now, a decade on, managed to write something without known bugs to replace it. And maybe this time it's OK. Maybe this time there aren't any more layers of buggy crap in mongo yet to be found and fixed.

Maybe. But you'd have lost that bet if you made it any day in the last 10 years. And in those 10 years MongoDB has demonstrated again and again that they aren't up to the task of writing a reliable database. Even with their new storage engine, they couldn't find the bugs alone.

I think using mongo today for any mission-critical data is an irresponsible choice. I'd seriously question the judgement of any senior engineer who picks it for a new project over rethinkdb or Postgres.

>"Kyle says the storage engine they've used for most of that lifetime is fundamentally flawed, and they've only now, a decade on, managed to write something without known bugs to replace it"

Didn't WiredTiger Inc write the new WiredTiger storage engine before they were acquired by MongoDB Inc?

https://gigaom.com/2014/12/16/mongodb-snaps-up-wiredtiger-as...

Do you think MongoDB is a good choice (given how easy it is to use) when you only care that 99.999% of the data you insert ends up in the database? That's my use case: best-effort integrity. I mostly just want a DB that can insert and query documents fast, and I'm not really concerned if I lose a few documents here and there.

Why wouldn't you just use anything else that can manage to insert and read data without losing it?

I don't really understand the angle of "can I get away with it anyways, tho?"

Some of us are already using MongoDB and are not so keen on replacing it.

If you read back, the discussion was scoped to "new projects". From jospehg:

> I'd seriously question the judgement of any senior engineer who picks it for a new project over rethinkdb or Postgres.

It's about making tradeoffs. If MongoDB works for you (I actually enjoy using it tremendously), then I have to ask myself whether I'm OK with its imperfect integrity. For my use cases this isn't a problem. I'm not working with customer data or anything where losing a few records would make any difference at all.

In my experience, mongo lets you check the end result and try inserting again.

How do you expect to check the end result? The article's Jepsen analysis shows that both the v0 and v1 replication protocols (excepting the very latest version of v1, which appears to be a response to this) can result in acknowledged writes being lost. That is, for a write sent with a majority write concern, the DB tells you that the write was successful: acknowledged by a majority! Subsequently (and, if I understand the article, possibly not immediately), the write can be lost.
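
The reason a majority acknowledgment is supposed to be safe is plain quorum arithmetic: any two majorities of the same replica set share at least one member, so a Raft-style election (which needs a majority of votes) should always involve a node that holds every majority-acknowledged write. A minimal Python sketch that brute-forces that overlap property for small clusters; `majority` and `all_quorums_intersect` are illustrative helpers, not part of any MongoDB tooling. It checks only the arithmetic, not the replication protocol itself, which is exactly where the Jepsen analysis found acknowledged writes going missing.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that is a strict majority of an n-node set."""
    return n // 2 + 1


def all_quorums_intersect(n: int, q: int) -> bool:
    """Brute force: does every pair of q-node subsets of n nodes share a node?

    Checking only subsets of exactly q nodes is enough, because any larger
    quorum is a superset of one of them.
    """
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)


for n in (3, 5, 7):
    print(n, majority(n), all_quorums_intersect(n, majority(n)))  # first line: 3 2 True

# A quorum smaller than a majority can be disjoint: {0, 1} and {2, 3} in a 4-node set.
print(all_quorums_intersect(4, 2))  # False
```

The overlap is a necessary condition, not a sufficient one: a protocol still has to make the elected primary actually adopt the writes that the overlapping node holds.
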

It depends.

Given a small cluster of reliable nodes on a reliable network, these errors will occur extremely rarely. So rarely, in fact, that they'll be written off as "user error" by support.

If you're a startup building a system which has to quickly and reliably scale from 3 to 3000 nodes in a year, then the whole thing is likely to explode in your face. Twitter style.

Now, if MongoDB were so superior that it was truly a platform that enabled that kind of scaling, then the decision would be simple: just go for it.

The thing is, this isn't how the world works. When systems are built, very few people consider (or are capable of considering) the growth of the system. Frameworks and databases are, as a rule, chosen arbitrarily. When scaling happens, the question is more "how can we scale what we have whilst keeping everything kind of working" than "how do we design a system which works correctly at scale".

Mongo's whole strategy is based around this. Make Mongo the default choice for the current generation of developers.

Fantastic market strategy.

Fantastic market strategy, but it's still snake oil they're selling.

When you talk about growing, the biggest value in Open Source has been that you can start with something free but shit, and then, as you make money, spend it on customizing that Open Source in a way that benefits you.

However, there exist commercial offerings that are (and were) faster and better at what MongoDB does than MongoDB was. KDB could've handled Twitter, and we never would've seen a fail whale. It is a whole hell of a lot cheaper than the developers, the customizers, the headache, and the fact that you're making something open source which ultimately benefits your competition.

Another way to think about it is in terms of experts: if you've got a great startup idea, why would you want to make your odds 10% worse by introducing the possibility it'll fail, using the cheapest hacky hack thing that has a 10% chance of losing your data? Ask experts with data, be honest about your budget, and you'll do a lot better.

Well, thanks for the question!

You check the result with getLastError, which, as you described, can be used to ensure a majority agrees with the write. But you normally don't use getLastError that way, because a majority might not even be concerned with that particular write; they are, after all, shards. Instead, you check whether the primary got the write. If the primary disconnects while you are checking, you catch the exception and keep checking until a new primary is elected. And if the check result is not OK, you try inserting again. That's as reliable as it gets when inserting into any database, including SQL databases that support transactions.
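
A minimal Python sketch of the check-and-retry loop described above, run against a simulated replica set so it needs no server. `FakeReplicaSet`, `PrimaryUnavailable` and `insert_confirmed` are hypothetical names; a real driver has its own error types (pymongo, for example, raises `AutoReconnect` when it loses its connection) and would wait for an election rather than spin. It assumes the client supplies `_id`, so a retry cannot create a second copy (this fake simply overwrites; a real server would reject the duplicate `_id` instead).

```python
import random


class PrimaryUnavailable(Exception):
    """Hypothetical stand-in for a driver error such as a lost primary."""


class FakeReplicaSet:
    """Hypothetical in-memory stand-in for a replica set whose primary
    sometimes steps down mid-request. Not a MongoDB or driver API."""

    def __init__(self, failure_rate: float, seed: int = 0) -> None:
        self.docs: dict = {}
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)

    def _maybe_fail(self) -> None:
        if self.rng.random() < self.failure_rate:
            raise PrimaryUnavailable("primary stepped down")

    def insert(self, doc: dict) -> None:
        self._maybe_fail()            # failure before the write: nothing stored
        self.docs[doc["_id"]] = doc   # keyed by _id, so a retry overwrites, never duplicates
        self._maybe_fail()            # failure after the write: stored, but the ack is lost

    def find_one(self, _id):
        self._maybe_fail()
        return self.docs.get(_id)


def insert_confirmed(rs: FakeReplicaSet, doc: dict, max_attempts: int = 50) -> int:
    """Insert, read back from the primary, and re-insert until the read matches.

    Returns the number of insert attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            rs.insert(doc)
        except PrimaryUnavailable:
            pass  # outcome unknown: the write may or may not have been applied
        stored = None
        for _ in range(max_attempts):
            try:
                stored = rs.find_one(doc["_id"])
                break
            except PrimaryUnavailable:
                continue  # real code would back off and wait for an election here
        if stored == doc:
            return attempt
    raise RuntimeError(f"could not confirm write of {doc['_id']!r}")


rs = FakeReplicaSet(failure_rate=0.3, seed=42)
attempts = [insert_confirmed(rs, {"_id": f"doc-{i}", "n": i}) for i in range(100)]
print(len(rs.docs), max(attempts))
```

Note what the loop actually confirms: that the primary answering the read has the document at that moment. Per the Jepsen analysis discussed earlier in the thread, that is not the same as the write surviving a later failover.
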

You describe it as if it's simple, but that's a ridiculous number of steps just to check that your data was actually written to the database.

>that's as reliable as it gets when inserting into any database including SQL

The difference being that in a SQL database you call commit and all of this happens for you automatically.
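
For comparison, a minimal sketch of the commit contract using Python's built-in `sqlite3` module: a `with conn:` block commits if it exits normally and rolls back if an exception escapes, so a batch is stored entirely or not at all. The `events` table and `insert_batch` helper are made up for illustration, and an in-memory, single-node SQLite database says nothing about replication; it only shows what "call commit and it happens for you" looks like from the application side.

```python
import sqlite3


def insert_batch(conn: sqlite3.Connection, rows: list) -> None:
    """Insert every row in one transaction, or none of them."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")

insert_batch(conn, [(1, "a"), (2, "b")])
try:
    # Row 3 is inserted first, then row 1 violates the primary key.
    insert_batch(conn, [(3, "c"), (1, "duplicate id")])
except sqlite3.IntegrityError:
    pass  # the whole batch, including row 3, was rolled back

print(conn.execute("SELECT id FROM events ORDER BY id").fetchall())  # [(1,), (2,)]
```
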

> I'd seriously question the judgement of any senior engineer who picks it for a new project over rethinkdb or Postgres.

... you mean RethinkDB, whose future is still uncertain? Regardless of technical merits, the currently unstable future of RethinkDB means a senior engineer should be extremely cautious about selecting it for a significant project.

To be fair, choosing a scalable database, even for a senior engineer, still requires quite a bit of very specialized knowledge of distributed systems that most simply don't have. So they have to rely on what "feels right" anyway, rather than making an engineering decision, and are very susceptible to marketing, PR and authoritative opinions. There are no right choices for them. That said, if in doubt, everyone should probably default to a Dynamo-style DB, as it forces you to think about and organize your data in a certain future-proof way, which actually excludes all of the mentioned databases.

Not the OP, but RethinkDB is superior in many ways, including stability, integrity and feature set, for pretty much every use case in which you would consider using MongoDB.

But with Jepsen tests MongoDB can finally be considered a contender. It's not like competent teams were using it in production. Right?

Do you not consider Stripe to be a competent team?

Please name some F500 companies using RethinkDB to power critical infrastructure. There are many using MongoDB. While Rethink is widely renowned among the HN set, it is nonexistent by comparison when looking at actual deployments.

The reigning HN view of MongoDB as a buggy mess is outdated. Yes, they overmarketed a buggy project in 2009. It didn't matter, because they built a product that developers loved (and continue to love) to use. RethinkDB didn't aggressively market itself, and look where it is now: defunct. Mongo used that momentum to raise money and hire an incredible engineering team, including Keith Bostic, one of the key developers of BSD Unix, and Michael Cahill, the inventor of the serializable snapshot isolation mechanism used in Postgres. Sometimes you need to employ aggressive business tactics to get to a point where you have the engineering resources to build a world-class project. More so when you need to catch up to millions of man-hours spent building Oracle and MSSQL.

ah, the "x uses y so it must be good" fallacy

I should note that I work for a multinational gaming company and we use software that is ABSOLUTELY not fit for purpose, but once you have a hard dependency on something, and the cost of muddling through is _less_ than the cost of a rewrite, you're going to be stuck supporting it.

This is the reality of technology in enterprise.

That specific point was in reply to the GP's statement that "competent teams" weren't using it in production.

I don't think it should be taken as given that there's a correlation between competency and the size of the organization a team exists within, and I don't think such a correlation, when combined with large organizations' usage of MongoDB, would challenge the assertion that there exists an anticorrelation between team competency and use of MongoDB.

Looking at the numbers, larger organizations straightforwardly seem like they should be more likely to eventually hire mediocre talent and survive despite having done so, and also more likely to have adopted any given tool.

I think you're looking at it wrong. It's not a popularity contest; I've seen billion-dollar companies use fucking stupid tooling as well, but they still have the right processes in place so they don't lose data.

Mongo, on the other hand, loses data.

You speak of "correlation" and "looking at the numbers" but provide no data. What exactly is your point?

As an aside, Stripe are putting money in to support RethinkDB.

> It's not like competent teams were using it in production. Right?

I've heard that about 1000 times a day for 6 years. Usually the person stating it is snickering as if they are clued into some unknown secret.

Mongo does work in production at many shops, and in many forms. Sometimes it's used as the main database, sometimes it's used to house specific slices of data, etc.

As a sysadmin, I got tired of the devs constantly ragging on MongoDB (the same folks who selected it before I was hired). I eventually got fed up and said: "Why do we use it if you all hate it so much? Let's replace it. What do you want to use instead? It's easy for me to set up something new." Cue everyone going "ah, it's not so bad, really..."

MongoDB is the Nickelback of databases: a reasonable act that's not going to blow your socks off, but one where saying "OMG I hate it!" somehow signals membership to some cool clique of connoisseurs.

Alternative anecdote: I rewrote a backend that was using mongo and moved it to PostgreSQL+PostGIS. That solved an ever-expanding RAM issue, and it is still blindingly fast [on SSD hosting].

The main win was not server stability; it was having general tools to manage data, including the built-in geo-algorithms that come with PostGIS. E.g., I could make our dataset 9x smaller by smoothing map paths.
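
The comment doesn't say which PostGIS function did the smoothing. One likely candidate is `ST_Simplify`, which the PostGIS documentation describes as using the Douglas-Peucker algorithm: keep the endpoints of a path, find the interior point farthest from the straight line between them, and recurse on both halves only if that distance exceeds a tolerance. A minimal pure-Python sketch of that algorithm, assuming planar (x, y) coordinates; `simplify` is a hypothetical helper, not the PostGIS implementation.

```python
import math

Point = tuple[float, float]  # (x, y) in planar coordinates


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: fall back to point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def simplify(points: list[Point], tolerance: float) -> list[Point]:
    """Ramer-Douglas-Peucker simplification of a polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]  # every interior point is within tolerance: drop them
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # the split point would otherwise appear twice


print(simplify([(0, 0), (1, 1), (2, 0)], 0.5))  # [(0, 0), (1, 1), (2, 0)]
print(simplify([(0, 0), (1, 1), (2, 0)], 2.0))  # [(0, 0), (2, 0)]

noisy = [(x, 0.01 * (-1) ** x) for x in range(10)]  # a jittery straight path
print(len(noisy), "->", len(simplify(noisy, 0.1)))  # 10 -> 2
```

How much a real dataset shrinks depends entirely on the tolerance and on how noisy the recorded paths are.
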

I still love the Mongo API, but I just can't risk it with data on projects that people are paying for, or that I need to support.

[I think the real sweet spot will be deep integration of JavaScript and JSON into Postgres, so I can write stored procs in JS, get DB events in JS, and wrangle JSON fluidly, all of which is improving.]

One day postgres will be everywhere :)

CouchDB, RethinkDB, PostgreSQL.

>"OMG I hate it!" somehow signals membership to some cool clique of connoisseurs

You do realize you are the one dragging identity into the mix.

I don't understand how this became a big ego debate. MongoDB isn't that relevant, and it isn't Nickelback; it's a homeopathetic database. That is: if you use it for anything other than caching (storing data you can't afford to lose, or load balancing and using the database as the main mutex to deal with all concurrency issues), that would arguably be a very irresponsible choice.

Hell, I hate most databases, because they're hard to get right, yet some have interesting trade-offs (Elasticsearch, Cassandra, CockroachDB).

And it's not a subjective or even analog discussion where databases are more or less consistent or more or less durable. They fsync or they don't. They use Raft with majority consensus or they don't.
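
What "they fsync or they don't" means at the system-call level, as a minimal POSIX-oriented Python sketch: `os.write` may leave data only in the OS page cache, and calling `os.fsync` is what asks the kernel to flush it to the storage device before the function returns. `durable_append` is a hypothetical helper; it assumes a POSIX system (opening a directory to fsync it does not work on Windows) and ignores device-level caveats such as drives that acknowledge flushes early.

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> None:
    """Append record to path and return only after fsync has succeeded."""
    is_new = not os.path.exists(path)  # racy, but good enough for a sketch
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record)
        while view:                     # os.write may perform a short write
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                    # flush file data and metadata, or raise OSError
    finally:
        os.close(fd)
    if is_new:
        # A new file's directory entry lives in the parent directory, so
        # fsync that too, or the file itself may vanish after a crash.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "wal.log")
    durable_append(log_path, b"first\n")
    durable_append(log_path, b"second\n")
    with open(log_path, "rb") as f:
        contents = f.read()

print(contents)  # b'first\nsecond\n'
```

A store that acknowledges a write before the `fsync` step can report success for data that a power failure later erases.
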

If you, as a sysadmin, judge these empirical facts based on your prejudices about the sort of people who would agree or disagree with you, then you are much more like the cool clique of connoisseurs than the people at the other end of your finger.

It's engineering, not wine tasting. The shape of the world isn't a subjective thing, any more than the durability of a database that doesn't fsync.

Well, I'm a member of the "MongoDB is worth quitting over" club. I replaced it with Postgres at a former employer. Engineering waved goodbye to an endless stream of operational issues, and customers saw better uptime and much, much faster responses.

When was that, and which version of MongoDB? Do you suppose things have improved with 3.0 and later versions?

Lots of hyperbole in your statement. Indeed lists 6200 job listings for MongoDB [1] and 37 for RethinkDB [2].

[1] https://www.indeed.com/jobs?q=mongodb&l=
[2] https://www.indeed.com/jobs?q=rethinkdb&l=

RethinkDB is still the new database on the block and never really found its feet. Look up Postgres, Cassandra, Kafka, Riak, MySQL/MariaDB or MSSQL (or, dare I say it, Oracle). All of those tools have a long history of reliability and solid engineering.

Ha! MySQL having a long history of reliability... Son, have you heard of MyISAM?

In many ways MySQL is similar to MongoDB.

Both started out being written by people who knew nothing about databases, and both threw away years of database research.

Both gained popularity by being the accepted choice for web languages (PHP and Node.js, respectively).

Both were faster than the more established competition, only for it to turn out that both were losing data.

Both turned out to be designed fundamentally wrong and got replacement engines that are more reliable (ISAM/MyISAM vs InnoDB, and v0 vs v1).

Both still have quirks due to bad decisions in the past that can't easily be fixed without breaking compatibility.

You're comparing ISAM/MyISAM (a storage engine) to the MongoDB replication protocol. As a more relevant parallel, MongoDB also replaced its original storage engine with one acquired from WiredTiger (founded by people behind Berkeley DB).

One big difference from a corporate strategy perspective is that MySQL let the replacement storage engine (InnoDB) fall into the hands of Oracle. MongoDB was smart enough to make sure that they were the acquirer, which puts them in control of their own destiny.

If MongoDB is heading along the path of MySQL, that's a pretty good path to be on considering that MySQL is used as the store of record at Facebook, Twitter and some parts of Google.

Ah, the early 2000s: table-level locks during inserts, but lightning-fast reads. Postgres was still a fledgling back then.

> Can you give an example of another option you are referring to?

That depends on the data.

What type of data do you have, and what do you want to do with it?

MongoDB isn't data-specific, and its claim to fame is a flexible data structure.

If you want fast writes and lookups with very little relational structure, Cassandra is good.

If you want searchable text documents, then anything based on Lucene is good (ES, Solr, Raven).

If you want time series, there are a few out there, but it's a niche.

Likewise, if you want graph data, then there are NodeJS, Titan, etc.

At most companies I've worked with, MongoDB was used because they didn't think about what type of data they had and what performance they wanted. They wanted to store unstructured data because it's easy.

I personally think it's a cop out, especially as a statistician/programmer.

> if you want graph data then there are NodeJS

I think you meant Neo4j.

For me, MongoDB was like a document store on tmpfs.

And, I can't sell that to people I work with.

What if I want filtering by several criteria (on a table with 1k columns) and simple aggregations, but don't need full-text search*? I'm still looking :(

* I only want starts-with and contains on strings.

SQL Server and Oracle are hardly comparable. The issues you see there have more to do with backward compatibility or the performance impact of big new features than with core stability.