by dmgbrn 4076 days ago
MongoDB has just done so much to erode my trust in novel databases. My knee jerk reaction is always "NOPE stickin with Postgres!". So I'm going to hold off on checking this one out, even though it seems from the comments that it's avoided many of Mongo's horrible design flaws.

Just my 2 cents of Mongo hate :-)


You are absolutely right. However, bear in mind this thinking is a slippery slope. Most good CIOs, architects, technical leads, etc. shy away from betting the house on new or novel technology precisely because experience has shown them first hand the risks of jumping on the shiny, new, exciting, well-marketed technology. At a certain point, however, you will find that this pushes you behind the technology curve...

the engineering skill here is the ability to trade off risk vs benefit... I will tell you from my own experience the best designed software systems I have personally dealt with tend to use components somewhat behind the curve.

Well, you might want to tell Facebook, Twitter, Netflix, Yahoo, Spotify, eBay etc that they don't know how to design software systems. Because all of them have a long history (check their Githubs) of creating and adopting pretty cutting edge technologies.

For me the best software systems are those that are well architected and use the best available technology. This doesn't mean we all should be writing Tomcat, Oracle, Apache stacks just because they are less shiny.

Yes, please do look at them, because Google, Facebook, Twitter, Yahoo, and eBay use MySQL pretty heavily for many of their core storage needs (look at the contributor list for WebScaleSQL). Spotify uses Postgres for the same. All of them also use other things as well, but only for cases where they are willing to make performance or reliability trade-offs like for colder or analytical data. Of course they also use things like Sherpa, Cassandra, HBase, etc... but they lose consistent low latency or consistency or availability when they do so.

The point is, if you are going to bet your business on a technology, it helps if it has been tested with production workloads in many different conditions and scales. You want to know about as many shortcomings as you can. For many of the use cases that people use things like Cassandra for, they will be tolerant of 30ms++ reads and potential read inversions. Redis is used pretty heavily, but it is relatively simple code and you can trace through the entire writepath pretty easily and get a sense of its limitations (being single threaded is a blessing and a curse, you really need to be careful about bad tenants because a single slow query will cause an availability event for everyone). HBase is used in a few places, but usually only for cold data after they expect it to be read only occasionally and they don't want to use up space on their MySQL pci flash devices for it anymore. There are a bunch more, but they all have some latency, consistency, or availability downsides compared to a traditional sharded B+ tree backed transactional store.
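
To make the "bad tenant" point concrete, here is a toy model of a single-threaded server. It is only a sketch under stated assumptions: the `Command` type, the tenant names, and the millisecond costs are all made up for illustration, and it is not how Redis is implemented. It just shows why one slow command delays everyone queued behind it.

```python
from dataclasses import dataclass


@dataclass
class Command:
    tenant: str   # hypothetical tenant name
    cost_ms: int  # hypothetical service time for this command


def completion_times(commands):
    """Serve commands one at a time, in arrival order, on a single thread.

    All commands are assumed to arrive at t=0. Returns a list of
    (tenant, finish_time_ms) pairs.
    """
    clock = 0
    finished = []
    for cmd in commands:
        clock += cmd.cost_ms  # nothing else runs while this command executes
        finished.append((cmd.tenant, clock))
    return finished


fast_only = [Command("a", 1), Command("b", 1), Command("c", 1)]
with_slow = [Command("noisy", 500), Command("b", 1), Command("c", 1)]

print(completion_times(fast_only))
print(completion_times(with_slow))
```

In the second run, tenants "b" and "c" issue the same cheap commands as before but finish only after the 500 ms command ahead of them, which is the availability event described above.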

All the big guys do have a history of creating new/novel infrastructure pieces. They create them because they don't (can't really) trust any brand new infrastructure software that they didn't have at least a major hand in creating. You'll notice that if they use a new thing, it's after extensive testing and patching and contributions.

As a small startup, you might not have the time for extensive testing and patching of new hot technologies, which facebook, twitter, netflix do.

For the big guys it's not about trust. They just hit the limits of the current tried and true before anyone else does and as a result they have to forge new ground.

If you don't run at the same scale as those guys you won't hit those same limits. But if you do reach the scale of those guys you will find that your needs are suddenly very much a unique snowflake that will require you to either create something new or heavily tweak something that already exists.

Plenty of time for that when you reach the scale that justifies it though.

This is simply not true. Many companies in the world (including on this list) do use brand new software that they had no hand in creating. Classic example is in the Big Data space. There are plenty of very early adopters to most of the Hadoop stack e.g. Spark. Or look how many companies started using Nginx or Go even though less shiny solutions have existed.

And not sure if you've worked for a large company, but they largely comprise lots of little startup-sized teams. The same principles apply regardless, e.g. spiking technologies out, managing risk etc.

What I have an issue with is these stupid generalisations. Less shiny = good, Shiny = bad. The merits of the architecture and technology seem to be completely ignored.

The common rule of thumb seems to be: Less shiny = battle tested (hey, if it has bullet holes even better), Shiny = not gone even through pre-flight testing.
On the flip side. Less shiny = more accumulated technical debt.

Which then translates to buggier software as you add more and more new features.

There is a reason we rewrite codebases, no ?

It looks to me like they use bleeding edge solutions or develop their own when "standard" tech isn't doing the job well enough. And they start slow and use it in non-critical pieces of software first.
Couchbase is well established and reliable, has a track record going back to memcached and couchdb. And they're moving very fast with good features (but the way they do it their paid enterprise edition has the new features while the community edition lags a bit, which is fine because this means the community edition is really damn reliable and high quality.)
It is also really heavy weight. I run RethinkDB on a system 1/3 the size of the minimal Couchbase system and still get excellent performance. The system requirements for Couchbase made it a turn-off for me.
When did you last use Mongo? We've been using it in production for 3+ years, and while there were certainly some issues early on, we've had nothing but success with it (especially over the last few major versions).
You could have inconsistencies in your data[1] and not even realize it.

[1] https://aphyr.com/posts/284-call-me-maybe-mongodb

While the call-me-maybe series is definitely informative... it's worth noting that they've called out flaws in every distributed system they've tested against. What it comes down to is, are those flaws fatal in practice. The truth is, it depends.... If you lose a comment on a social media site, no big deal. If you lose part of a transaction for a multi million dollar stock trade, very big deal.

No software system is perfect, but there are definitely practical balances to be made. Especially when you are beyond what a single database/server can offer in terms of write throughput. The fact is, when your traffic needs exceed what a single database can keep up with in terms of writes, you have to give up some level of reliability.

He wasn't able to find issues with Zookeeper and Postgres.

Granted that you can only prove that the system is vulnerable and not the reverse, but if there is a vulnerability it is much harder to trigger it.

In general, strongly consistent distributed datastores like zookeeper tend to be strongly consistent (cf Consul and Etcd too)... But Postgres was not tested as a distributed database, sharded or replicated, and without any form of failover. The difference is: kill a zookeeper node and you will not notice, kill Postgres and your app is dead.

Postgres is a good DB, but since it's not distributed, it's not very useful to compare it to distributed databases. Yes it's consistent, but it's only as reliable as the single node where it is installed.

This is a common misconception about the CAP theorem. A significant number of people don't realize that a distributed system also includes clients; it's not just communication between servers.

I suspect he did not go over replication because Postgres failover support is technically still DIY, although he should have. There are two replication methods, though, which I would like to see tested:

- asynchronous - this one is fast, but it would most likely have issues similar to the other databases
- synchronous - the master makes sure data is replicated before returning to the user, so this should in theory always be consistent

You would typically have two nodes in same location replicating synchronously and use asynchronous replication to different data centers. On a failure, you simply fail over to another synchronously replicating server.
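
A toy model of the difference, for anyone who wants to see it spelled out. This is a sketch under stated assumptions and not Postgres code: `Primary`, `Replica`, and `ship_async` are hypothetical names, and "replication" here is just appending to an in-memory list.

```python
class Replica:
    def __init__(self):
        self.log = []


class Primary:
    def __init__(self, sync_replicas, async_replicas):
        self.log = []
        self.sync_replicas = sync_replicas
        self.async_replicas = async_replicas
        self.pending = []  # writes not yet shipped to the async replicas

    def write(self, value):
        self.log.append(value)
        # Synchronous: the write reaches these replicas before we acknowledge.
        for replica in self.sync_replicas:
            replica.log.append(value)
        # Asynchronous: queued now, shipped some time later.
        self.pending.append(value)
        return "ack"

    def ship_async(self):
        for value in self.pending:
            for replica in self.async_replicas:
                replica.log.append(value)
        self.pending.clear()


sync_r, async_r = Replica(), Replica()
primary = Primary([sync_r], [async_r])
primary.write("order-1")  # acknowledged to the client

# Suppose the primary dies here, before ship_async() ever runs.
print(sync_r.log)   # the synchronous replica already holds the write
print(async_r.log)  # the asynchronous replica is still behind
```

Failing over to the synchronous replica keeps every acknowledged write, while failing over to the asynchronous one can silently drop the most recent ones. That is the reason for keeping the synchronous pair in one location and using asynchronous replication only across data centers.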

Regarding consul/etcd, those technologies actually did not do well in his tests, but the authors appear to be motivated to fix the issues.

95% of the people I hear about using NoSQL databases are using them on a single node.
Came here thinking just the same thing.

Can anyone explain (SQL pun not intended) to me the advantages / disadvantages between rethinkDB and say PostgreSQL?

Here's a comparison between RethinkDB and MongoDB (written by RethinkDB) http://rethinkdb.com/docs/rethinkdb-vs-mongodb/ and the FAQ for "When is RethinkDB not a good choice": http://rethinkdb.com/faq/#when-is-rethinkdb-not-a-good-choic...

A lot of the things would also apply to PostgreSQL

I realize it is popular these days to hate on Mongo, but the FUD is getting a bit old.
Really, the MongoDB haters are pathological at this point. 10gen must have really pissed off the HN crowd early on (I missed that cycle) but today, in the real world, mongo is a useful tool for what it does best. It's time to move on from early impressions and angst about their hype fails, and give them a few crumbs of credit for being the de facto leader in a new space, and working through the inevitable rough patches.
I don't think you understand how badly 10gen overmarketed Mongo as something that it isn't. I personally believe the people at 10gen deserve legal repercussions (I expect we'll see this in the coming years), even incarceration for the pure lies they spouted that cost companies so much. MongoDB is a trojan horse, a trap, completely incapable of serving any real-world need. If someone thinks MongoDB is working for them, they just don't realize yet how it's subtly broken.
Sorry have you just woken up and missed the last 20 years ?

Almost every single IT company exaggerates claims, says their product is "the best" and "amazing" and suitable for every use case under the sun. Oracle did it. Microsoft did it. Mongo did it. And thousands of companies will do it in the future. It's called Marketing.

And I think you should speak to all these customers (http://www.mongodb.com/who-uses-mongodb) and tell them they don't serve any real world need. I would imagine a few would ask the same question of you.

I think "ORGANIZATIONS CREATING APPLICATIONS NEVER BEFORE POSSIBLE" about sums it up on their "who uses" page... There's nothing in particular that mongo does better than its competitors.
Thanks for making my case for me. This is possibly one of the most ridiculous replies ever posted to this forum.

The day we start jailing people for their marketing hype is... well, it wouldn't be a good day.

I'm confused what horrible design flaws MongoDB had apart from collection level locking ?

And I can't imagine any of this is relevant today given that MongoDB allows for pluggable storage engines.

The way it handles failures? You may be interested in this (old-ish) article: http://hackingdistributed.com/2013/01/29/mongo-ft/
Another worthwhile read: https://aphyr.com/posts/284-call-me-maybe-mongodb

Although to be fair, it's not just MongoDB that is performing poorly.

How is that a design fault ? It was purely a poorly chosen configuration setting which reflects the fact that Mongo was originally not a general purpose database. And it was never even an issue for 99% of people because all the drivers at the time used the safer settings.

I always find it amusing when people bring these issues up because it's like a giant sticker on their forehead that says "I've never actually seriously spent time with MongoDB before". I always go through the configuration of the database I use to make sure it meets my needs. Only seems sensible.

>It was purely a poorly chosen configuration setting which reflects the fact that Mongo was originally not a general purpose database.

That's dubious. 1.) When MongoDB was released, none of the drivers used the "safer" settings. 2.) 10gen, at the time, released benchmarks with the "unsafe" settings comparing it to MySQL and boasted that MongoDB was much faster (ignoring the fact that it wasn't acknowledging your writes).

Hmm, are you able to point us to any of these "benchmarks" with unsafe settings?

AFAIK, until recently (i.e. the last month), there weren't any such benchmarks released by MongoDB - and then, only for 3.x.

I'd be very surprised if any such benchmarks exist, as you claim.

Disclaimer - I work for MongoDB Inc.

MongoDB is advertised as a general purpose database though, and the majority of people use it that way.
Sure. And that's why the configuration was changed years ago.
The article actually talks about the configuration change. The issues are still there.
I'm not sure we've read the same article.
Quick -- give an example of something else unrelated to the original article expressing your disgust with it.
I don't really have a strong opinion in the game but mind you that MongoDB is pretty darn fast http://www.peterbe.com/plog/fastestdb At least when you're not at scale.

And it's pretty cool that you can make a choice between fast writes or safe writes. You can't have both but at least you can have the choice.
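
For what that choice looks like from the client's side, here is a toy sketch. It assumes nothing about the real driver API: `ToyServer`, `insert_unacknowledged`, and `insert_acknowledged` are hypothetical names, and the "server" is an in-memory dict. The only point is that a fire-and-forget write cannot tell you when it failed.

```python
class DuplicateKey(Exception):
    pass


class ToyServer:
    """In-memory stand-in for a database that rejects duplicate keys."""

    def __init__(self):
        self.docs = {}

    def insert(self, key, doc):
        if key in self.docs:
            raise DuplicateKey(key)
        self.docs[key] = doc


def insert_unacknowledged(server, key, doc):
    """Fast write: send it and never look at the outcome."""
    try:
        server.insert(key, doc)
    except DuplicateKey:
        pass  # the failure happens server-side; the client never hears of it
    return None


def insert_acknowledged(server, key, doc):
    """Safe write: wait for the reply, so failures reach the caller."""
    server.insert(key, doc)
    return True


server = ToyServer()
insert_acknowledged(server, 1, "first")
insert_unacknowledged(server, 1, "second")  # silently rejected
print(server.docs)
```

The second insert is dropped and the caller has no way to notice, which is the trade being made for the speed.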

Having said all that, I generally prefer Postgres in almost every possible case.

However, this RethinkDB project looks sexy and has great potential.

/dev/null is pretty darn fast, too. https://www.youtube.com/watch?v=b2F-DItXtZs
You know this benchmark is bullshit because it says memcache is slower than redis (at editing) and mongo (at creating and deleting) and motor (at creating).