HN Mirror


by nullymcnull 5334 days ago

These posts are exceptionally well-timed for me. I'm currently wrangling with one of those problems that is just not solved well with relational databases, or even the flat document store that my company already uses. I've been looking hard at Redis and Mongo, and of late I'm leaning towards Mongo. You know what? Having read these posts and the threads - and having extracted what little in the way of factual datapoints I could from them - I'm pretty sure I'll still be riding into production with Mongo.

Some of you guys who were all aboard the NOSQL UBER ALLES hype train a year or two ago now seem to be swinging back - with scrapes and bruises from some truly harebrained misdeployments, no doubt - to a reactionary 'All NoSQL are doomed to reimplement everything relational' nihilism. Back to shitty OR tools and ugly-ass joins for everyone, damnit! Harumph. I could write a novel just quoting and responding to some of the stupid pronouncements and prescriptions for correctness on these Mongo threads' comments.

Anyways. With regards to this specific post:

Let's rewind a couple of years. I work for a significantly smaller company than our anon raconteur, from the sound of it. At roughly the same time as he adopted Mongo, I was also looking hard at it, to solve some problems where the relational options available to us weren't going to cut the mustard. Damn, did Mongo look cool, fun even. The flexibility of having arbitrary object graphs in it and querying down into subdocument properties with real indexing on them, well, it sets nearly any developer's heart a-flutter, particularly those of us who work on dynamic web stuff a fair bit.

Sadly, I have to be an engineer and pragmatist first; I have to think about much more than what is sexy and comfortable for devs. I've been through my share of 3AM wake-up world-enders, and I've learned the hard lessons. I considered variables like basic maintainability by ops people, credibility of the vendor, track record, robust redundancy and availability solutions, how far up shit creek we'd be in a disaster recovery scenario, etc. After thorough research I decided that, for my much smaller company, which can afford to be judiciously bleeding-edge where it makes sense, Mongo was just not clearing the bar. I sucked it up and used unsexy, properly normalized relational tables, then used in-memory caching and async updates to paper over the performance issues inherent in that scheme.
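The "normalized tables plus caching plus async updates" pattern above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual setup: the `users` table is hypothetical, a plain dict stands in for a real memory cache (memcached or similar), and a background thread stands in for whatever async update mechanism was used.

```python
import queue
import sqlite3
import threading

# Hypothetical normalized table, for illustration only.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
db.commit()

cache = {}                    # stand-in for an external memory cache
write_queue = queue.Queue()   # updates waiting to be persisted
db_lock = threading.Lock()    # serializes access to the shared connection


def get_user_name(user_id):
    """Cache-aside read: serve from the cache, fall back to the database."""
    if user_id in cache:
        return cache[user_id]
    with db_lock:
        row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    cache[user_id] = name
    return name


def update_user_name(user_id, name):
    """Update the cache immediately; persist to the database asynchronously."""
    cache[user_id] = name
    write_queue.put((user_id, name))


def writer():
    """Background worker that applies queued updates to the database."""
    while True:
        user_id, name = write_queue.get()
        with db_lock:
            db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
            db.commit()
        write_queue.task_done()


threading.Thread(target=writer, daemon=True).start()
```

The tradeoff is the usual one: reads get fast and writes return quickly, but an update that is still sitting in the queue is lost if the process dies before the worker persists it.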

What was anon doing? Charging full steam ahead into the wild unknown with Mongo, on an effort that was apparently important to a userbase of millions at a "high profile" company. That's some mighty responsible stewardship of the company's broader concerns (or even just the IT department's) right there. Now, I understand that it totally makes sense to have used Mongo 1.x as a scrappy startup on a greenfield project, no problem. But this guy was in a different situation. At that scale in a BFC, conservatism rules, and it rules for a reason.

I think I am starting to understand why anon is anon.

In any case, we're likely going to roll with Mongo soon. It is indeed maturing, and I'm a lot more comfortable with it on all of my criteria these days. I have possibly read more of the JIRA issues than some of the devs, and they are prioritizing the Right Things - at least for my tastes. By my estimation it is on the right track.

Even without having used it in production yet, I can identify some of the things people are complaining about here as complete and utter RTFM fails: misunderstanding what it is they're deploying, and not asking before they begin whether what they expect out of it is realistic. I understand the tradeoffs of Mongo, and in my particular situation they make good sense.


linuxhansl 5333 days ago

Disclaimer: One of the HBase committers here.

There is/was a LOT of hype in NoSQL. Hype, and very little understanding of what NoSQL is about, and specifically why and when choosing a NoSQL database makes sense and when it does not.

It is not about SQL vs. not. It is about consistency, availability, and partition tolerance, and which of these you are willing to give up. Surprisingly few people know about the CAP theorem and what it implies.

Generally there are two main reasons to switch to NoSQL (Not Only SQL) databases: 1. You need to scale out (add more storage and query capacity by adding more machines). 2. You do not want to be locked into a relational schema.

There is no magic in NoSQL! To scale out these stores give up exactly those features that would impede scaling out (for example global transactions).
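To make "give up the features that would impede scaling out" concrete: once keys are spread across machines, a write to a single key touches one machine, while a write spanning several keys may touch several machines and would need a distributed commit to stay atomic. The sketch below assumes simple hash-modulo partitioning and hypothetical node names. It is not how HBase itself partitions data (HBase splits tables into row-key ranges called regions); it only illustrates the general point.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical machine names


def shard_for(key, nodes=NODES):
    """Pick the node that owns a key by hashing it (modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def needs_cross_node_coordination(keys, nodes=NODES):
    """True if an atomic write over these keys would span more than one node,
    i.e. it would need something like two-phase commit across machines."""
    return len({shard_for(k, nodes) for k in keys}) > 1
```

A single-key write never needs coordination, which is why many scale-out stores offer atomicity per row or per document but not across arbitrary sets of keys. Note also that plain modulo partitioning moves most keys when a node is added; range partitioning or consistent hashing is used in practice to avoid that.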

What one has to realize is that you give up a lot by letting go of relational databases: fast ad hoc queries, transactions, consistency, and the entire theory and research behind them. I don't see why relational databases are "unsexy". A good query planner is almost a work of art, and it is amazing what they can do. In fact, we use them alongside HBase.

Instead of ad hoc queries you either get slow map/reduce type "queries" or you need to plan your queries ahead of time and denormalize the data accordingly at insert time.
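The contrast between ad hoc queries and queries planned ahead via insert-time denormalization might look like this. It is a minimal sketch with a hypothetical authors/posts schema: SQLite stands in for the relational side, and a plain dict stands in for a key-value store. For comparison, each post is written to both.

```python
import sqlite3
from collections import defaultdict

# Relational side: normalized tables (hypothetical schema).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL,
                        title TEXT NOT NULL);
""")

# Key-value side: one precomputed structure per query we planned for.
posts_by_author_name = defaultdict(list)
author_names = {}  # id -> name, so the key-value side never needs a join


def add_author(author_id, name):
    db.execute("INSERT INTO authors (id, name) VALUES (?, ?)", (author_id, name))
    author_names[author_id] = name


def add_post(post_id, author_id, title):
    db.execute("INSERT INTO posts (id, author_id, title) VALUES (?, ?, ?)",
               (post_id, author_id, title))
    # Denormalize at insert time: store the title under the key the read will use.
    posts_by_author_name[author_names[author_id]].append(title)


def titles_by_author_sql(name):
    """Ad hoc: the join is worked out by the query planner at read time."""
    rows = db.execute(
        "SELECT p.title FROM posts p JOIN authors a ON a.id = p.author_id "
        "WHERE a.name = ? ORDER BY p.id", (name,)).fetchall()
    return [title for (title,) in rows]


def titles_by_author_kv(name):
    """Planned: a single key lookup, but it only answers this one question."""
    return list(posts_by_author_name.get(name, []))
```

The key-value read is cheap, but renaming an author means rewriting every denormalized copy, and a question nobody planned for (say, posts per month) means building and backfilling a new structure or running a map/reduce pass. The SQL version just needs a different query.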

You better have very good reasons for the switch.

When we evaluated NoSQL stores a while back (for #1-type problems), I was quite the skeptic. We looked at Riak, Redis, MongoDB, CouchDB, Cassandra, and HBase. Eventually we settled on HBase because we needed consistency over availability, we needed more than just a key-value store, and we already had some Hadoop projects... and I started to drink the Kool-Aid :)

Personally, I am not a big fan of eventually consistent (but highly available) stores, because it is extremely difficult to reason about the state of the store, and the application layer bears a lot of extra complexity. But your mileage may vary.
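Here is a toy model of why eventually consistent stores are hard to reason about: a read served by a replica that has not yet received a write returns stale data, and the application has to cope with that. The sketch assumes in-memory replicas and models replication as an explicit `anti_entropy()` step. It has no versioning or conflict resolution; real stores need something like vector clocks or timestamps for that.

```python
class EventuallyConsistentStore:
    """Toy replicated key-value store: a write lands on one replica
    and is copied to the others only when replication runs."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # (replica_index, key, value) not yet delivered

    def put(self, key, value, via=0):
        """Write through one replica; queue copies for all the others."""
        self.replicas[via][key] = value
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, value))

    def get(self, key, via=0):
        """Read from a single replica, which may not have the latest write."""
        return self.replicas[via].get(key)

    def anti_entropy(self):
        """Deliver all queued replication messages (the 'eventually' part)."""
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()
```

Right after `put("balance", 100, via=0)`, a `get("balance", via=1)` still returns `None`. Only after `anti_entropy()` do all replicas agree. Every piece of application code that reads has to decide whether that window matters.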

HBase of course is new as well, and I needed to start fixing bugs and adding new features that we needed.

As with "Java is better than C++" type discussions, which store to use depends on the use case. As the parent points out, hype about anything is a bad thing, because it typically replaces reason as the instrument of decision making.

(not sure what I was getting at, so I'll just stop here).

einhverfr 5333 days ago

I think one of the reasons NoSQL databases have been oversold is that a lot of projects don't have people on them who are good at engineering databases. The result is that folks use ORMs badly.

If you are going to use the database just to store data structures from your program, you might as well use a NoSQL DB. However, in most cases you get integration and migration wins by:

1) Placing your engineering effort on the database: looking at the sort of real-world data you are collecting, modelling it well in the database, and then presenting an API to the application. The API will be either relational (i.e. views) or procedural (stored procedures). After a couple of iterations the schema shouldn't need fundamental changes, though there could be some minor tweaking.

2) Now, with a good API you can build an application on the database using a methodology of your choice. This could be done in an agile way.
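A rough sketch of the "relational API" half of step 1 follows: the application reads through a view rather than touching the base tables, so the tables can be reworked later as long as the view keeps its shape. The customers/invoices schema is hypothetical. SQLite is used only because it ships with Python, and since SQLite has no stored procedures, only the view variant is shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical base tables, modelled around the real-world data.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)
    );

    -- The API the application sees: a view, not the tables themselves.
    CREATE VIEW customer_balances AS
        SELECT c.id AS customer_id, c.name,
               COALESCE(SUM(i.amount_cents), 0) AS total_cents
        FROM customers c LEFT JOIN invoices i ON i.customer_id = c.id
        GROUP BY c.id, c.name;
""")
db.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
               [(1, "Acme"), (2, "Globex")])
db.executemany("INSERT INTO invoices (customer_id, amount_cents) VALUES (?, ?)",
               [(1, 1500), (1, 2500)])


def balances():
    """Application code depends only on the view's columns."""
    return db.execute(
        "SELECT name, total_cents FROM customer_balances ORDER BY customer_id"
    ).fetchall()
```

Here `balances()` returns one row per customer, including customers with no invoices. The `CHECK` constraint keeps some validation in the database, so every application integrating with it gets the same rule.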

Now if integration is not a goal, then sure you can do all the data validation in your application and you can use NoSQL databases. But relational databases are also powerful integration tools in their own right. I can't imagine LedgerSMB, for example, doing well on anything else for this reason alone.