Hacker News
by muyuu 5341 days ago
My big concern with so-called NoSQL solutions is the "culture" that seems to be brewing there.

If you go to the "Don't use MongoDB" post ( http://news.ycombinator.com/item?id=3202081 ) you will read some, IMO, extremely worrying comments from a few pro-NoSQL users including antirez (Redis).

For some reason NoSQL now apparently means "unreliable datastore for unimportant, throwaway data" and defaults are chosen accordingly. Why the hell is that?

NoSQL for me doesn't imply anything other than "no SQL", and at a stretch "no schema". This makes a lot of sense for many of us who routinely need to create databases that are logically trivial. In many cases they are a bunch of glorified persistent hash tables that usually don't fit in memory. But this doesn't mean they aren't critical. Why would it have to? This isn't anything new either; we've had Berkeley DB for a long while. It's just a bit on the dry side and it may fall short in many cases.

What I was looking forward to, and hoped I could find in the "NoSQL scene", is an alternative to traditional DBs without the overhead that is often unnecessary (but sometimes is necessary, and I intend to continue using PostgreSQL when appropriate). Ideally, something as simple as MongoDB appears to be (I tried the interactive tutorial).

So when exactly did NoSQL stop meaning "no SQL" and start meaning "unreliable cache"? Other than the simplicity, I fail to see where it would fit in the market then (other than the amateur market). There are better, established DB caching solutions. There are persistence libraries in any moderately popular language. There are reliable databases that are fast enough when you have the budget to scale to several dedicated servers.

How about Riak?

3 comments

NoSQL has never meant "unreliable datastore for unimportant, throwaway data". If it did, there would be no need for the MongoDB rant because that poor level of at-scale reliability would have been understood from the beginning. MongoDB wasn't marketed as an unreliable data store, so the rant author's expectations weren't met.

I'm worried about the culture that's brewing as well, but I see it more as an attempt from some NoSQL supporters to keep MongoDB looking good, even in the face of serious data integrity issues. The battle lines are forming between SQL and NoSQL (relational vs. non-relational data stores, really) and there's a lot of money and reputation at stake. What we don't want is for the facts to die in a war of rhetoric about the merits of SQL vs. NoSQL. That would be dumb.

With that said, the first paragraph of the rant is worrying:

"I've kept quiet for awhile for various political reasons, but I now feel a kind of social responsibility to deter people from banking their business on MongoDB."

What the hell does "various political reasons" mean? I'm more concerned about that than any deficiencies in MongoDB's codebase. Is there a well-funded campaign to silence MongoDB/NoSQL criticism, or is this just one customer's attempt to save face for choosing the wrong data store?

CouchDB is designed to be as durable / reliable as the underlying I/O abstractions (POSIX and friends) will allow. This in-memory stuff is really just a minority. I believe Cassandra is reasonably durable as well.

[ First off: I'm a committer on Project Voldemort, a Dynamo-style distributed data store ]

First, Riak is excellent. I can only say positive things about it as well as the folks that work on it.

Re: "store for unimportant data". I'll go beyond that. Not only should new databases be suitable for reliable storage, new databases should do things that existing databases can't. I am a bit sad that NoSQL has come to mean "replacement for an improperly tuned, ad-hoc sharded MySQL setup". To be clear, having a simple setup that provides partitioning, replication and defaults more tuned to modern hardware is a fine goal -- but why not do better? If I wanted something better than MySQL, I'd use Postgres (or properly tune my MySQL installation).

For example, Dynamo-style stores allow any replica to initiate a write (something not possible with primary copy replication), which enables high-availability applications. Some systems (Voldemort, riak-core, HBase with co-processors) also allow custom code to run on the server, significantly extending the capability of a system in a way that a stored procedure can't.
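To make the "any replica can initiate a write" point concrete, here is a rough, single-process sketch. It is not how Voldemort, Riak or Dynamo are actually implemented: the `Replica` and `Cluster` classes are hypothetical, every replica holds every key, and a single shared counter stands in for the vector clocks a real Dynamo-style store would use. It only illustrates that a write succeeds as long as some live node coordinates it and W replicas acknowledge.

```python
class Replica:
    """One storage node. Hypothetical, in-memory only."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}  # key -> (version, value)

    def store(self, key, version, value):
        if not self.up:
            return False  # no acknowledgement from a dead node
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)
        return True

    def fetch(self, key):
        return self.data.get(key)


class Cluster:
    """N replicas with tunable write (w) and read (r) quorums."""

    def __init__(self, names, w, r):
        self.replicas = [Replica(n) for n in names]
        self.w = w
        self.r = r
        self.clock = 0  # assumption: stand-in for vector clocks

    def put(self, coordinator, key, value):
        # Any live replica may coordinate; there is no primary copy.
        if not coordinator.up:
            raise RuntimeError("coordinator is down, pick another replica")
        self.clock += 1
        acks = sum(rep.store(key, self.clock, value) for rep in self.replicas)
        if acks < self.w:
            # Replicas that did ack keep the value; nothing is rolled back.
            raise RuntimeError("write quorum not reached")
        return self.clock

    def get(self, key):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.r:
            raise RuntimeError("read quorum not reached")
        found = [v for v in (rep.fetch(key) for rep in live) if v is not None]
        if not found:
            return None
        # Newest version wins.
        return max(found, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    cluster = Cluster(["a", "b", "c"], w=2, r=2)
    a, b, c = cluster.replicas
    a.up = False                 # one node fails
    cluster.put(b, "k", "v1")    # b coordinates
    a.up, b.up = True, False     # a different node fails
    cluster.put(c, "k", "v2")    # c coordinates this time
    print(cluster.get("k"))
```

With primary copy replication, losing the primary would block writes until a failover completes; here any surviving node can take the write as long as the quorum is met.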

It's also sad to see NoSQL-style systems repeat many mistakes that MySQL has made. MySQL in the late 90s with MyISAM is a completely different beast from MySQL today with InnoDB: far better concurrency, durability, referential integrity, better replication. BerkeleyDB JE is also a powerful beast: log structured storage (this is why we're using it as the default storage engine in Voldemort), Paxos-based leader elections with tunable replication.

Schema-less data or (as in Voldemort) evolvable schemas is also a huge feature, but it's not impossible to replicate it on top of MySQL (e.g., Friendfeed's data model).
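As a rough illustration of the "schema-less on top of a relational database" idea: entities are stored as opaque blobs keyed by id, and each queryable field gets its own index table. This is only a sketch in the spirit of the Friendfeed approach, not their actual code. It uses SQLite and JSON so it runs on its own, and the table names (`entities`, `index_user_id`) and helper functions are made up for the example.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# One table of opaque entity bodies, one index table per queryable field.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE index_user_id ("
    " user_id TEXT NOT NULL,"
    " entity_id TEXT NOT NULL,"
    " PRIMARY KEY (user_id, entity_id))"
)


def save(entity):
    """Store a dict with arbitrary fields and refresh its index row."""
    entity = dict(entity)
    entity.setdefault("id", uuid.uuid4().hex)
    with conn:  # one transaction for the body and the index
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )
        conn.execute(
            "DELETE FROM index_user_id WHERE entity_id = ?", (entity["id"],)
        )
        if "user_id" in entity:
            conn.execute(
                "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (entity["user_id"], entity["id"]),
            )
    return entity["id"]


def find_by_user(user_id):
    """Look entities up through the index table, then decode the blobs."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id AS i"
        " JOIN entities AS e ON e.id = i.entity_id"
        " WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    )
    return [json.loads(body) for (body,) in rows]
```

Adding a new field needs no schema change to `entities`; only making a field queryable requires a new index table.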

Here are some things that I'd really like to see evolve in the NoSQL space:

* Support for new and interesting distribution models. Allowing users to choose between eventual consistency, quorum protocols, primary copy replication and even transactional replication.

* Support for large, unstructured blob data: Riak is going the right way with Luwak. I believe Facebook has been using HBase as a front-end for Haystack -- it would also make a great choice for Haystack's metadata store.

* Most NoSQL systems support transactions within the scope of a single value (or document) via the use of quorums, serializing through a single master, etc. However, it'd be nice if something like MegaStore's Entity Groups (or Tablet Groups in Microsoft Azure Cloud SQL server) were supported.

* Secondary indices, whether internal or external (by shipping a changelog) to the system.

* True multi-datacenter support (local quorums if desired, async replication to the remote site) including across unreliable, high latency WAN links (disclosure: Voldemort supports this -- https://github.com/voldemort/voldemort/wiki/Multi-datacenter... )
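
To sketch what the "external secondary index fed by a shipped changelog" item in the list above might look like: the primary store appends every write to an append-only log, and a separate indexer replays that log from its last offset. All names here (`Store`, `ExternalIndex`, `catch_up`) are hypothetical and the whole thing is in-memory; in a real system the log would be shipped over the network and the index would lag the store.

```python
from collections import defaultdict


class Store:
    """Primary key-value store that records every write in a changelog."""

    def __init__(self):
        self.data = {}
        self.changelog = []  # append-only list of (key, old_value, new_value)

    def put(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        self.changelog.append((key, old, value))


class ExternalIndex:
    """Secondary index on one field, maintained outside the store."""

    def __init__(self, field):
        self.field = field
        self.offset = 0  # how far into the changelog we have consumed
        self.by_value = defaultdict(set)

    def catch_up(self, changelog):
        # Replay only the entries we have not seen yet.
        for key, old, new in changelog[self.offset:]:
            if old is not None and self.field in old:
                self.by_value[old[self.field]].discard(key)
            if new is not None and self.field in new:
                self.by_value[new[self.field]].add(key)
        self.offset = len(changelog)

    def lookup(self, value):
        return set(self.by_value.get(value, ()))
```

Until `catch_up` runs, lookups return stale results, which is the usual trade-off of keeping the index external rather than inside the store.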