Hacker News
by sreque 5754 days ago
My problem is that I'm not a data storage expert. Do I now have to specialize in this field to be able to choose the appropriate persistence technology for a given problem? I have yet to see a good explanation of when different technologies are more appropriate, as most of the discussions I see usually devolve into some sort of SQL-NoSQL flame war. I would like a good, fair resource to explain the pros and cons of different persistence technologies more clearly.

The questions you should ask yourself are the following:

* Does it matter if you lose the last 5 seconds' worth of updates? The last 5 minutes? The last day?

If you can lose 5 seconds' worth of updates, a MongoDB replication pair is just fine. If you can lose a day's worth of updates (or can easily reconstruct the database contents from other sources), you can try out pretty much anything without bad repercussions. If you can't lose anything, you're pretty much limited to the most conservative databases (the SQL bunch).

* What's the most obvious unit of data that you're working with?

If you always update single values (or add things to lists/sets), Redis is an excellent choice. If you have fixed-size records, SQL or one of the table-based options (Cassandra, HBase) may be for you. If you have documents with substantial internal structure, a document store (MongoDB, CouchDB, or Lotus Notes if you want something expensive and commercial) would be a good option.

* How much data do you have?

If all of your data fits into memory (and for the price of another server, you may well get enough memory to fit all of your data), you can go pretty far with a single server. If it fits on a single set of hard disks, you'd want replication, not sharding, so that the risk of losing data is minimal. If your data is much larger than that, your only hope is a sharding setup - either with SQL+spit+glue, or Cassandra/HBase, or some version of MongoDB where sharding is stable enough for production use (I do remember seeing warnings - so the current version may or may not fit that description).
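
To make the three questions concrete, here is a rough Python sketch of them as a checklist. The function names, the category strings, and the exact cutoffs (treating anything under 5 seconds of tolerable loss as "can't lose anything") are my own assumptions for illustration, not hard rules; it only restates the rules of thumb above.

```python
DAY_SECONDS = 24 * 60 * 60


def durability_advice(tolerable_loss_seconds):
    """Question 1: how much recent data can you afford to lose?"""
    # Assumption: below the 5-second mark we treat the data as "can't lose anything".
    if tolerable_loss_seconds < 5:
        return "conservative SQL database"
    if tolerable_loss_seconds >= DAY_SECONDS:
        return "pretty much anything"
    return "replicated store, e.g. a MongoDB replication pair"


def data_model_advice(unit):
    """Question 2: what is the most obvious unit of data?"""
    # The keys are hypothetical labels for the three cases described above.
    options = {
        "value": "Redis",
        "record": "SQL or a table-based store (Cassandra, HBase)",
        "document": "document store (MongoDB, CouchDB)",
    }
    return options.get(unit, "no rule of thumb given; look closer at the data")


def topology_advice(data_bytes, ram_bytes, disk_bytes):
    """Question 3: how much data do you have, relative to one server?"""
    if data_bytes <= ram_bytes:
        return "single server, data in memory"
    if data_bytes <= disk_bytes:
        return "replication, not sharding"
    return "sharding (SQL+spit+glue, Cassandra/HBase, or MongoDB sharding)"


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical workload: can lose a minute, stores documents, 200 GiB of data.
    print(durability_advice(60))
    print(data_model_advice("document"))
    print(topology_advice(200 * GiB, 64 * GiB, 2048 * GiB))
```

The answers to the three questions are independent, so you will usually end up intersecting them by hand; the sketch just keeps the questions separate so none of them gets skipped.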

Thanks for the response. I will mull that over next time I have a say in the matter of how to store data. I think at some point I will also just need to invest some time to play with a few of the NoSQL choices to get a better feel for them.