Hacker News new | ask | show | jobs
by TimJYoung 2743 days ago
Key-value stores are fine if you are always simply grabbing a single row/tuple from a single dataset. Any time you move beyond that, you're in for a world of hurt compared to a relational database that uses SQL.

The reasons are:

1) You often need a transactional, consistent view of the data across multiple datasets (tables). Relational databases using SQL provide that, key-value stores may or may not (most likely not).

2) Any time you're doing any kind of aggregation, key-value stores will require that you pull all data in locally and perform the operation there. It's questionable whether the local environment even has the resources to accomplish such a task, and the efficiency of such operations will be dismal compared to a SQL-based relational database server.

Key-value stores are, from an interface perspective, very much like the older ISAM databases that were popular before SQL and relational databases gained a foothold in the industry. They also recycled the exact same problems that SQL-based relational databases solved so elegantly (round and round we go).

Another way to put it is this: leaving aside scaling for now, SQL-based relational databases can typically do everything that a key-value store can do, whereas key-value stores cannot typically do everything that a SQL-based relational database can do.

1 comments

I wasn't so much asking about relational databases vs key/value stores, but rather why should I implement things like advanced sorting, mathematical formulas, etc in SQL rather than in the application code that fetches data from SQL?
> why should I implement things ... in SQL

It's simply more efficient? If I have a table with millions of rows, I'm not going to bring down the entire data set on every request just to sort by latest and grab the 10 most recently updated rows that haven't been soft deleted (WHERE active=1 ORDER BY updated_at DESC LIMIT 10).

Well, obviously limiting results/rows should be done in SQL.. but that's not what the article is demonstrating..
- Advanced sorting plus pagination comes to mind. How would you do that without pulling unnecessary rows? - GROUP_CONCAT or STUFF in SQL Server already decreases the number of rows. His example has one row returned instead of 3. - Aggregating in SQL would also reduce number of rows returned instead of pulling all the aggregated rows to code and doing it there - I would guess RDBMS implementations are in something faster than the ORM is implemented in and their code is more optimized - that would be speculation.

Also, his examples are like the tip of the iceberg. There are all kinds of things like windowing functions and common table expressions that have so much power to get exactly as much data as you need right at the RDBMS without having to carry that baggage anywhere else.

Still all boils down to the same thing - keeping the computation & data as close to possible will always be more efficient.
Anything that deals with the source data, such as aggregation, sorting, or partitioning, should be done with SQL.

Anything that deals with the formatting/display of the columns in the result set rows can be done in SQL, but doesn't have to be done in SQL. It's really a matter of convenience to do it in SQL, especially when dealing with multiple front-end languages, because it means that the formatting is done in one place.