by trashtester 1227 days ago
> Efficient joins in NoSQL are done by persisting the relation in a brand new entity. Thus they only make sense when the query is really and constantly needed, not when one is writing up SQL, researching, creating a custom report, etc.

This is also a common approach for SQL based databases when low latency is needed for common queries.
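To illustrate what "persisting the relation" can look like on the SQL side, here is a minimal sketch using Python's built-in `sqlite3`. The `customers`, `orders` and `order_summary` tables and the `build_order_summary` helper are made up for the example; a real system would more likely use a materialized view or an incrementally maintained table.

```python
import sqlite3


def build_order_summary(conn: sqlite3.Connection) -> None:
    """Persist the customers/orders join as its own table (hypothetical schema)."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS order_summary;
        CREATE TABLE order_summary AS
        SELECT c.id AS customer_id,
               c.name,
               COUNT(o.id) AS order_count,
               COALESCE(SUM(o.amount), 0) AS total_amount
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name;
        CREATE UNIQUE INDEX order_summary_pk ON order_summary(customer_id);
        """
    )


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            amount INTEGER NOT NULL
        );
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO orders VALUES (1, 1, 10), (2, 1, 15), (3, 2, 7);
        """
    )
    build_order_summary(conn)
    # The hot query is now a single-table lookup; no join at read time.
    return conn.execute(
        "SELECT customer_id, name, order_count, total_amount "
        "FROM order_summary ORDER BY customer_id"
    ).fetchall()
```

The trade-off is the same as in NoSQL: reads get cheap, but something has to rebuild or update the summary whenever the base tables change.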

> SQL DBs take lower volumes of data, loads of CPU and RAM

That depends on the schema and the queries being run on them. Large amounts of CPU are really only needed if there are either 1000s of queries per second or large joins. RAM is very useful for hash joins with medium sized tables (up to a few million records). It has some utility for caching indexes or tables that need lower latency than the IO can provide, but that's similar to most NoSQL that I'm familiar with.
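For anyone wondering why RAM matters for hash joins: the smaller table is loaded into an in-memory hash table, and the larger one is streamed past it. A toy sketch of the idea (the `hash_join` function and its parameters are illustrative, not how any particular engine implements it):

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner-join two lists of dicts using an in-memory hash table."""
    # Build phase: the whole (smaller) build side is held in memory.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: the other side is streamed, one lookup per row.
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            yield {**match, **row}
```

As long as the build side fits in RAM, this is roughly one pass over each table. Once it no longer fits, the engine has to spill to disk or pick another join strategy, and that is where performance drops.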

Most RDBMS's also come with ACID support, which has a significant cost (especially when writing). That has little to do with the SQL language, though. Spark, for example, tends to run SQL without paying those costs. (More about this in the OA.)

> Even when your project is not in the 100s of millions of records this can be significant because you then are able to offer a cheaper product than the competition's

For small databases (<100 million records in the largest tables), SQL can be quite affordable, especially with pragmatic schema designs. I was working on SQL based systems more than 20 years ago with tables with several billion records on quite tiny hardware (by today's standards). 100 million is nothing by comparison.

For large tables (100 million to 10 billion records), good schema design and well-written queries may be needed for SQL to perform, but that's not that different from what you're saying about NoSQL. Still, as you approach the upper end of this range, the compromises needed with regard to denormalization, slack transaction management, etc. may come at the cost of eliminating many of the advantages of traditional RDBMS's (such as consistency enforced by the schema definition through normalization and integrity constraints).

Beyond 10 billion records, traditional RDBMS's start to break down, of course (you may need a cluster, and you may need to use sharding or similar methodologies from NoSQL / Big Data or similar paradigms, even if technically still on an RDBMS).
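By sharding I mean something like the following: route each row to one of N databases based on a stable hash of its key. This is only a sketch under assumptions; `shard_for` and the key format are invented for the example, and real deployments usually add consistent hashing or a lookup directory so that shards can be added without moving most of the data.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # sha256 is stable across processes and machines, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Any query that needs rows from more than one shard then has to be assembled by the application or a coordinator, which is where a lot of the RDBMS conveniences get lost.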

For small-medium sized databases (<100 million records/table), and especially for the small ones (<10 million records/table), I find that you often win back some of the (very moderate) extra costs simply from the consistency enforcement features. (ACID compliance for multi-table transactions, data normalization with referential integrity enforced by foreign key constraints, and so on.)
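A small example of what those consistency features buy you, again with Python's built-in `sqlite3` and a made-up `customers`/`orders` schema. The second insert references a customer that doesn't exist, so the database rejects it and the whole transaction is rolled back, including the first (valid) insert:

```python
import sqlite3


def rollback_demo() -> tuple[bool, int]:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount INTEGER NOT NULL
        );
        INSERT INTO customers VALUES (1, 'alice');
        """
    )
    rejected = False
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders VALUES (1, 1, 10)")
            conn.execute("INSERT INTO orders VALUES (2, 99, 5)")  # no customer 99
    except sqlite3.IntegrityError:
        rejected = True
    remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return rejected, remaining
```

Here `rollback_demo()` returns `(True, 0)`: the bad row is refused and no half-finished state is left behind. Without the constraint and the transaction, that check has to live in every piece of application code that writes to the table.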

Btw, I'm not saying that other kinds of databases don't have their place, especially for various types of unstructured data, document oriented data or situations with extreme volumes or throughput requirements. 20 years ago RDBMS's were certainly overused. But 5-10 years ago, the pendulum had swung a bit in the other direction, imo.

A lot of devs that left college around 2010 seem to have jumped on NoSQL databases less because of their strengths than because of how they enabled the devs to trivially persist object oriented data structures with a line or two of code, and because they'd never learned how to use traditional RDBMS's properly.

This is fine as long as they don't need the kind of data consistency features provided by relational data models, but when consistency is needed, it tends to cause unnecessary problems. (Especially as a system ages and its dev teams and application logic evolve.)

The reason I started my previous message with "That's interesting" is that your comments clearly show you have a mature approach to NoSQL, as opposed to those who consider it a silver bullet that makes all complexity go away.

Anyway, it seems that a good design, suitable both to the problems at hand and to the technology chosen/available, is a universal in the field. It's usually more important than specific tech choices.

1 comments

Hard to disagree

I get that many devs are using NoSQL in a careless way that produces horrors. I just don't think that is NoSQL's fault, hence I dislike comments criticizing NoSQL DBs. I am an efficiency (mostly in $ terms) freak, so I really love how NoSQL avoids the pitfalls of SQL (CPU and RAM usage, plus licenses if paid). That makes my take subjective.