by trashtester 1227 days ago
Some claim the opposite, though, even in this discussion: that NoSQL is needed when you need to change the schema in a production system.

Anyway, most systems that last more than a couple of years tend to change over time, meaning the schema is likely to need extensions.


Sure, changing the schema is easier in NoSQL. But to do it properly, you still need to really understand how the data is being queried and then change the schema in a controlled "migration". For most people these tasks will be a pain. But NoSQL can really shine in terms of $/query if the data is being correctly looked after. It will save you from needing 16 cores in multiple clusters and very expensive bills.
That's interesting.

A lot of your suggestions are actually exactly the same for SQL based databases. If your schema is not fit for the task at hand, it can slow it down by an order of magnitude, and the process of changing schema is also similar.

Though I would think a properly designed SQL database would need such full schema refactoring less often, since adding a few tables within the same structure is easier.

It sounds like you're describing use cases with greater data volumes than I usually use SQL for. (Mostly data science use cases, where larger datasets typically end up in Spark or similar.)

My experience is that SQL based systems work best up to a few hundred million records in the largest tables (a few billion at most), and with fewer than about 10,000 transactions per second. Around those volumes is where SQL starts to get really expensive.

And often SQL is used for use cases where the number of records per table is less than 10 million and transactions per second are in the low hundreds or lower.

But I'm probably biased in the opposite direction from you. For me, performance usually means efficient joins, which means that even if I'm leaving RDBMS's behind, I still use SQL where I can (such as in Spark).

Efficient joins in NoSQL are done by persisting the relation in a brand new entity. Thus they only make sense when the query is really and constantly needed, not when one is writing up SQL, researching, creating a custom report, etc.
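To make that concrete, here is a minimal Python sketch of the idea, not taken from any particular database: plain dicts stand in for a document store, and `users`, `orders` and `build_orders_by_user` are made-up names for illustration. The relation is written out once as its own entity, so reading the "join" later is a single key lookup.

```python
# Hypothetical in-memory "collections" standing in for a document store.
users = {"u1": {"name": "Ada"}, "u2": {"name": "Linus"}}
orders = [
    {"id": "o1", "user_id": "u1", "total": 30},
    {"id": "o2", "user_id": "u1", "total": 12},
    {"id": "o3", "user_id": "u2", "total": 7},
]


def build_orders_by_user(users, orders):
    """Persist the user/order relation as a new entity keyed by user id."""
    view = {uid: {"name": u["name"], "orders": []} for uid, u in users.items()}
    for order in orders:
        view[order["user_id"]]["orders"].append(
            {"id": order["id"], "total": order["total"]}
        )
    return view


# Built once (and rebuilt or updated on writes), not at query time.
orders_by_user = build_orders_by_user(users, orders)

# The "join" is now a single lookup.
print(orders_by_user["u1"])
```

The trade-off is the one described above: the extra entity has to be kept in sync on every write, so it only pays off for queries that run constantly.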

NoSQL DBs take large volumes of data, little CPU, and almost no RAM

SQL DBs take lower volumes of data, loads of CPU and RAM

Storage is cheap, CPU is expensive, hence NoSQL is cheap. Even when your project is not in the hundreds of millions of records, this can be significant, because you are then able to offer a cheaper product than the competition.

RAM depends on your working set, right?

I maintain clusters with 50+ machines with 128 GB+ of RAM...

> Efficient joins in NoSQL are done by persisting the relation in a brand new entity. Thus they only make sense when the query is really and constantly needed, not when one is writing up SQL, researching, creating a custom report, etc.

This is also a common approach for SQL based databases when low latency is needed for common queries.

> SQL DBs take lower volumes of data, loads of CPU and RAM

That depends on the schema and the queries being run on them. Large amounts of CPU are really only needed if there are either thousands of queries per second or large joins. RAM is very useful for hash joins with medium sized tables (up to a few million records). It has some utility for caching indexes or tables that need lower latency than the IO can provide, but that's similar to most NoSQL that I'm familiar with.
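For anyone wondering why RAM matters for hash joins, here is a rough Python sketch of the textbook algorithm. It is an illustration only, not how any specific engine implements it, and `hash_join`, `customers` and `orders` are made-up names. The smaller table is loaded into an in-memory hash table (the build side), and the larger one is streamed past it (the probe side), so the build side is what has to fit in memory.

```python
def hash_join(build_rows, probe_rows, key):
    """Inner equi-join of two lists of dicts on a shared key column."""
    # Build phase: the whole build side is held in memory.
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)

    # Probe phase: stream the other side and look up matches in O(1) each.
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Linus"},
]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 12},
    {"customer_id": 3, "total": 5},  # no matching customer, dropped by the inner join
]

joined = hash_join(customers, orders, "customer_id")
print(joined)
```

Once the build side no longer fits in RAM, engines typically fall back to partitioning to disk or to other join strategies, which is where the cost starts to climb.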

Most RDBMS's also come with ACID support, which has a significant cost (especially when writing). That has little to do with the SQL language, though. Spark tends to run SQL without paying those costs. (More about this in the OA.)

> Even when your project is not in the hundreds of millions of records, this can be significant, because you are then able to offer a cheaper product than the competition.

For small databases (<100 million records in the largest tables), SQL can be quite affordable, especially with pragmatic schema designs. I was working on SQL based systems more than 20 years ago with tables with several billion records on quite tiny hardware (by today's standards). 100 million is nothing by comparison.

For large tables (100 million to 10 billion records), good schema design and well written queries may be needed for SQL to perform, but that's not that different from what you're saying about NoSQL. Still, as you approach the upper end of this range, the compromises needed with regard to denormalization, slack transaction management, etc. may come at the cost of eliminating many of the advantages of traditional RDBMS's (such as consistency enforced by the schema definition through normalization and integrity constraints).

More than 10 billion, and traditional RDBMS's start to break down, of course (you may need a cluster, and you may need to use sharding or similar methodologies from NoSQL / Big Data or similar paradigms, even if technically still on an RDBMS).

For small to medium sized databases (<100 million records/table), and especially for small ones (<10 million records/table), I find that you often win back some of the (very moderate) extra costs simply from the consistency enforcement features. (ACID compliance for multi-table transactions, data normalization with referential integrity enforced by foreign key constraints, and so on.)
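As a small illustration of what those features buy you, here is a sketch using Python's built-in `sqlite3` module. The table names and the `insert_orders` helper are made up for the example, and SQLite is just a convenient stand-in for a full RDBMS. A batch containing one order for a non-existent customer is rejected by the foreign key constraint, and the whole transaction is rolled back, so no partial write survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           total INTEGER NOT NULL
       )"""
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.commit()


def insert_orders(conn, rows):
    """Insert a batch of (customer_id, total) rows atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO orders (customer_id, total) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = insert_orders(conn, [(1, 30), (1, 12)])
# Customer 99 does not exist, so the whole batch is rejected,
# including the otherwise valid first row.
rejected = insert_orders(conn, [(1, 5), (99, 7)])
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok, rejected, order_count)
```

Without the constraint, the same check has to live in application code, and every code path that writes orders has to remember to do it.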

Btw, I'm not saying that other kinds of databases don't have their place, especially for various types of unstructured data, document oriented data or situations with extreme volumes or throughput requirements. 20 years ago RDBMS's were certainly overused. But 5-10 years ago, the pendulum had swung a bit in the other direction, imo.

A lot of devs who left college around 2010 seem to have jumped on NoSQL databases less because of their strengths than because of how they enabled the devs to trivially persist object oriented data structures with a line or two of code, and because they'd never learned how to use traditional RDBMS's properly.

This is fine as long as they don't need the kind of data consistency features provided by relational data models, but when consistency is needed, it tends to cause unnecessary problems. (Especially as a system ages and dev teams and application logic evolve.)

The reason I started my previous message with "That's interesting" is that your comments clearly show you have a mature approach to NoSQL, as opposed to those who consider it a silver bullet that makes all complexity go away.

Anyway, it seems that a good design, suited both to the problems at hand and to the technology chosen/available, is a universal in the field. It is usually more important than specific tech choices.

Hard to disagree

I get that many devs are using NoSQL in a careless way that produces horrors. I just don't think that is NoSQL's fault, hence I dislike comments criticizing NoSQL DBs. I am an efficiency (mostly in $ terms) freak, so I really love how NoSQL avoids the pitfalls of SQL (CPU and RAM usage, plus licenses if paid). That makes my take subjective.