by collaborative 1228 days ago
Sure, SQL DBs can have all the advantages of NoSQL DBs and then some... but they will never have a lower price tag. And that's because NoSQL DBs use very, very little CPU. All that's needed is giving up reliance on SQL. Design your NoSQL "schema" with this in mind, and you're golden.

Very little CPU depending on sorting, indexes, etc...

Have a lot of indexes on a collection? Mongo will eat 5-10ms of CPU time per query even with cached query plan stats just to start executing the query. So no different than PG here.

What you're getting at is joins, but I haven't seen a company that didn't end up doing joins with Mongo at some point. Or they do it in the app layer, in Java/Node/PHP, which requires more CPU than it would in the DB.
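
For illustration, an app-layer join usually ends up looking something like the sketch below. It assumes both collections have already been fetched into plain Python lists of dicts; the `users` and `orders` names and their fields are made up for the example and don't come from any real schema.

```python
from collections import defaultdict


def app_layer_join(users, orders):
    """Hash join done in application code instead of in the database."""
    # Build phase: index the orders by their foreign key.
    orders_by_user = defaultdict(list)
    for order in orders:
        orders_by_user[order["user_id"]].append(order)
    # Probe phase: attach the matching orders to each user document.
    return [
        {**user, "orders": orders_by_user.get(user["_id"], [])}
        for user in users
    ]


# Hypothetical sample data standing in for two fetched collections.
users = [{"_id": 1, "name": "ann"}, {"_id": 2, "name": "bob"}]
orders = [
    {"_id": 10, "user_id": 1, "total": 5},
    {"_id": 11, "user_id": 1, "total": 7},
]
joined = app_layer_join(users, orders)
```

Every document passes through the application runtime here, which is the CPU cost being described; whether it really costs more than the database's own join depends on the workload.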

Also, wanna store an array in a document in Mongo? Every time you add to that array, the whole array is replicated to your secondaries. That eats a lot of CPU.

But you are describing incorrect ways of using NoSQL. NoSQL requires getting the schema right from the get-go. Many people don't like it because they are used to relying on SQL (or a language like the ones you mentioned) to smooth things out. NoSQL requires you to ask yourself hard questions: what specific queries are going to consume this data? You then model the data around those queries, as opposed to SQL, where you just store the data and worry about querying it later. This produces efficient queries that ultimately don't need as much CPU.
This is completely opposite to the raison d'être of NoSQL databases. They all started out without support for schemas because there was a desire to get away from the up-front schemas and rigidity of SQL databases. They were created so you could drop in whatever you want, as opposed to SQL databases, which have strict rules about what goes in.
Differing structures don't necessarily require more compute to query.

Anyway, this is a weird argument. Ideally you get your schema right from the get-go no matter the DB; of course things will be easier. But we don't. That's why we have migrations. Also, what's right today might not be after acquiring Jonny Big Corp.

The whole point of NoSQL is to dump documents into it, of varying structure, and query/index them as-needed.

Some claim the opposite, though. Even in this discussion. That NoSQL is needed when you need to change the schema in a production system.

Anyway, most systems that last more than a couple of years tend to change over time, meaning the schema is likely to need extensions.

Sure, changing the schema is easier in NoSQL. But to do it properly, you still need to really understand how the data is being queried and then change the schema in a controlled "migration". For most people these tasks will be a pain. But NoSQL can really shine in terms of $/query if the data is being correctly looked after. It will save you from needing 16 cores in multiple clusters and very expensive bills.
That's interesting.

A lot of your suggestions are actually exactly the same for SQL based databases. If your schema is not fit for the task at hand, it can slow it down by an order of magnitude, and the process of changing schema is also similar.

Though I would think a properly designed SQL database would need such full schema refactoring less often, since adding a few tables within the same structure is easier.

It sounds like you're describing use cases with greater data volumes than I usually use SQL for. (Mostly data science use cases, where larger datasets typically end up in Spark or similar.)

My experience is that SQL-based systems work best up to a few hundred million records in the largest tables (a few billion at most), and with fewer than about 10,000 transactions per second. Around those volumes is where SQL starts to get really expensive.

And often SQL is used for use cases where the number of records per table is under 10 million and transactions per second are in the low hundreds or lower.

But I'm probably biased in the opposite direction from you. For me, performance usually means efficient joins. Which means that even if I'm leaving RDBMSs behind, I still use SQL where I can (such as in Spark).

Efficient joins in NoSQL are done by persisting the relation in a brand new entity. Thus they only make sense when the query is really and constantly needed, not when one is writing up SQL, researching, creating a custom report, etc.
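
A minimal sketch of what "persisting the relation in a brand new entity" can look like, assuming plain Python dicts stand in for two collections in a document store. The `orders_by_customer` view, the function names, and the fields are all hypothetical; the point is only that the write path maintains the relation so the read path is a single key lookup.

```python
# Hypothetical in-memory "collections" standing in for a document store.
orders = {}              # order_id -> order document
orders_by_customer = {}  # customer_id -> precomputed relation document


def place_order(order_id, customer_id, total):
    """Write the order and update the persisted relation in the same step."""
    orders[order_id] = {"customer_id": customer_id, "total": total}
    view = orders_by_customer.setdefault(
        customer_id, {"order_ids": [], "lifetime_total": 0}
    )
    view["order_ids"].append(order_id)
    view["lifetime_total"] += total


def customer_summary(customer_id):
    """Read side: one key lookup, no join work at query time."""
    empty = {"order_ids": [], "lifetime_total": 0}
    return orders_by_customer.get(customer_id, empty)


place_order("o1", "c1", 20)
place_order("o2", "c1", 15)
summary = customer_summary("c1")
```

The trade-off is visible even in this toy version: every write does extra work and stores extra data, which only pays off when `customer_summary` is called constantly, not for one-off reports.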

NoSQL DBs take large volumes of data, little CPU, and almost no RAM

SQL DBs take lower volumes of data, loads of CPU and RAM

Storage is cheap, CPU is expensive, hence NoSQL is cheap. Even when your project is not in the hundreds of millions of records this can be significant, because you are then able to offer a cheaper product than the competition's.

This is true, but it's also very difficult to do in most business situations (getting the schema right from the get-go).
It's hard but it's still possible to change it. It's just very important to do it carefully, knowing what the consequences are. I have had to iterate on NoSQL schemas several times, and I always prefer to treat their evolution as migrations done out-of-hours.
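
One way such a controlled migration might look, as a rough sketch: it assumes each document carries a `schema_version` field (a convention invented here, not a feature of any particular database) and that the v2 schema splits a `name` field into `first_name` and `last_name`.

```python
def migrate_v1_to_v2(doc):
    """Return a v2 copy of a v1 document; v2 documents pass through unchanged."""
    if doc.get("schema_version", 1) >= 2:
        return doc
    # Hypothetical schema change: split "name" into two fields.
    first, _, last = doc["name"].partition(" ")
    migrated = {k: v for k, v in doc.items() if k != "name"}
    migrated.update(first_name=first, last_name=last, schema_version=2)
    return migrated


def run_migration(collection):
    """Migrate every document; re-running is safe because the step is idempotent."""
    return [migrate_v1_to_v2(doc) for doc in collection]


# Hypothetical mixed collection: one old-style and one already-migrated document.
docs = [
    {"_id": 1, "name": "Ada Lovelace"},
    {"_id": 2, "first_name": "Alan", "last_name": "Turing", "schema_version": 2},
]
migrated = run_migration(docs)
```

Keeping the step idempotent means an out-of-hours run that gets interrupted can simply be started again.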
Which seems to imply a static schema, known more-or-less perfectly at the outset, and unchanging thereafter. That is obviously unreasonable. Application data structures change as new requirements arise, and as existing requirements become better understood.
> Which seems to imply a static schema, known more-or-less perfectly at the outset, and unchanging thereafter. That is obviously unreasonable.

Is it? Most of AWS runs on NoSQL databases, and continues to ship new features that do not fit into existing schemas. This assertion then is clearly incorrect.