by tzury 3176 days ago
Not ranting or trolling, but in the vast majority of cases I've come across, PostgreSQL or even mySQL or SQLite would have been a better choice.

(There must be something appealing to developers using JSON's style syntax rather than a Structured Query Language.)

There should be a solid reason to pick NoSQL in general, and when such a reason appears, picking the right one among the available NoSQL platforms is another job.

https://en.wikipedia.org/wiki/NoSQL

7 comments

> Not ranting or trolling, but in the vast majority of cases I've come across, PostgreSQL or even mySQL or SQLite would have been a better choice.

This is ranting.

I am a Postgres proponent, but saying that PostgreSQL/MySQL/SQLite is the better choice in the vast majority of cases the parent has come across is reckless. The words were well chosen, which makes the rant less obvious.

There aren't good or bad DBs. Every DB has its strengths and respective trade-offs. As much as I like Postgres, there are many use cases for other DBs, including NoSQL ones. I am not going to feed the troll by arguing why NoSQL can be terrific or why SQL can be a big struggle. I am on both sides: both SQL and NoSQL have their place.

It's sad that a thread which is about learning NoSQL gets hijacked by an unrelated top comment opposing NoSQL.

There definitely are bad databases. You can easily make a system that is NOT consistent and NOT available and NOT partition tolerant, for example.
“Yeah, but Postgres” is the new “Is it webscale?” for any DB-related thread. They found a blue hammer that will work for every problem and want everyone to know.
Not only that, but Postgres has great JSON support. I think a good way to put it is: If you don't know the ins and outs of SQL, start there. It solves the vast majority of problems you'll encounter. Expand out to NoSQL as your needs (and knowledge) arise.
I know you mentioned Postgres, I was just wondering if you or anyone else had experience with using JSON with MySQL? I am currently in a place at work where the decision was made to use MySQL and we can’t back out of that, and I find their documentation a little terse on this subject (I’m not a SQL expert), so I was wondering if anyone could speak to it at all? Does it automatically parse the keys in a JSON file as table names for where to put the values, or are you just calling a file every time? Is one per se more effective or efficient than the other?

Sorry to latch on, I’m very eager to learn. Our stacks of choice are Django and Flask respectively, if that helps.

Right on! This is the reason I've actually mentioned Postgres.
“I want to learn about planes”

“Trains are usually a better choice. Most people don’t need planes”

Can you explain why? I agree, but I’d like to be able to justify it as much as possible when arguing for SQL.
Not the author, but here is my explanation: SQL databases are similar to a Swiss army knife. You can apply them to pretty much every use case. However, for most use cases they won't be as good as a more specialized tool. NoSQL DBs usually make stronger trade-offs that limit them to fewer use cases but make them incredibly well suited for those. If you know for sure what your problem is and you gotta scale, go NoSQL. If you and your company are starting out, you are most likely better off with Postgres. Even if your current use case is a perfect fit for a specific NoSQL store, your business needs are likely to change, and then you gotta migrate. For all but intense cases Postgres will scale well. Once you are super successful you can migrate the pieces of your system that need it to a better-scaling solution. You must make 100% sure, though, that you understand the trade-offs you are making. There is no system that is just in general better than any other reasonable system. If a new system claims otherwise, we just don't know the trade-offs yet, which is super dangerous.
>However, for most use cases they won't be as good as a more specialized tool

It's just a small set of problems that really requires a nosql database.

Most (if not all) nosql databases are perceived as less complicated since they hand-wave away all complicated things to the users of the database, while focusing on being fast and simple to use and run in a cloud or cluster.

Anyone running a database system in a fault-tolerant configuration immediately hits the CAP theorem, and SQL and NoSQL databases sacrifice or ignore different aspects of both CAP and ACID in order to scale.

As you write, you really have to know what you are sacrificing before making that choice. Perceived complexity is probably not a good selector.

One problem is that SQL databases are normally installed in "pet-mode", where you have two or three servers that you really have to take care of. This feels less satisfactory when developing for the cloud, and typically also doesn't scale very well horizontally. Instead of running your own distributed database in the cloud (and failing), there are also PaaS databases, but their SQL tends to come in vendor-specific flavours, which makes it hard to change the infrastructure.

Maybe another problem is the model mismatch - relational databases impose restrictions on how data is represented and how it's retrieved that make no sense from a "rest-interface based" view, as there's a mismatch between the entity view (objects and lists) and relational algebra.

There are graph databases, and I personally think that they might be the future. Building strong models within a bounded context is still probably the best way to model complex data and processes that operate on that data.

Unfortunately the future isn't here yet and most graph databases are still slower than my laptop.

The best compromise is probably to use CQRS - Command Query Responsibility Segregation, meaning that queries and commands (modifications) are handled by separate stacks, where read-only data might be distributed and updated ("cached") for use, but the actual processing is done against a single consistent database running on a few "pet" servers.

This only makes sense for systems that mostly read things and update their data relatively seldom.
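
To make the CQRS split described above a bit more concrete, here is a minimal in-memory sketch in Python. All names (`WriteStore`, `ReadModel`, `deposit`, `sync`) are made up for illustration and do not come from any framework; a real setup would put a database behind the write side and caches or replicas behind the read side.

```python
from dataclasses import dataclass, field


@dataclass
class WriteStore:
    """Hypothetical command side: the single consistent store."""
    balances: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def deposit(self, account: str, amount: int) -> None:
        # Commands are validated and applied in one place.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[account] = self.balances.get(account, 0) + amount
        # Record the new state so read models can catch up later.
        self.events.append((account, self.balances[account]))


@dataclass
class ReadModel:
    """Hypothetical query side: a read-only copy that may lag behind."""
    snapshot: dict = field(default_factory=dict)
    applied: int = 0  # number of events already applied

    def sync(self, store: WriteStore) -> None:
        # Apply only the events this copy has not seen yet.
        for account, balance in store.events[self.applied:]:
            self.snapshot[account] = balance
        self.applied = len(store.events)

    def balance(self, account: str) -> int:
        return self.snapshot.get(account, 0)


store = WriteStore()
replica = ReadModel()

store.deposit("alice", 100)
stale = replica.balance("alice")  # replica has not synced yet
replica.sync(store)
fresh = replica.balance("alice")  # replica has caught up
```

The gap between `stale` and `fresh` is the price of the pattern: reads can be served from many cheap copies, but they may briefly return old data.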

Q: How can I learn about noX?

A: Not trolling, but X is vastly usually better than noX.

IDK what tolling is.

Vast majority of cases I've come across, if not all of them, only suffer from any reliance on 90s era RDBMS systems.

And it's never about JSON, it's about latency and resilience, about being able to simply add and replace nodes, about just working in a modern distributed environment.

How many systems actually need a distributed database though? In my experience it's usually resume-driven development that makes the choice to go NoSQL.
Dear God, this. 90% of the people reading this (or more) - myself included - are currently working on a system that averages fewer than 100 concurrent users. I’ve worked on big systems, and DBs like Cassandra are great, and absolutely have their place, and that place is likely not your system. Quit overcomplicating everything, please. Please.
I love this comment, because it's exactly how I feel about when people talk about these systems, designed for big scale.
Anybody who needs more availability than an individual instance can provide.
That's all great until you need to perform a join.
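
For anyone wondering what that looks like in practice, here is a rough Python sketch. The `users` and `orders` data is invented for illustration, and SQLite stands in for "a SQL database". In a store without joins the application stitches the records together itself; in SQL the database does it.

```python
import sqlite3

# Made-up sample data, shaped like documents from a document store.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"id": 10, "user_id": 1, "total": 30},
    {"id": 11, "user_id": 2, "total": 15},
    {"id": 12, "user_id": 1, "total": 5},
]

# Application-side join: build a lookup table, then match by hand.
names_by_id = {u["id"]: u["name"] for u in users}
app_join = sorted((names_by_id[o["user_id"]], o["total"]) for o in orders)

# The same data in relational tables, joined by the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER)"
)
conn.executemany("INSERT INTO users VALUES (:id, :name)", users)
conn.executemany("INSERT INTO orders VALUES (:id, :user_id, :total)", orders)
sql_join = conn.execute(
    "SELECT u.name, o.total FROM orders o "
    "JOIN users u ON u.id = o.user_id "
    "ORDER BY u.name, o.total"
).fetchall()
```

Both approaches give the same rows here; the difference shows up when the collections no longer fit in application memory or the lookup has to cross the network.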
RethinkDB handles joins just fine :)
As does Couchbase. :-) Personally I like map-reduces.
So does the multi-model database ArangoDB. https://docs.arangodb.com/3.2/AQL/Examples/Join.html

And some NoSQL databases speak SQL as well - without being relational.

I like the JSON support in PostgreSQL a lot. Very easy to deal with unstructured JSON data while still using common attributes in a relational format. But there are more cases than one might think - as a relational guy - that benefit from graph databases, document stores or optimized time-series databases.
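
A small sketch of that "common attributes relational, the rest JSON" layout, written in Python. To keep it runnable without a server it uses SQLite's `json_extract` as a stand-in (this assumes a SQLite build with the JSON functions enabled); in PostgreSQL the column would typically be `jsonb` queried with the `->>` operator, and MySQL has a `JSON` column type with its own `JSON_EXTRACT`. The `events` table and its fields are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# user_id is a normal relational column; payload holds free-form JSON text.
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, payload TEXT NOT NULL)"
)
rows = [
    (1, 42, json.dumps({"type": "click", "target": "buy-button"})),
    (2, 42, json.dumps({"type": "view", "page": "/pricing"})),
    (3, 7, json.dumps({"type": "click", "target": "logo"})),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Filter on the relational column and on a key inside the JSON document.
clicks = conn.execute(
    "SELECT id, json_extract(payload, '$.target') FROM events "
    "WHERE user_id = ? AND json_extract(payload, '$.type') = 'click' "
    "ORDER BY id",
    (42,),
).fetchall()
```

The JSON stays inside one column and is addressed by path at query time; its keys are not turned into tables or columns.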

NoSQL is a great fit for OLAP-type systems where there are tons of high-volume writes, eventual consistency (or BASE in general) is good enough, a strict schema is not enforced, and the consumer (a data scientist, a customer service rep, etc.) is not affected too much if they have to wait a few extra seconds for the search results to come back.