by lmickh 3188 days ago
I like how it pretty much just glosses over decades of familiarity. If there is anything the last 20 years prove, it is that the majority of developers will stick with what they know over what might be a good fit for the job. It goes even deeper in the SQL world down to the specific database flavor.

From the ops side I actually find RDBMS more difficult to deal with, because the power of relationships is easy to abuse and they are not anti-fragile. Instead of smartly reasoning about the data, it is all too easy to just "JOIN ALL THE THINGS WITH MEGA TEMP TABLES!". I've taken more database outages from bad queries than anything else.

There are bad implementations on both sides. There are reasons to pick either side over the other given a set of circumstances. At the end of the day, though, the technical facts don't matter to most people's decision making.

2 comments

> Instead of smartly reasoning about the data, it is all too easy to just "JOIN ALL THE THINGS WITH MEGA TEMP TABLES!".

This is so common, I'd love for the popular databases to add table flags that prevent it from happening by accident. Letting me configure "this table must not full scan or file sort implicitly" would get rid of half the incident callouts I've been involved in. You could always override it in the query where needed.
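
To make the idea concrete, here is a minimal sketch of such a flag enforced on the client side. It assumes SQLite and Python's sqlite3 module rather than any of the big server databases, and it works by reading EXPLAIN QUERY PLAN before running the query. The names NO_FULL_SCAN, FullScanError and guarded_execute are made up for illustration, and the plan parsing is deliberately naive (no aliases, no file sort detection).

```python
import sqlite3


class FullScanError(RuntimeError):
    """Raised when a query would full-scan a protected table."""


# Hypothetical per-table flag: tables that must never be scanned implicitly.
NO_FULL_SCAN = {"orders"}


def scanned_tables(conn, sql, params=()):
    """Return the names SQLite plans to read with a full scan."""
    tables = set()
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params):
        words = row[3].split()  # row[3] is the human-readable plan detail
        if len(words) >= 2 and words[0] == "SCAN":
            # Older SQLite prints "SCAN TABLE orders", newer prints "SCAN orders".
            if len(words) > 2 and words[1] == "TABLE":
                tables.add(words[2])
            else:
                tables.add(words[1])
    return tables


def guarded_execute(conn, sql, params=(), allow_full_scan=False):
    """Run the query unless it would full-scan a flagged table."""
    if not allow_full_scan:
        blocked = scanned_tables(conn, sql, params) & NO_FULL_SCAN
        if blocked:
            raise FullScanError(f"full scan of {sorted(blocked)} refused: {sql}")
    return conn.execute(sql, params)
```

Passing allow_full_scan=True is the "override it in the query where needed" part. A real implementation would live in the database itself, which is exactly what is missing today.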

At best (if your users aren’t allowed to write SQL), that would change your “it is slow” calls to “it doesn’t work”.

At worst, I fear your users would learn to override it by default, as just one more part of the magic incantation needed to please the SQL gods.

I think it would be better to have the planner send out emails saying “this query has to use a full scan” or “this query is on the brink of changing strategy compared to earlier runs”.
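
A rough sketch of the reporting variant, again assuming SQLite and Python's sqlite3 module, with a log warning standing in for the email. PlanWatcher is a hypothetical helper, not an existing API. Note that it can only tell you a plan has already changed between two runs; detecting "on the brink" would need access to the planner's cost estimates, which this does not attempt.

```python
import logging
import sqlite3

log = logging.getLogger("query_plan_watch")


class PlanWatcher:
    """Remembers the last plan seen for each SQL string and reports changes."""

    def __init__(self, conn):
        self.conn = conn
        self.last_plan = {}  # SQL text -> tuple of plan detail strings

    def check(self, sql, params=()):
        """Return a list of warnings for this query (empty if nothing to report)."""
        rows = self.conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
        plan = tuple(row[3] for row in rows)
        warnings = []
        # Any step starting with "SCAN" reads a whole table or a whole index.
        if any(detail.startswith("SCAN") for detail in plan):
            warnings.append(f"query has to use a full scan: {sql}")
        previous = self.last_plan.get(sql)
        if previous is not None and previous != plan:
            warnings.append(f"query changed strategy: {previous} -> {plan}")
        self.last_plan[sql] = plan
        for message in warnings:
            log.warning(message)  # stand-in for sending an email
        return warnings
```

You would call check() next to the real query, or from a periodic job that replays a list of known queries.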

> that would change your “it is slow” calls to “it doesn’t work”.

But that's exactly what I'm asking for. "It's slow" means that it will work until the breaking point and then wake me up. "It doesn't work" with the right reporting allows me to teach someone about indexing during office hours.

MySQL's slow query log can record every query that doesn't use an index (the log_queries_not_using_indexes option).
I found it not very useful. Sometimes you will query without indexes. Sometimes it just doesn't matter. Sometimes it's a table with 2 rows and you don't want an index.

In practice I don't want to know when it happens. I want an error to be raised instead so the database doesn't suddenly die.

So the problem is (1) it's too noisy and (2) it doesn't actually disrupt the query.

Simultaneously too much and not enough.
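
One way to address both complaints at once is to refuse a full scan only when the table is big enough to hurt. A minimal sketch, assuming SQLite and Python's sqlite3 module; MIN_ROWS_TO_CARE, LargeScanError and execute_unless_large_scan are hypothetical names, and the COUNT(*) used to size the table is itself a scan, so a real system would read table statistics instead.

```python
import sqlite3

MIN_ROWS_TO_CARE = 1000  # hypothetical threshold; tune per deployment


class LargeScanError(RuntimeError):
    """Raised when a query would full-scan a table above the row threshold."""


def execute_unless_large_scan(conn, sql, params=(), min_rows=MIN_ROWS_TO_CARE):
    """Run the query, but raise if it would full-scan a large table."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    for row in plan:
        words = row[3].split()  # row[3] is the human-readable plan detail
        if len(words) < 2 or words[0] != "SCAN":
            continue
        # Older SQLite prints "SCAN TABLE t", newer prints "SCAN t".
        table = words[2] if len(words) > 2 and words[1] == "TABLE" else words[1]
        # Skip names that are not real tables (subqueries, constant rows, ...).
        known = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if not known:
            continue
        # Identifiers cannot be bound as parameters; the name was checked above.
        # COUNT(*) is itself a scan: acceptable for a sketch, not for production.
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        if count >= min_rows:
            raise LargeScanError(f"full scan of {table} ({count} rows) refused: {sql}")
    return conn.execute(sql, params)
```

The 2-row table goes through silently, and the big one fails loudly during office hours instead of at the breaking point.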

> it is that the majority of developers will stick with what they know over what might be a good fit for the job

Anecdotally, I have seen this shift quite a bit in the last decade. There has been a big move toward "try whatever is new and shiny" and I think a lot of that drove the NoSQL craze.

And don't get me wrong, there are some great NoSQL options if you select the right tool for the right job.

But a lot of people who gravitated to NoSQL did so because they had designed poor queries in an RDBMS. Guilty myself. Often the quick reaction was "wow, this is faster" instead of "well, of course it's faster, I'm not getting X, Y and Z features of a relational database. Do I need those? Did I abuse the RDBMS?" etc.