by ThinkBeat 970 days ago
This feels like satire, and like it was written by someone who should have spent more time studying the history of database systems.

> Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself

This is not a good idea. It has been done many times and never quite caught on because it is not a good idea.

From a security perspective it is a nightmare.

Or if you do put all the correct isolation around the code to protect the database, then you have basically created an "app server" (old term) inside a database, and it would happily run outside of the database, since in essence it is already doing so.

5 comments

> never quite caught on

It was quite normal in the '70s, '80s and even '90s; all the MS SQL, DB2, Oracle and AS/400 systems I encountered in those days had all or almost all logic as stored procs. Very large ones.

There are also a ton of optimization savings in doing this. Your DB already has to move data from media to cache, so why not do the pipelined processing in place, where it's cheapest?

Instead, we haul that shit to the NIC, then across the network, then copy it into memory on some server (probably inefficiently), do the operations there, or we have to reinvent this with pushdown functions for distributed databases.

There are many cases where moving function-to-data is the right answer.
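As a rough illustration of function-to-data, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table and its columns are made up for the example, and since SQLite runs in-process there is no real network hop here; the point is only to contrast shipping every row to the application with letting the database aggregate in place and return one row per group.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10), ("bob", 25), ("alice", 5), ("bob", 1)],
)


def totals_in_app(conn):
    """Data-to-function: fetch every row, then aggregate in application code."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_in_db(conn):
    """Function-to-data: the database aggregates; one row per customer comes back."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows)
```

Both functions return the same mapping; the difference is how many rows cross the boundary between the database and the application, which is what matters once that boundary is a network.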

During the Java era the standard quickly became ORM instead of stored procedures.

Largely thanks to NeXT showing the way with EOF (the original ORM).

ORMs basically exist because people doing coding refuse to learn the relational model and want to program with OO instead. Most of the time they are, at best, irrelevant, but eventually they become expensive, slow, complex and painful. Everyone in coding learns, once they get a lot of experience, that translation/mapping layers waste cycles and memory, add complexity, and produce tons of bugs wherever assumptions mismatch. That applies to ORMs in particular.
I've just spent the past week refactoring code that was too slow because joins were being performed by a loop in the code rather than just doing a join in the first place.

Replacing 3 classes and numerous methods with two 6-line queries that do the same task in a thousandth of the time is quite satisfying.
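For readers who have not run into this pattern, a minimal sketch of the difference, again with sqlite3 from the Python standard library. The `authors`/`books` schema is hypothetical and is not the code from the refactoring described above; it only shows a join done by a loop next to the same join done by the database.

```python
import sqlite3

# Hypothetical two-table schema, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (id, name) VALUES (1, 'Codd'), (2, 'Date');
    INSERT INTO books (author_id, title) VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")


def titles_by_loop(conn):
    """Join done in application code: one extra query per author (N+1 queries)."""
    result = []
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        books = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY title",
            (author_id,),
        )
        for (title,) in books:
            result.append((name, title))
    return result


def titles_by_join(conn):
    """Same result from a single query; the database performs the join."""
    return conn.execute(
        """
        SELECT a.name, b.title
        FROM authors AS a
        JOIN books AS b ON b.author_id = a.id
        ORDER BY a.id, b.title
        """
    ).fetchall()
```

With two authors the loop version issues three queries and the join version issues one; against a remote database each of those extra queries is a round trip.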

That seems due to the leaky abstractions that ORMs invite; they appear to be just 'normal code', working with structures, objects and variables like you would normally do in code. In reality you have to really know the insides, performance characteristics, etc. of the db you are using. We encounter a massive number of codebases from the last ~10 years that are built as if there is no database to account for, so there are usually not even indices, and there are indeed loops doing inserts, etc.
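A small sketch of the two problems named here (missing indices, inserts done one at a time in a loop), using sqlite3 from the Python standard library. The `events` table, the index name and the helper `query_plan` are all hypothetical. The exact wording of SQLite's `EXPLAIN QUERY PLAN` output differs between versions, so the code only collects the plan text rather than asserting a specific string.

```python
import sqlite3

# Hypothetical table, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)

# One batched statement in one transaction, instead of a loop of
# single-row inserts each followed by its own commit.
rows = [(i % 100, f"event-{i}") for i in range(1000)]
with conn:  # commits once on success, rolls back on error
    conn.executemany("INSERT INTO events (user_id, payload) VALUES (?, ?)", rows)


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


lookup = "SELECT payload FROM events WHERE user_id = ?"

# Without an index on user_id the planner has to scan the whole table.
plan_without_index = query_plan(conn, lookup, (42,))

conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

# With the index the planner can search it instead of scanning.
plan_with_index = query_plan(conn, lookup, (42,))
```

Printing `plan_without_index` and `plan_with_index` shows the planner switching from a table scan to a search on `idx_events_user`; other databases expose the same information through their own `EXPLAIN` variants.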

Stored procs might be considered evil, but they did make the developer acutely aware of the innards of the database, instead of not even knowing how a db works and which things are expensive. To know that with an ORM, you need to look past its leaky abstraction, and that distracts from making features.

I have heard tech leads in this situation claim that because everything runs fine in the cloud without knowing anything about postgres/mysql/dynamo etc, and those extra costs are cheaper than people, it's fine. But our team wasn't hired because it was fine; we were hired because hosting costs were eclipsing the developer costs. It's not that hard to do when stuff is just incredibly badly built...

They don't even need to be stored procedures; just having proper queries in the code to perform the database actions would be preferable to an ORM.

Another big issue I have with ORMs is that a lot of them are hostile to you writing queries in the first place. Network IO is the worst. Please stop trying to force us to use your terrible overlay.

> Or if you do put all the correct isolation around the code to protect the database, then you have basically created an "app server" (old term) inside a database, and it would happily run outside of the database, since in essence it is already doing so.

You can drastically cut down on end-to-end latency by eliminating the network hop from your app server to your DB, especially in cases where you are forced to make multiple back-to-back DB queries per request.

It's possible and even relatively straightforward to eliminate that network hop without moving all of the application logic into the database.
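The comment does not say which technique it has in mind. One reading, and it is only an assumption, is an embedded database that runs inside the application process, so a query is a library call rather than a round trip. A minimal sketch with sqlite3 from the Python standard library; the schema, the discount rule and `handle_cart_request` are invented for the example.

```python
import sqlite3

# The database lives inside the application process: queries are function
# calls into a library, not round trips to a server. Schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE carts (user_id INTEGER, item TEXT, price_cents INTEGER);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO carts VALUES (1, 'book', 1200), (1, 'pen', 300);
""")


def handle_cart_request(conn, user_id):
    """Application logic stays in the app and issues back-to-back queries."""
    user = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if user is None:
        return None
    items = conn.execute(
        "SELECT item, price_cents FROM carts WHERE user_id = ? ORDER BY item",
        (user_id,),
    ).fetchall()
    total = sum(price for _, price in items)
    # A made-up business rule that lives in application code,
    # not in a stored procedure: 10% off totals of 1000 cents or more.
    discount = total // 10 if total >= 1000 else 0
    return {
        "user": user[0],
        "items": [item for item, _ in items],
        "total_cents": total - discount,
    }
```

The two queries per request still happen, but neither crosses a network. The trade-offs are different ones, such as how the data is shared between multiple application instances.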
"all" is doing heavy lifting here in a way that distorts the observation the person you are replying to was making.
Fair enough. It wasn't my intention; perhaps I was just being careless.

I think these conversations usually boil down to some kind of pragmatism versus idealism, so let me be frank about that. If everything could be done as close to the data as possible, that would be faster than not. Every boundary / transformation is a few burned cycles and delays waiting for the arrival of photons/electrons. The ideal here, really, is to have everything occur in one process on one machine without hitting the disk. Ideally the User's machine, for whatever value of User.

However, there are other considerations which cause us frequently to draw the boundaries differently. Often just as valid.

Where certain things happen is relatively flexible; it's software, and we can come up with whatever architecture we like. If you're moving more logic into the database process, then there are costs to that, but they're not measured in cycles. There are other costs if you do it by moving the database into the application process.

Maybe those costs make sense, in which case go for it. We've had the means to do arbitrary work in the database for many decades, and for some periods it was even fashionable.

It's a nightmare from a lot of angles, especially debugging. Mashing everything together is a recipe for disaster. To debug something you want to isolate a small unit of execution and be able to replay the same data through it continually until you get the right results back.
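A minimal sketch of what "isolate a small unit and replay the same data through it" can look like when the logic lives outside the database. Everything here is hypothetical: the recorded rows stand in for data captured once from a real query, and `invoice_total_cents` stands in for the unit being debugged.

```python
import json

# Hypothetical captured input: rows as they came back from the database,
# saved once so the exact same data can be fed through the code again.
RECORDED_ROWS = json.loads(
    '[{"sku": "a", "qty": 2, "unit_cents": 150},'
    ' {"sku": "b", "qty": 1, "unit_cents": 999}]'
)


def invoice_total_cents(rows):
    """Small unit under test: a pure function of its input, no database access."""
    return sum(row["qty"] * row["unit_cents"] for row in rows)


def replay(func, recorded_input, times=3):
    """Run the same recorded data through `func` repeatedly and collect results."""
    return [func(recorded_input) for _ in range(times)]
```

Because the function touches no shared state, every replay of the same recording gives the same answer, and a fix can be checked by rerunning it against the recording.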
A world where the database is seamlessly embedded in the backend code instead would be more useful. Oh wait…
This line in particular is a head-scratcher:

> Advanced inter-document relations and analysis. No JOINs. No pain. […] queries allow for multi-table, multi-depth document retrieval, efficiently in the database, without the use of complicated JOINs

That sounds like they have a networked database, not a relational database. That might be fine, but as I understand it, relational databases won because they offer more flexible access patterns and it’s easier to write correct queries.